Copyright (C) 2022-2026 James S. Seymour (jseymour@LinxNet.com)
See Copyright.txt for license terms.

atre/attack_parser_re project HowTo docs

attack_parser_re is a regular-expression based log parser that can be
used either as a stand-alone utility (atre-parser) or integrated
directly into the sshguard source tree as an additional attack
detection engine.

This document describes:

  - Building the stand-alone parser
  - Integrating the parser directly into sshguard
  - Creating parsing regular expressions

This was all written off the top of my head. It very well may contain
errors and omissions. You have been warned!

It's as rough as it is in part because it's become something it was
not set out to be. It was originally intended to be added to the
sshguard project.

--------------------------------------------------------------------

Notes Regarding SSHGuard 2.x Integration

The integration procedures described here were developed and tested
against sshguard 1.7.x. The sshguard 2.x codebase has diverged
somewhat, but the same general integration approach should still work
with minor adjustments. Integration with sshguard 2.x has not yet been
tested by the author.

That said: atre-parser has been designed to work as a drop-in
replacement for SSHGuard 2.x's default parser. It hasn't been tested
in that role, but its output *appears to* Do The Right Thing.

Feedback or contributions in this area would be welcome.

--------------------------------------------------------------------

Building The Stand-Alone Parser

This one's pretty straight-forward.

Unpack the tarball into the directory in which you wish to build.

The INCLUDES directive in Makefile may require an edit.

The sshg_1.7.0_includes directory contains copies of the headers from
sshguard-1.7.x. They may or may not be the same as those in
sshguard-2.4.x. It's unimportant, as they're only used internally by
the stand-alone atre-parser build.
You may wish to change the ATRE_CONFIG_FILE_PATH base directory in
attack_parser_re.h.

POSIX regex

    make

Using PCRE

    make USE_PCRE=1

You'll need libpcre, along with the "-dev" packages.

- or -

    make USE_PCRE2=1

You'll need libpcre2, along with the "-dev" packages.

PCRE produces significantly better performance than the POSIX regex
implementation. It also allows for more rigorous checking of
expressions, and better detail of expression errors.

extras/attack_parser_re-posix.out (output of tests) was generated
with:

    ./atre-parser -d4 -v4 -c test/attack_parser_re.conf extras/attack_parser_re-posix.out 2>&1

Use attack_parser_re.pcre instead for PCRE builds.

Regression tests, similar to the sshguard-2.4.2 style regression
tests:

    ./atre-parser -b -r -c examples/attack_parser_re.conf .

Run "make" with the options you've chosen and cross your fingers.

Testing

You can test the atre engine with the atre-parser command-line
utility, as follows:

    atre-parser -r -c examples/attack_parser_re.conf tmp/tempfile 2>&1
    diff extras/attack_parser_re-posix.out tmp/tempfile |less

If built with PCRE support:

    atre-parser -d 10 -v 10 -c test/attack_parser_re.pcre tmp/tempfile 2>&1
    diff extras/attack_parser_re-pcre.out tmp/tempfile |less

Note: There will probably be at least one diff if built with PCRE2:

    < regcomp(): pcre_study(): no study data

How to test config file reloading while the parser is running:

    tail -f /var/log/syslog |./atre-parser -v 2 -d 2 -c

From another terminal: Find the PID of atre-parser and

    kill -USR1

You'll see

    log_info: attack_parser_re signatures reload requested
    attack_parser_re_init(): atre config file path: ""
    log_info: reload_attack_parser_re_conf(): Attack parser signatures already up-to-date

Then touch the config file and

    kill -USR1

You'll see

    log_info: attack_parser_re signatures reload requested
    attack_parser_re_init(): atre config file path: ""
    log_info: attack_parser_re_init(): Reloading attack_parser_re signatures
    attack_parser_re_init(): NN attack signatures loaded

N.B.: atre-parser is not built by the
sshguard build. You have to go into the attack_parser_re directory and
build it separately. The Makefile there uses the same flags as above,
with the addition of

    ATRE_CONFIG_FILE_DIR=

which you'd normally set to sshguard's config file path, in which
case, as above, a config file extension of either ".conf" or ".pcre"
will be expected, depending upon whether or not PCRE support was
specified.

Creating Parsing Regular Expressions

Not going to get into Regular Expression syntax, here. There's plenty
of documentation on that elsewhere. This is going to be a discussion
on how they're specified and used with atre-parser and the
attack_parser_re() function.

The syntax is:

    /regular expression with two captures/

See either the attack_parser_re.conf (POSIX) or attack_parser_re.pcre
(PCRE) files in the .../examples/ directory.

The delimiting "/"s are required. Do not escape any intermediate
"/"s. The internal regexps that parse the config regexps ignore them.

Exactly two capture groups must exist:

    1. service identifier
    2. IP address

in that order.

The dangerousness value that follows the expression is an integer
enclosed in literal angle-brackets. A value of <0> means use the
default dangerousness value (defined in attack_parser_re.h).

parse_line_re() will attempt to use the first captured sub-string to
determine the service code for the captured service identifier. You
can acquire a list of known service identifiers and their codes with

    atre-parser -h services

If the service identifier capture cannot be matched with a known
service identifier, the "catch-all" service code of 0 ("all") will be
used. Thus: If you wish to capture a log entry that doesn't have a
known service identifier, capture whatever sub-string makes sense to
you.

Note: As of this writing I *still* haven't figured out how service
codes are used by sshguard.

Notes:

When the "all" service code results, "reverse lookups" (service code
to service name) will return "all", rather than what was captured.
"sshd" log line captures should be written / (ssh)d.../ <...> not / (sshd).../ <...> PCRE builds allow an additional optimization over POSIX: Anchored expressions of the form: /^.{23} ... cause an offset to be applied to the compiled regex, thus avoiding them having to iterate over X number of leading characters. Thus, for example, you can skip right past leading "date time hostname" bits, leading to a slight performance improvement. The regexp config reader has an IP address macro facility. Where it is desired to parse an IP address you can use any one of the following macros: These macros are expanded into a suitable regular expressions before RE compilation. The IPv4 macro expansion is rigorous, the others not so much. Particularly for IPv6 expressions. Their doing the right thing will depend upon the rigorousness of the rest of the regex you specify.