Copyright (C) 2022-2026 James S. Seymour (jseymour@LinxNet.com)
See Copyright.txt for license terms.

atre/attack_parser_re project HowTo docs

attack_parser_re is a regular-expression based log parser that can be
used either as a stand-alone utility (atre-parser) or integrated directly
into the sshguard source tree as an additional attack detection engine.

This document describes:

    - Building the stand-alone parser
    - Integrating the parser directly into sshguard
    - Creating parsing regular expressions

This was all written off the top of my head.  It very well may contain
errors and omissions.  You have been warned!

It's as rough as it is in part because it's become something it was not set
out to be.  It was originally intended to be added to the sshguard project.


    --------------------------------------------------------------------
                 Notes Regarding SSHGuard 2.x Integration

    The integration procedures described here were developed and tested
    against sshguard 1.7.x.

    The sshguard 2.x codebase has diverged somewhat, but the same general
    integration approach should still work with minor adjustments.

    Integration with sshguard 2.x has not yet been tested by the author.

    That said: atre-parser has been designed to work as a drop-in
    replacement for SSHGuard 2.x's default parser. It hasn't been tested
    in that role, but its output *appears to* Do The Right Thing.

    Feedback or contributions in this area would be welcome.
    --------------------------------------------------------------------


    Building The Stand-Alone Parser

	This one's pretty straightforward.

	Unpack the tarball into the directory in which you wish to build.

	The INCLUDES directive in the Makefile may require an edit.

	    The sshg_1.7.0_includes directory contains copies of the headers
	    from sshguard-1.7.x.  They may or may not be the same as those
	    in sshguard-2.4.x.  It's unimportant, as they're only used
	    internally by the stand-alone atre-parser build.

	You may wish to change the ATRE_CONFIG_FILE_PATH base directory in
	attack_parser_re.h.

	POSIX regex

	    make

	Using PCRE

	    make USE_PCRE=1

	    You'll need libpcre, along with the "-dev" packages.

	    - or -

	    make USE_PCRE2=1

	    You'll need libpcre2, along with the "-dev" packages.

	PCRE produces significantly better performance than the POSIX regex
	implementation. It also allows for more rigorous checking of expressions,
	and better detail of expression errors.

	extras/attack_parser_re-posix.out (output of tests) was generated with:

	    ./atre-parser -d4 -v4 -c test/attack_parser_re.conf <test/testfile \
		>extras/attack_parser_re-posix.out 2>&1

	    For PCRE builds, attack_parser_re.pcre can be used instead.

	Regression tests, similar in style to sshguard-2.4.2's regression tests,
	can be run with:

	    ./atre-parser -b -r -c examples/attack_parser_re.conf <test/mytests.txt

	    For PCRE builds, attack_parser_re.pcre can be used instead.

	See the docs for sshguard-2.4.2 and atre-parser_doc.txt for instructions
	on how to replace sshguard's default parser with atre-parser.


    Building Directly Into SSHGuard

	The attack_parser_re() function call can be built into sshguard as an
	addition to, or replacement for, sshguard's native parser.  (I've been
	running it as an addition to sshguard's parser for years.)

	This is really only for the fairly code-savvy.

	The extras/sshguard-1.7.x_integration_diffs.txt files will give you
	hints on how to do this.

	The Notes file included with the distribution tarball has running
	notes I made to myself when I first did this for my own sshguard build.
	They should be more-or-less accurate... -ish.

	Using Patch Files

	    As of release 0.1.0, unified diff files, suitable for use with the
	    "patch" utility, can be used for sshguard-1.7.x.

	    Process:

		. Extract the sshguard-1.7.x tarball
		. Run configure as you normally would
		. Copy attack_parser_re.h and attack_parser_re.c into the
		  src/parser directory
		. From the base sshguard-1.7.x dir, run

		  patch -p0 <.../sshguard-1.7.x_integration.patch

		  where "x" is the appropriate minor version number.

		. Skip to "Build (make) Options", below, to build

	    Notes:

		  The 1.7.0 diff and integration patch files are for PCRE
		  only. They do not include the diffs for PCRE2.  If you
		  wish to use PCRE2 with 1.7.0, minor adjustments will have
		  to be made after applying the patch file.

		  The 1.7.1 diff and patch files include PCRE2 build
		  support.

	          These patches include a fix for the annoying "has already
	          been blocked" log messages.

	Manual Method

	    . Copy attack_parser_re.c and attack_parser_re.h into sshguard's
	      source tree.  (Probably in a .../parser directory.)

	    . In whatever file contains main() for the sshguard executable,
	      add this under the other include directives:

		  #include "parser/attack_parser_re.h"

	      (Assuming, of course, that's where you put the attack_parser_re
	       files.)

	    . Find where all the other "init" functions are done and add a
	      call to the attack_parser_re_init() function.  Something like...

		  procauth_init();
		  whitelist_init();
		  attack_parser_re_init(NULL, NULL, sshg_debugging, 2, 1);

		  Arguments:

		    config_path        NULL = use default
		    return_list        NULL if not needed
		    debugging          sshguard debug level
		    logging_verbosity  recommended: 2
		    use_syslog         1 when running inside sshguard

	    . Find the call to the existing parser routine in the sshguard
	      executable.  It'll look something like:

		if(parse_line(source_id, buf, &parsed_attack) != 0) {

	      and change it to read something like:

		if ((parse_line(source_id, buf, &parsed_attack) != 0) &&
		    (parse_line_re(buf, &parsed_attack) != 0)) {

	      This causes sshguard to try its native parser first, and if that
	      fails, fall back to attack_parser_re().
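
	      A minimal, self-contained sketch of that fallback logic.  The two
	      parser functions are hypothetical stand-ins (stub_native_parse and
	      stub_re_parse), not sshguard's parse_line() or atre's
	      parse_line_re().  Only the short-circuit behaviour of "&&" is the
	      point.  As in the snippet above, a return of 0 is assumed to mean
	      "attack recognised".

```c
#include <assert.h>
#include <string.h>

/* Call counters, so the short-circuit behaviour can be observed. */
static int native_calls, re_calls;

/* Hypothetical stand-in for the native parser: 0 = line recognised. */
static int stub_native_parse(const char *line) {
    assert(line != NULL);
    native_calls++;
    return strstr(line, "sshd") != NULL ? 0 : 1;
}

/* Hypothetical stand-in for the regex parser: 0 = line recognised. */
static int stub_re_parse(const char *line) {
    assert(line != NULL);
    re_calls++;
    return strstr(line, "dovecot") != NULL ? 0 : 1;
}

/* Returns 1 if either parser recognised the line, 0 if neither did.
 * Because && short-circuits, the regex parser only runs when the
 * native parser has already failed. */
static int line_is_attack(const char *line) {
    if ((stub_native_parse(line) != 0) && (stub_re_parse(line) != 0))
        return 0;       /* neither parser matched */
    return 1;
}
```

	      Swapping the two operands would make the regex parser the primary
	      one and the native parser the fallback.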

	    . Find the various _fin() functions that are executed on program
	      termination and add one for attack_parser_re.  (Not absolutely
	      necessary, but it's good form.)

		  procauth_fin();
		  attack_parser_re_fin(NULL);
		  sshguard_log_fin();

	    . If you want to be able to update regexp signatures on-the-fly,
	      in a running instance of sshguard, it gets a little trickier.

		. Up near the top of sshguard's file that contains main(), add:

		    /*
		     * Signal-tracking for thread-safe signal handler
		     */
		    static volatile sig_atomic_t ts_got_signal = 0;
		    static void ts_sig_handler(int signo);

		. Just under where main() starts, where the other function
		  variables are declared:

		      struct sigaction sa;        /* for thread-safe signal handler */

		. Somewhere near the beginning of main(), where the other
		  signal-catching is set up, add:

		    memset(&sa, 0, sizeof(struct sigaction));
		    sigemptyset(&sa.sa_mask);
		    sa.sa_handler = &ts_sig_handler;
		    if (sigaction(SIGUSR1, &sa, NULL) == -1) {
			perror("sigaction");
			return EXIT_FAILURE;
		    }

		    I chose SIGUSR1.  You can change this if you like.  (N.B.: SIGHUP,
		    SIGINT, and SIGTERM are [probably] already used by sshguard to
		    terminate itself, so don't pick those.)

		. Just under where the line-parsing calls are (as above), add:

		    if (ts_got_signal) {
			ts_got_signal = 0;
			sshguard_log(LOG_DEBUG, "Received signal to reload attack_parser_re signatures");
			reload_attack_parser_re_conf();
		    }

		    This causes the parsing expressions to be reloaded the next time a
		    log line is processed.

		. Add the signal-catching function somewhere handy (outside of main(), obviously):

		    /*
		     * Thread-safe signal handler
		     */
		    static void ts_sig_handler(int signo) {
			ts_got_signal = signo;
		    }

		    All it does is set a flag to indicate a re-read is requested.
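
		    The pieces above, pulled together into one self-contained
		    sketch.  The names on_signal, install_reload_handler,
		    poll_reload and reload_count are made up for illustration,
		    and the reload itself is only simulated by a counter.

```c
#define _POSIX_C_SOURCE 200809L  /* sigaction() under strict -std=c99/c11 */

#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;
static int reload_count = 0;    /* hypothetical: counts simulated reloads */

/* The handler only records the signal; all real work is deferred. */
static void on_signal(int signo) {
    got_signal = signo;
}

/* Install the handler for SIGUSR1.  Returns 0 on success, -1 on failure. */
static int install_reload_handler(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_signal;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Meant to be called from the main loop after each log line.  If a
 * signal has arrived, clear the flag and do the (simulated) reload.
 * Returns 1 if a reload was performed, 0 otherwise. */
static int poll_reload(void) {
    if (!got_signal)
        return 0;
    got_signal = 0;
    reload_count++;     /* stand-in for reload_attack_parser_re_conf() */
    return 1;
}
```

		    Several signals arriving between two polls collapse into a
		    single reload, which is harmless here.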

		. Integration into sshg-parser is similar, except, of course,
		  there's no point to the signal-catching.

		. Then you have to make a bunch of changes to the Makefile in order
		  to build the result.  I can't even begin to explain all those.  Refer
		  to the Notes doc and extras/sshguard-1.7.x_integration_diffs.txt
		  for hints on how to do this.

		  Note: I haven't yet modified the make file(s) in the sshguard
		  distribution to try to build atre-parser, as well.  Easier to
		  just build it separately. (See below.)

	    . Build (make) Options

		For integrating with sshguard-1.7.x

		    SSHG_1_7_0=1

		Otherwise, 2.4.x/2.5.x is assumed.

		For PCRE regular expression support:

		    USE_PCRE=1

		    - or -

		    USE_PCRE2=1

		Otherwise it is built with POSIX regex support.

		The config file directory will default to sshguard's, and the
		config file name to either

		    attack_parser_re.conf

		or

		    attack_parser_re.pcre

		depending upon whether POSIX or PCRE support, respectively, was
		built.

		The config file path can be completely overridden with

		    ATRE_CONFIG_FILE_PATH=<path of your choosing>

	    . Run "make" with the options you've chosen and cross your fingers.


    Testing

	You can test the atre engine with the atre-parser command-line utility, as
	follows:

	    atre-parser -r -c examples/attack_parser_re.conf <test/mytests.txt

	    If built with POSIX RE support:

		atre-parser -d 10 -v 10 -c test/attack_parser_re.conf <test/testfile \
		  >tmp/tempfile 2>&1
		diff extras/attack_parser_re-posix.out tmp/tempfile |less

	    If built with PCRE support:

		atre-parser -d 10 -v 10 -c test/attack_parser_re.pcre <test/testfile \
		  >tmp/tempfile 2>&1
		diff extras/attack_parser_re-pcre.out tmp/tempfile |less

		Note: There will probably be at least one diff if built with PCRE2:

		    < regcomp(): pcre_study(): no study data

	    How to test running regex config file reloading:

		tail -f /var/log/syslog |./atre-parser -v 2 -d 2 -c <config file>

		From another terminal: Find the PID of atre-parser and

		    kill -USR1 <PID>

		You'll see

		    log_info: attack_parser_re signatures reload requested
		    attack_parser_re_init(): atre config file path: "<config file>"
		    log_info: reload_attack_parser_re_conf(): Attack parser signatures already up-to-date

		Then

		    touch <config file>
		    kill -USR1 <PID>

		You'll see

		    log_info: attack_parser_re signatures reload requested
		    attack_parser_re_init(): atre config file path: "<config file>"
		    log_info: attack_parser_re_init(): Reloading attack_parser_re signatures
		    attack_parser_re_init(): NN attack signatures loaded
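
		The output above ("already up-to-date" until the file is
		touched) suggests the reload is skipped when the config file's
		modification time hasn't changed.  Assuming that is how the
		check works, here is a minimal sketch of it.  The function name
		config_needs_reload is made up for illustration and is not part
		of atre.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Compare the file's mtime with the one remembered in *last_mtime.
 * Returns  1 if it differs (and remembers the new mtime),
 *          0 if it is unchanged,
 *         -1 if the file cannot be stat()ed. */
static int config_needs_reload(const char *path, time_t *last_mtime) {
    struct stat st;

    assert(path != NULL && last_mtime != NULL);
    if (stat(path, &st) != 0)
        return -1;
    if (st.st_mtime == *last_mtime)
        return 0;               /* "already up-to-date" */
    *last_mtime = st.st_mtime;
    return 1;                   /* caller should re-read the signatures */
}
```

		With a one-second mtime resolution, an edit made within the same
		second as the previous load would go unnoticed by a check like
		this one.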


	    N.B.: atre-parser is not built by the sshguard build. You have to go
	    into the attack_parser_re directory and build it separately.

	    The Makefile there uses the same flags as above, with the addition of

		ATRE_CONFIG_FILE_DIR=<directory of your choosing>

	    which you'd normally set to sshguard's config file path, in which case,
	    as above, a config file extension of either ".conf" or ".pcre" will be
	    expected, depending upon whether or not PCRE support was specified.
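
	    One way such a default could be selected at compile time is
	    sketched here.  This is an illustration, not a copy of what
	    attack_parser_re.h does: the "/usr/local/etc" directory is a
	    placeholder, it is assumed that PCRE2 builds use ".pcre" as well,
	    and the macros are assumed to arrive as quoted C strings (for
	    example via -D on the compiler command line).

```c
#include <assert.h>
#include <string.h>

/* Placeholder directory; a real build would supply its own. */
#ifndef ATRE_CONFIG_FILE_DIR
#define ATRE_CONFIG_FILE_DIR "/usr/local/etc"
#endif

/* PCRE builds read ".pcre"; POSIX builds read ".conf". */
#if defined(USE_PCRE) || defined(USE_PCRE2)
#define ATRE_CONFIG_FILE_NAME "attack_parser_re.pcre"
#else
#define ATRE_CONFIG_FILE_NAME "attack_parser_re.conf"
#endif

/* A full path given at build time overrides both directory and name. */
#ifndef ATRE_CONFIG_FILE_PATH
#define ATRE_CONFIG_FILE_PATH ATRE_CONFIG_FILE_DIR "/" ATRE_CONFIG_FILE_NAME
#endif

/* The path that would be used when no config file is given at run time. */
static const char *default_config_path(void) {
    return ATRE_CONFIG_FILE_PATH;
}

/* Returns 1 if the selected config file has the ".pcre" extension. */
static int config_is_pcre(void) {
    const char *p = default_config_path();
    size_t n = strlen(p);

    assert(n > 0);
    return n >= 5 && strcmp(p + n - 5, ".pcre") == 0;
}
```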


    Creating Parsing Regular Expressions

	Not going to get into Regular Expression syntax, here.  There's plenty of
	documentation on that elsewhere.  This is going to be a discussion on how
	they're specified and used with atre-parser and the attack_parser_re()
	function.

	The syntax is:

	    /regular expression with two captures/ <dangerousness>

	See either the attack_parser_re.conf (POSIX) or attack_parser_re.pcre
	(PCRE) files in the .../examples/ directory.

	The delimiting "/"s are required.

	Do not escape any intermediate "/"s.  The internal regexps that parse
	the config regexps ignore them.

	Exactly two capture groups must exist:

	    1. service identifier
	    2. IP address

	in that order.

	<dangerousness> is an integer value enclosed in literal angle-brackets.
	A value of <0> means use the default dangerousness value (defined
	in attack_parser_re.h).
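
	To make the line format concrete, here is a sketch of splitting one
	signature line into its regex and dangerousness parts.  It is not
	atre's actual config reader (which, per the above, uses regexps of its
	own).  split_signature and the DEFAULT_DANGEROUSNESS value of 10 are
	placeholders.  The regex is taken to be everything between the first
	and the last "/", which is why inner "/"s need no escaping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder; the real default lives in attack_parser_re.h. */
#define DEFAULT_DANGEROUSNESS 10

/* Split "/regex/ <N>" into its parts.  The regex is copied into re
 * (size re_len) and the dangerousness stored in *danger, with <0>
 * mapped to the default.  Returns 0 on success, -1 if malformed. */
static int split_signature(const char *line, char *re, size_t re_len,
                           int *danger) {
    const char *first, *last, *lt;
    char *end;
    long val;
    size_t n;

    assert(line != NULL && re != NULL && danger != NULL);
    first = strchr(line, '/');
    last = strrchr(line, '/');
    if (first == NULL || last == first)
        return -1;                      /* need both delimiters */
    n = (size_t)(last - first - 1);
    if (n == 0 || n >= re_len)
        return -1;                      /* empty regex or buffer too small */
    lt = strchr(last, '<');
    if (lt == NULL)
        return -1;
    val = strtol(lt + 1, &end, 10);
    if (end == lt + 1 || *end != '>' || val < 0)
        return -1;                      /* not "<non-negative integer>" */
    memcpy(re, first + 1, n);
    re[n] = '\0';
    *danger = (val == 0) ? DEFAULT_DANGEROUSNESS : (int)val;
    return 0;
}
```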

	parse_line_re() will attempt to use the first captured sub-string to
	determine the service code for the captured service identifier.

	You can acquire a list of known service identifiers and their codes
	with

	    atre-parser -h services

	If the captured service identifier cannot be matched with a known
	service identifier, the "catch-all" service code of 0 ("all") will be
	used.  Thus: If you wish to capture a log entry that doesn't have a
	known service identifier, capture whatever sub-string makes sense to you.

	    Note: As of this writing I *still* haven't figured out how service
	    codes are used by sshguard.
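
	A sketch of the lookup-with-fallback behaviour described above.  The
	table contents are invented for illustration: only the catch-all code
	0 ("all") comes from this document, and the real identifiers and codes
	are whatever "atre-parser -h services" prints.  An exact string match
	is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SERVICE_ALL 0           /* the catch-all code described above */

/* Hypothetical table: identifiers and codes are placeholders only. */
static const struct {
    const char *ident;
    int code;
} known_services[] = {
    { "ssh",     1 },
    { "dovecot", 2 },
    { "postfix", 3 },
};

#define N_SERVICES (sizeof(known_services) / sizeof(known_services[0]))

/* Map a captured identifier to its code, or SERVICE_ALL if unknown. */
static int service_code(const char *captured) {
    size_t i;

    assert(captured != NULL);
    for (i = 0; i < N_SERVICES; i++)
        if (strcmp(known_services[i].ident, captured) == 0)
            return known_services[i].code;
    return SERVICE_ALL;
}

/* Reverse lookup: code to name.  Unknown codes and SERVICE_ALL both
 * yield "all", so the originally captured text is not recoverable. */
static const char *service_name(int code) {
    size_t i;

    for (i = 0; i < N_SERVICES; i++)
        if (known_services[i].code == code)
            return known_services[i].ident;
    return "all";
}
```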

	Notes:
	
	    When the "all" service code results, "reverse lookups" (service
	    code to service name) will return "all", rather than what was
	    captured.

	    "sshd" log line captures should be written

		/ (ssh)d.../ <...>

	     not

		/ (sshd).../ <...>
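
	For a concrete picture of the two captures, here is a sketch using the
	POSIX regex API directly.  The expression in SIG_RE is made up for
	this example and is not taken from the shipped config files.
	match_signature and copy_capture are likewise illustrative names.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Made-up signature: capture 1 = service identifier ("ssh", with the
 * trailing "d" left outside the group), capture 2 = IPv4 address. */
static const char *SIG_RE =
    " (ssh)d\\[[0-9]+\\]: Invalid user .* from "
    "([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)";

/* Copy one capture group out of line into out (size out_len),
 * truncating if necessary. */
static void copy_capture(const char *line, const regmatch_t *m,
                         char *out, size_t out_len) {
    size_t n = (size_t)(m->rm_eo - m->rm_so);

    assert(out_len > 0);
    if (n >= out_len)
        n = out_len - 1;
    memcpy(out, line + m->rm_so, n);
    out[n] = '\0';
}

/* Returns 0 and fills service/addr on a match, non-zero otherwise. */
static int match_signature(const char *line, char *service, size_t slen,
                           char *addr, size_t alen) {
    regex_t re;
    regmatch_t m[3];            /* whole match plus two captures */
    int rc;

    assert(line != NULL);
    if (regcomp(&re, SIG_RE, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 3, m, 0);
    if (rc == 0) {
        copy_capture(line, &m[1], service, slen);
        copy_capture(line, &m[2], addr, alen);
    }
    regfree(&re);
    return rc;
}
```

	Compiling the expression on every call keeps the sketch short.  A
	long-running program would compile once and reuse the result.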

	PCRE builds allow an additional optimization over POSIX:  Anchored
	expressions of the form:

	    /^.{23} ...

	cause an offset to be applied to the compiled regex, so the regex
	engine doesn't have to iterate over that number of leading characters
	(23, in this example).  Thus, for example, you can skip right past
	leading "date time hostname" bits, leading to a slight performance
	improvement.
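
	The general idea can be sketched without any regex library at all.
	This is only a guess at the technique, not atre's implementation:
	split_fixed_prefix and match_start are hypothetical helpers that peel
	a leading "^.{N}" off an expression and turn it into a starting
	position in the log line.  How atre hands the remainder to PCRE is not
	shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* If pattern starts with "^.{N}", store N in *skip and return a pointer
 * to the rest of the pattern.  Otherwise store 0 and return pattern
 * unchanged. */
static const char *split_fixed_prefix(const char *pattern, size_t *skip) {
    char *end;
    unsigned long n;

    assert(pattern != NULL && skip != NULL);
    *skip = 0;
    if (strncmp(pattern, "^.{", 3) != 0 ||
        !isdigit((unsigned char)pattern[3]))
        return pattern;
    n = strtoul(pattern + 3, &end, 10);
    if (*end != '}')
        return pattern;         /* e.g. "^.{2,5}" is not a fixed width */
    *skip = (size_t)n;
    return end + 1;
}

/* Position in line where matching of the remainder should begin, or
 * NULL if the line is shorter than the fixed prefix and cannot match. */
static const char *match_start(const char *line, size_t skip) {
    assert(line != NULL);
    return strlen(line) >= skip ? line + skip : NULL;
}
```

	The remainder of the expression would still have to be matched
	anchored at that position for the result to be equivalent.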

	The regexp config reader has an IP address macro facility. Where it is
	desired to parse an IP address you can use any one of the following
	macros:

	    <IPV4_ADDR>
	    <IPV6_ADDR>
	    <IPV4_MAP6_ADDR>
	    <IPV_ANY_ADDR>

	These macros are expanded into suitable regular expressions before
	RE compilation.

	The IPv4 macro expansion is rigorous.  The others are less so,
	particularly the IPv6 expressions: whether they do the right thing
	will depend upon the rigorousness of the rest of the regex you specify.
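
	A sketch of what expanding a macro before compilation might look like.
	It is not atre's code: expand_ipv4_macro is an illustrative name, and
	IPV4_EXPANSION is a deliberately simple placeholder, far less rigorous
	than a real IPv4 pattern.  The placeholder has no groups of its own, so
	wrapping the macro in "(...)" still yields exactly one capture.

```c
#include <assert.h>
#include <string.h>

#define IPV4_MACRO     "<IPV4_ADDR>"
/* Placeholder expansion; the pattern atre really uses may differ. */
#define IPV4_EXPANSION "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"

/* Copy src to dst, replacing every IPV4_MACRO with IPV4_EXPANSION.
 * Returns 0 on success, -1 if dst (size dst_len) is too small. */
static int expand_ipv4_macro(const char *src, char *dst, size_t dst_len) {
    const size_t mlen = strlen(IPV4_MACRO);
    const size_t elen = strlen(IPV4_EXPANSION);
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    if (dst_len == 0)
        return -1;
    while (*src != '\0') {
        if (strncmp(src, IPV4_MACRO, mlen) == 0) {
            if (used + elen >= dst_len)
                return -1;      /* no room for expansion plus NUL */
            memcpy(dst + used, IPV4_EXPANSION, elen);
            used += elen;
            src += mlen;
        } else {
            if (used + 1 >= dst_len)
                return -1;      /* no room for this char plus NUL */
            dst[used++] = *src++;
        }
    }
    dst[used] = '\0';
    return 0;
}
```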