
Copyright (c) 2022 Jim Seymour (jseymour+sshguard@LinxNet.com)

atre/attack_parser_re project HowTo docs


This was all written off the top of my head.  It very well may contain
errors and omissions.  You have been warned!

It's as rough as it is in part because it's become something it was not set
out to be.  It was originally intended to be added to the sshguard project.


Subjects:

    . Building The Stand-Alone Parser
    . Building Directly Into SSHGuard
    . Creating Parsing Regular Expressions


Building The Stand-Alone Parser

    This one's pretty straight-forward.

    Unpack the tarball into the directory in which you wish to build.

    The INCLUDES directive in the Makefile will require an edit.

	sshg_1.7.0_includes contains all the necessary include files
	(copied from sshguard-1.7.0).  They may or may not be the same
	as those in sshguard-2.4.2.  It's unimportant, as they're only
	used internally in the stand-alone executable.

    You may wish to change the ATRE_CONFIG_FILE_PATH base directory in
    attack_parser_re.h

    Using the POSIX regex lib

	make

    Using pcreposix (PCRE/POSIX abstraction) lib

	make USE_PCRE=1

    Using "native" PCRE lib

	make USE_NATIVE_PCRE=1

    For PCRE builds you'll need either libpcreposix or libpcre, depending
    upon whether you specify USE_PCRE or USE_NATIVE_PCRE, respectively,
    along with their "-dev" packages.

    Either PCRE option produces much better performance than the POSIX lib.

    The difference between USE_PCRE and USE_NATIVE_PCRE is that the latter
    uses native libpcre functions for compiling and executing expressions.
    This produces somewhat better performance, allows for more rigorous
    checking of expressions, and gives more detailed expression error
    reports.

    For development work, add STAND_ALONE_DEVEL_ATRE=1.  (This mainly
    forcibly disables syslogging.)

    attack_parser_re.out (output of tests) was generated with:

	./atre-parser -d4 -v4 -c test/attack_parser_re.conf <test/testfile \
	    >attack_parser_re.out 2>&1

	    (attack_parser_re.pcre can be used instead for PCRE builds.)

    Regression tests, similar in style to sshguard-2.4.2's regression
    tests, can be run with:

	./atre-parser -b -r -c examples/attack_parser_re.conf <test/mytests.txt

	    (attack_parser_re.pcre can be used instead for PCRE builds.)

    See the docs for sshguard-2.4.2 and atre-parser_doc.txt for instructions
    on how to replace sshguard's default parser with atre-parser.


Building Directly Into SSHGuard

    This adds the attack_parser_re() function call as an addition to, or
    replacement for, sshguard's native parser.  (I've been running it this
    way for weeks.)

    This is really only for the fairly code-savvy.

    extras/sshguard-1.7.0_integration_diffs.txt will give you hints on
    how to do this.

    The Notes file included with the distribution tarball has running
    notes I made to myself when I first did this for my own sshguard build.
    They should be more-or-less accurate -ish.

    In brief (FSVO "brief"):

	. Copy attack_parser_re.c and attack_parser_re.h into sshguard's
	  source tree.  (Probably in a .../parser directory.)

	. In whatever file contains main() for the sshguard executable,
	  add this under the other include directives:

	      #include "parser/attack_parser_re.h"

	  (Assuming, of course, that's where you put the attack_parser_re
	   files.)

	. Find where all the other "init" functions are done and add a
	  call to the attack_parser_re_init() function.  Something like...

	      procauth_init();
	      whitelist_init();
	      attack_parser_re_init(NULL, NULL, sshg_debugging, 2, 1);

	  (That second-to-last arg to attack_parser_re_init() is the
	   logging verbosity level.  You may want something different there.)

	. Find the call to the existing parser routine in the sshguard
	  executable.  It'll look something like:

	    if(parse_line(source_id, buf, &parsed_attack) != 0) {

	  and change it to read something like:

	    if ((parse_line(source_id, buf, &parsed_attack) != 0) &&
		(parse_line_re(buf, &parsed_attack) != 0)) {

	. Find the various _fin() functions that are executed on program
	  termination and add one for attack_parser_re.  (Not absolutely
	  necessary, but it's good form.)

	      procauth_fin();
	      attack_parser_re_fin(NULL);
	      sshguard_log_fin();

	. If you want to be able to update regexp signatures on-the-fly,
	  to a running instance of sshguard, it gets a little trickier.

	    . Up near the top of sshguard's file that contains main(), add:

		/*
		 * Signal-tracking for thread-safe signal handler
		 */
		static volatile sig_atomic_t ts_got_signal = 0;
		static void ts_sig_handler(int signo);

	    . Just under where main() starts, where the other function
	      variables are declared:

	          struct sigaction sa;        /* for thread-safe signal handler */

	    . Somewhere near the beginning of main(), where the other
	      signal-catching is set up, add:

		memset(&sa, 0, sizeof(struct sigaction));
		sa.sa_handler = &ts_sig_handler;
		if (sigaction(SIGUSR1, &sa, NULL) == -1) {
		    perror("sigaction"); return EXIT_FAILURE;
		}

		I chose SIGUSR1.  You can change this if you like.  (N.B.: SIGHUP,
		SIGINT, and SIGTERM are [probably] already used by sshguard to
		terminate itself, so don't pick those.)

	    . Just under where the line-parsing calls are (as above), add:

		if (ts_got_signal) {
		    ts_got_signal = 0;
		    sshguard_log(LOG_DEBUG, "Received signal to reload attack_parser_re signatures");
		    reload_attack_parser_re_conf();
		}

		This will cause the parsing regexps to be re-read on the next log line
		received.

	    . Add the signal-catching function somewhere handy (outside of main(), obviously):

		/*
		 * Thread-safe signal handler
		 */
		static void ts_sig_handler(int signo) {
		    ts_got_signal = signo;
		}

		All it does is set a flag to indicate a re-read is requested.

	    . Integration into sshg-parser is similar, except, of course,
	      there's no point to the signal-catching.

	    . Then you have to make a bunch of changes to the Makefile in order
	      to build the result.  I can't even begin to explain all those.  Refer
	      to the Notes doc and extras/sshguard-1.7.0_integration_diffs.txt
	      for hints on how to do this.

	      Note: I haven't yet modified the make file(s) in the sshguard
	      distribution to try to build atre-parser, as well.  Easier to
	      just build it separately.

	. Run "make" (or "make USE_PCRE=1" or "make USE_NATIVE_PCRE=1")
	  and cross your fingers.
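
    The parse-then-fall-back call and the signal-driven reload described
    above can be tried out in isolation.  What follows is only a sketch,
    not sshguard code: the stub_*() functions and reload_count are
    hypothetical stand-ins for parse_line(), parse_line_re() and
    reload_attack_parser_re_conf(), and the "native"/"regex" matching is
    made up so the control flow is visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the line-handling code. */
static volatile sig_atomic_t ts_got_signal = 0;

/* Counts reloads so the effect is observable in this sketch. */
static int reload_count = 0;

/* The handler only records that a signal arrived; no other work here. */
static void ts_sig_handler(int signo) {
    ts_got_signal = signo;
}

/* Hypothetical stand-ins.  Like the calls in the text, the two parsers
 * return 0 when they recognise the line and non-zero when they don't. */
static int stub_parse_line(const char *buf)    { return strstr(buf, "native") ? 0 : 1; }
static int stub_parse_line_re(const char *buf) { return strstr(buf, "regex") ? 0 : 1; }
static void stub_reload_conf(void)             { reload_count++; }

/* Install the SIGUSR1 handler.  Returns 0 on success, -1 on failure. */
static int install_reload_handler(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = ts_sig_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Handle one log line.  Returns 1 if either parser recognised it, else 0.
 * A pending reload request is serviced after the line has been parsed. */
static int handle_line(const char *buf) {
    int matched = 1;

    assert(buf != NULL);
    /* The regex parser is consulted only when the native parser fails. */
    if (stub_parse_line(buf) != 0 && stub_parse_line_re(buf) != 0)
        matched = 0;

    if (ts_got_signal) {
        ts_got_signal = 0;
        stub_reload_conf();
    }
    return matched;
}
```

    Call install_reload_handler() once at start-up and handle_line() for
    each log line.  Because the handler only sets a flag, the reload
    happens on the first line handled after the signal arrives, not at
    the moment of delivery.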


Creating Parsing Regular Expressions

    Not going to get into Regular Expression syntax, here.  There's plenty of
    documentation on that elsewhere.  This is going to be a discussion on how
    they're specified and used with atre-parser and the attack_parser_re()
    function.

    The syntax is:

	/regular expression with two captures/ <dangerousness>

    See either the attack_parser_re.conf (POSIX) or attack_parser_re.pcre
    (PCRE) files in the .../examples/ directory.

    The delimiting "/"s are required.

    Do not escape any intermediate "/"s.  The internal regexps that parse
    the config regexps ignore them.

    The two captured sub-strings must be the service identifier and IP
    address, in that order.

    <dangerousness> is an integer value enclosed in literal angle-brackets.
    A value of <0> means use the default dangerousness value (defined
    in attack_parser_re.h).
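
    To make the line format concrete, here is a small sketch that splits
    a signature line into its regex text and dangerousness value.  It is
    not the project's reader (which, as noted above, uses internal
    regexps); it uses plain string functions instead.  The names
    split_sig_line() and DEMO_DEFAULT_DANGEROUSNESS, and the value 10,
    are made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder only; the real default is defined in attack_parser_re.h. */
#define DEMO_DEFAULT_DANGEROUSNESS 10

/*
 * Split "/regex/ <N>" into the regex text and the dangerousness value.
 * The regex runs from the first '/' to the LAST '/', so unescaped '/'
 * characters inside it are kept as-is.
 * Returns 0 on success, -1 on a malformed line or too-small buffer.
 */
static int split_sig_line(const char *line, char *re, size_t re_sz, int *danger) {
    const char *open, *close, *lt, *gt;
    char *end;
    long val;
    size_t len;

    assert(line != NULL && re != NULL && danger != NULL);

    open = strchr(line, '/');
    close = strrchr(line, '/');
    if (open == NULL || close == open)
        return -1;                      /* need two delimiting slashes */
    len = (size_t)(close - open - 1);
    if (len == 0 || len >= re_sz)
        return -1;                      /* empty regex or buffer too small */

    /* The dangerousness is looked for only after the closing '/'. */
    lt = strchr(close, '<');
    gt = (lt != NULL) ? strchr(lt, '>') : NULL;
    if (gt == NULL)
        return -1;
    val = strtol(lt + 1, &end, 10);
    if (end == lt + 1 || end != gt || val < 0)
        return -1;                      /* not a plain non-negative integer */

    memcpy(re, open + 1, len);
    re[len] = '\0';
    /* <0> means "use the default". */
    *danger = (val == 0) ? DEMO_DEFAULT_DANGEROUSNESS : (int)val;
    return 0;
}
```

    Searching for the last "/" rather than the next one is what lets
    intermediate "/"s go unescaped in this sketch.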

    parse_line_re() will attempt to use the first captured sub-string to
    determine the service code for the captured service identifier.
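
    For readers who haven't pulled captures out of a regex from C
    before, the following sketch shows the general idea with the POSIX
    regex API (regcomp/regexec).  It is an illustration under
    assumptions, not parse_line_re() itself: extract_service_and_addr()
    and copy_capture() are hypothetical names, and the real function
    fills in sshguard's attack structure rather than two strings.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copy one capture out of line into out (NUL-terminated, truncating). */
static void copy_capture(const char *line, const regmatch_t *m,
                         char *out, size_t out_sz) {
    size_t len = (size_t)(m->rm_eo - m->rm_so);

    if (len >= out_sz)
        len = out_sz - 1;
    memcpy(out, line + m->rm_so, len);
    out[len] = '\0';
}

/*
 * Match line against pattern (POSIX extended syntax) and extract captures
 * 1 and 2 as the service identifier and the address, in that order.
 * Returns 0 on a match with both captures set, -1 otherwise.
 */
static int extract_service_and_addr(const char *pattern, const char *line,
                                    char *svc, size_t svc_sz,
                                    char *addr, size_t addr_sz) {
    regex_t re;
    regmatch_t m[3];    /* m[0] = whole match, m[1] and m[2] = captures */
    int rc = -1;

    assert(svc_sz > 0 && addr_sz > 0);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;      /* expression failed to compile */
    if (regexec(&re, line, 3, m, 0) == 0 &&
        m[1].rm_so != -1 && m[2].rm_so != -1) {
        copy_capture(line, &m[1], svc, svc_sz);
        copy_capture(line, &m[2], addr, addr_sz);
        rc = 0;
    }
    regfree(&re);
    return rc;
}
```

    A real parser would compile each expression once, when the config
    is read, rather than on every log line as this sketch does.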

    You can acquire a list of known service identifiers and their codes
    with

	atre-parser -h services

    If the captured service identifier cannot be matched with a known
    service identifier, the "catch-all" service code of 0 ("all") will be
    used.  Thus: If you wish to capture a log entry that doesn't have a
    known service identifier, capture whatever sub-string makes sense to you.

	Note: As of this writing I *still* haven't figured out how service
	codes are used by sshguard.

    Notes:

	When the "all" service code results, "reverse lookups" (service
	code to service name) will return "all", rather than what was
	captured.

	"sshd" log line captures should be written

	    / (ssh)d.../ <...>

	not

	    / (sshd).../ <...>
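
    The notes above make more sense with a lookup in front of you.
    This sketch ASSUMES the service identifier is compared as an exact
    string, which would explain why "(ssh)d" works and "(sshd)" doesn't;
    the source isn't quoted here, so treat that as a guess.  The table
    and the codes 100 and 200 are placeholders, not sshguard's real
    values (get those from "atre-parser -h services").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SERVICE_ALL 0      /* the catch-all code described in the text */

struct demo_service {
    const char *name;
    int code;
};

/* Placeholder table: the names and codes are illustrative only. */
static const struct demo_service demo_services[] = {
    { "ssh", 100 },
    { "ftp", 200 },
};

#define DEMO_NSERVICES (sizeof demo_services / sizeof demo_services[0])

/* Exact-match lookup; unknown identifiers get the catch-all code. */
static int demo_service_code(const char *ident) {
    size_t i;

    assert(ident != NULL);
    for (i = 0; i < DEMO_NSERVICES; i++)
        if (strcmp(ident, demo_services[i].name) == 0)
            return demo_services[i].code;
    return DEMO_SERVICE_ALL;
}

/* Reverse lookup: anything not in the table comes back as "all". */
static const char *demo_service_name(int code) {
    size_t i;

    for (i = 0; i < DEMO_NSERVICES; i++)
        if (demo_services[i].code == code)
            return demo_services[i].name;
    return "all";
}
```

    With this table, "ssh" resolves to its own code, while "sshd" or
    any made-up identifier resolves to 0, and a reverse lookup of that
    0 gives "all" rather than the captured text.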

    Native PCRE builds (USE_NATIVE_PCRE) employ an additional optimization
    that neither POSIX nor PCRE/POSIX builds can.  Anchored expressions
    of the form:

	/^.{23} ...

    cause an offset to be applied to the compiled regex, so that matching
    doesn't have to iterate over that number of leading characters (23,
    in this example).  Thus, for example, you can skip right past leading
    "date time hostname" bits, for a slight performance improvement.
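
    The idea can be pictured as two small steps: recognise the leading
    "^.{N}" and turn it into a starting position in the log line.  The
    sketch below shows only that arithmetic.  It says nothing about how
    the native PCRE build actually hands the offset to libpcre, and
    leading_skip() and skip_start() are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * If re starts with "^.{N}", store N in *skip and return a pointer to the
 * rest of the expression.  Otherwise store 0 and return re unchanged.
 */
static const char *leading_skip(const char *re, size_t *skip) {
    char *end;
    unsigned long n;

    assert(re != NULL && skip != NULL);
    *skip = 0;
    /* Short-circuit evaluation stops at the NUL if re is too short. */
    if (re[0] != '^' || re[1] != '.' || re[2] != '{' ||
        !isdigit((unsigned char)re[3]))
        return re;
    n = strtoul(re + 3, &end, 10);
    if (*end != '}')
        return re;              /* e.g. "^.{2,5}" is not a fixed skip */
    *skip = (size_t)n;
    return end + 1;
}

/*
 * Where matching would begin in line for a given skip, or NULL if the
 * line is shorter than the skip (in which case it cannot match).
 */
static const char *skip_start(const char *line, size_t skip) {
    return (strlen(line) >= skip) ? line + skip : NULL;
}
```

    For "/^.{23} .../" this yields a skip of 23 and leaves " ..." as
    the part still to be matched, starting at the 24th character of the
    line.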

    The regexp config reader has an IP address macro facility.  Wherever
    you want to match an IP address you can use any one of the following
    macros:

	<IPV4_ADDR>
	<IPV6_ADDR>
	<IPV4_MAP6_ADDR>
	<IPV_ALL_ADDR>

    The macro will be expanded into a suitable regex before it's compiled.

    The IPv4 macro expansion is rigorous; the others are less so,
    particularly the IPv6 expressions.  Whether they do the right thing
    will depend upon the rigorousness of the rest of the regex you specify.
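
    Macro expansion of this kind is plain text substitution done before
    the regex is compiled.  A minimal sketch follows, with loud caveats:
    expand_macro() is a hypothetical helper, and DEMO_IPV4_RE is a loose
    dotted-quad placeholder, NOT the rigorous expression the project
    actually expands <IPV4_ADDR> to.

```c
#include <assert.h>
#include <string.h>

/* Placeholder expansion only: a loose dotted-quad, not the project's own. */
#define DEMO_IPV4_RE "[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}"

/*
 * Copy in to out, replacing every occurrence of macro with repl.
 * Returns 0 on success, -1 if out is too small for the result.
 */
static int expand_macro(const char *in, const char *macro, const char *repl,
                        char *out, size_t out_sz) {
    size_t mlen = strlen(macro);
    size_t rlen = strlen(repl);
    size_t used = 0;
    size_t tail;
    const char *hit;

    assert(mlen > 0);
    while ((hit = strstr(in, macro)) != NULL) {
        size_t pre = (size_t)(hit - in);    /* text before the macro */

        if (used + pre + rlen >= out_sz)
            return -1;
        memcpy(out + used, in, pre);
        memcpy(out + used + pre, repl, rlen);
        used += pre + rlen;
        in = hit + mlen;                    /* continue after the macro */
    }
    tail = strlen(in);
    if (used + tail >= out_sz)
        return -1;
    memcpy(out + used, in, tail + 1);       /* copies the trailing NUL too */
    return 0;
}
```

    The placeholder deliberately contains no parentheses, so expanding
    it does not add capture groups or change the numbering of the two
    captures the parser relies on.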

