regex - Pattern matching gnmap fields with SED

2013-11
  • Ovid

    I am testing the regex needed for creating field extraction with Splunk for nmap and think I might be close...

    Example full line:

    Host: 10.0.0.1 (host)   Ports: 21/open|filtered/tcp//ftp///, 22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 23/closed/tcp//telnet///, 80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/,  10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros
    

    I've used underscore "_" as the delimiter because it makes it a little easier to read.

    root@host:/# sed -n -e 's_\([0-9]\{1,5\}\/[^/]*\/[^/]*\/\/[^/]*\/\/[^/]*\/.\)_\n\1_pg' filename
    

    The same regex with the escape characters removed:

    root@host:/# sed -n -e 's_\([0-9]\{1,5\}/[^/]*/[^/]*//[^/]*//[^/]*/.\)_\n\1_pg' filename
    

    Output:

    ... ... ...
    Host: 10.0.0.1 (host)   Ports: 
    21/open|filtered/tcp//ftp///, 
    22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 
    23/closed/tcp//telnet///, 
    80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/, 
    10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros
    ... ... ...
    

    As you can see, the pattern matching appears to be working - although I am unable to:

    1 - match both possible endings of a port entry: a comma or whitespace (space or tab). The last line contains unwanted text (in this case, the OS and TCP timing info). A boolean "OR" for the two characters (comma and whitespace) does not seem to match.

    ...(\,|\s)
    

    and

    2 - remove the unnecessary data, i.e. print only the matching pattern. It is actually printing the whole line. If I remove the sed -n flag, the remaining file contents are also printed. I can't find a way to print only the matched regex.

    i.e., why is sed printing these lines when I explicitly tell it not to? =>

    Host: 10.0.0.1 (host) Ports:
    

    and

    OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros
    

    Being fairly new to sed and regex, any help or pointers is greatly appreciated!

  • Answers
  • bonsaiviking

    First, I would encourage you to look into the XML output of Nmap (available with the -oX flag), which is the officially supported machine-readable output format. Greppable (-oG or .gnmap) output is deprecated, and so does not include helpful information from newer features of Nmap such as traceroute and NSE scripts.

    To answer your questions directly,

    1. Matching either a comma or a space fails because, in sed's basic regular expressions, it is the alternation pipe character (|) that must be escaped, along with the grouping parentheses, not the comma. Also, you probably always want to match a whitespace character, but only sometimes the comma. This is how I would do that:

      ,\?\s
      

    I'm not using grouping, since there's no alternation ("or" pipe).

    2. sed is not printing "lines" that you don't want; it is printing the pattern space. The sed info page explains how sed works and is a great reference for writing sed scripts. You essentially have two buffers to work with, the pattern space and the hold space, and sed prints the entire contents of the pattern space when you use the p command.
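
    Both points can be checked on a shortened port list. This is a minimal sketch assuming GNU sed (\|, \? and \s are GNU extensions to basic regular expressions); the sample text and the variable names alt, whole and only are made up for illustration.

```shell
# Hypothetical shortened sample: two port entries followed by trailing text.
line='Ports: 22/open/tcp//ssh///, 80/open/tcp//http///  OS: Linux'

# In a basic regular expression the group and the pipe are escaped, the comma is not.
alt=$(printf '%s\n' "$line" | sed 's_///\(,\|\s\)_///;_g')

# p prints the whole pattern space, so the text around the match survives.
whole=$(printf '%s\n' "$line" | sed -n 's_80/[^ ]*_[&]_p')

# To print only the match, let the regex consume the entire line and keep the group.
only=$(printf '%s\n' "$line" | sed -n 's_.*\(80/[^ ]*\).*_\1_p')

printf '%s\n' "$alt" "$whole" "$only"
```

    The third command is the usual way to get "only the match" out of a single substitution: everything outside the captured group is matched too, and therefore replaced away.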

    As an example of how you might go about this, here's my take on a sed script to print just the port information from a .gnmap file:

    #!/usr/bin/sed -nf
    
    #First, strip the beginning (Host and Ports labels) off
    s/.*Ports: //
    
    #Now match a port entry, consuming the optional comma and whitespace
    #The comma and whitespace are replaced with a newline
    s_\([0-9]\{1,5\}/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/\),\?\s_\1\n_
    
    #If we made a successful substitution, jump to :matched, 
    t matched
    #otherwise skip to the next input line
    d
    
    :matched
    #Print the pattern space up to the first newline
    P
    #Then delete up to the first newline and start over with what's left
    D
    

    All together in one line, that would look something like this:

    sed -n -e 's/.*Ports: //;s_\([0-9]\{1,5\}/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/\),\?\s_\1\n_;t matched;d;:matched;P;D' file.gnmap
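
    Run against the sample line from the question, the one-liner should leave just the port entries. A sketch assuming GNU sed; the gnmap_line and ports variable names exist only for this illustration, and the input is piped in instead of being read from file.gnmap.

```shell
# Sample line from the question; everything after the last port entry is unwanted.
gnmap_line='Host: 10.0.0.1 (host)   Ports: 21/open|filtered/tcp//ftp///, 22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 23/closed/tcp//telnet///, 80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/,  10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros'

# Same script as the one-liner, reading standard input.
ports=$(printf '%s\n' "$gnmap_line" |
  sed -n -e 's/.*Ports: //;s_\([0-9]\{1,5\}/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/\),\?\s_\1\n_;t matched;d;:matched;P;D')

printf '%s\n' "$ports"
```

    Five entries come out and the OS and timing text is dropped. The last entry keeps one leading space, because the sample has two spaces after the preceding comma and the pattern consumes only one whitespace character.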
    

    Note also that you can't count on some of the fields in the port specification to always be empty. If version detection was done on an RPC service, for instance, the SunRPC info field will be populated.
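
    To see why the empty fields should not be hard-coded, compare the two patterns on an entry whose sixth field is filled in. The entry below is a made-up illustration, not captured Nmap output, and the variable names are hypothetical.

```shell
# Hypothetical entry with a non-empty sixth field (where SunRPC info would go).
entry='111/open/tcp//rpcbind/made-up-rpc-info/2/'

# Pattern from the question: expects the fourth and sixth fields to be empty.
strict='[0-9]\{1,5\}/[^/]*/[^/]*//[^/]*//[^/]*/'
# Pattern from the answer: allows any slash-free text in every field.
loose='[0-9]\{1,5\}/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/'

# grep -c prints the number of matching lines; "|| true" hides its non-zero
# exit status when nothing matches.
strict_hits=$(printf '%s\n' "$entry" | grep -c "$strict" || true)
loose_hits=$(printf '%s\n' "$entry" | grep -c "$loose" || true)

printf 'strict=%s loose=%s\n' "$strict_hits" "$loose_hits"
```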


  • Related Question

    osx - Sed: Deleting all content matching a pattern
  • Svish

    I have some plist files on Mac OS X that I would like to shrink. They have a lot of <dict> with <key> and values. One of these keys is a thumbnail which has a <data> value with base64 encoded binary (I think). I would like to remove this key and value.

    I was thinking this could maybe be done by sed, but I don't really know how to use it and it seems like sed only works on a line-by-line basis?

    Either way I was hoping someone could help me out. In the file I would like to delete everything that matches the following pattern or something close to that:

    <key>Thumbnail<\/key>[^<]*<\/data>
    

    In the file it looks like this:

                // Other keys and values
    
                <key>Thumbnail</key>
                <data>
                TU0AKgAAOEi25Pqx3/ip2fak0vOdzPCVxu2RweuPv+mLu+mIt+aGtuaEtOSB
    
                ...
    
                dCBBcHBsZSBDb21wdXRlciwgSW5jLiwgMjAwNQAAAAA=
                </data>
    
                // Other keys and values
    

    Anyone know how I could do this? Also, if there are any better tools that I can use in the terminal to do this, I would like to know about that as well :)


  • Related Answers
  • Chealion

    There are two utilities available on the command line specifically devoted to working with property list files: defaults (in /usr/bin) and PlistBuddy (in /usr/libexec).

    Still using sed:

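    sed can delete a multi-line block by putting an address range in front of the d command. (D only differs from d when the pattern space contains a newline, so it is not needed here.)

    e.g. sed -e '/<key>Thumbnail<\/key>/,/<\/data>/d' < /PATH/TO/FILE.txt prints the file with every Thumbnail key and its associated data removed.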
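
    The address-range delete can be tried on a small stand-in for the plist. A sketch only: the sample content and the sample/trimmed variable names are invented, and the result goes to standard output rather than back into a file.

```shell
# Invented stand-in for a plist fragment.
sample='<key>Name</key>
<string>demo</string>
<key>Thumbnail</key>
<data>
TU0AKgAA
</data>
<key>Size</key>
<integer>3</integer>'

# Delete every line from the Thumbnail key through the closing </data>.
# (With no newline in the pattern space, D would act exactly like d.)
trimmed=$(printf '%s\n' "$sample" | sed -e '/<key>Thumbnail<\/key>/,/<\/data>/d')

printf '%s\n' "$trimmed"
```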

    Using defaults:

    defaults delete /PATH/TO/PLIST "Thumbnail". Do not include the .plist extension as part of the path. Also, this will only work on root level items in a .plist, so if the Thumbnail key is inside another array or dict it won't work.

    Using PlistBuddy:

    /usr/libexec/PlistBuddy -c "Delete :Thumbnail" /PATH/TO/PLIST.plist. If the Thumbnail key is nested, you can append the path before it if you know it. eg. PlistBuddy -c "Delete :User:Thumbnail" if the Thumbnail entry was in a User dictionary.

  • Ignacio Vazquez-Abrams

    XMLStarlet is an awesome command line tool for manipulating XML. The main problems with it are 1) it's a very complex tool (since it does very complex jobs), and 2) you'd probably have to build it for OS X yourself.

  • user31894

    use awk, not sed

    $ cat file
    asdkf
    asdklf
                // Other keys and values
    
                <key>Thumbnail</key>
                <data>
                TU0AKgAAOEi25Pqx3/ip2fak0vOdzPCVxu2RweuPv+mLu+mIt+aGtuaEtOSB
    
                ...
    
                dCBBcHBsZSBDb21wdXRlciwgSW5jLiwgMjAwNQAAAAA=
                </data>
    
                // Other keys and values
    
    ksdf
    
    $ awk 'BEGIN{RS="</data>"} /<key>/{ gsub("<key>.*</key>|<data>.*","") }1' file
    asdkf
    asdklf
                // Other keys and values
    
    
    
    
    
                // Other keys and values
    
    ksdf
    

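    The statement uses </data> as the record separator; then, in every record that contains <key>, it replaces the <key>...</key> and <data>... text with nothing. Note that a multi-character RS is not supported by every awk (gawk handles it), and that the pattern removes everything from the first <key> to the last </key> in such a record, not only the Thumbnail key.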
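
    A line-oriented variant that targets only the Thumbnail key and relies on nothing beyond plain awk could look like the following. It is an alternative sketch, not the answerer's method; the skip flag and the sample text are assumptions made for this illustration.

```shell
# Invented sample with one Thumbnail entry between two other lines.
sample='// Other keys and values
<key>Thumbnail</key>
<data>
TU0AKgAA
</data>
<key>Size</key>'

# skip is set on the Thumbnail key line and cleared after the closing </data>;
# lines are printed only while skip is unset.
filtered=$(printf '%s\n' "$sample" | awk '
  /<key>Thumbnail<\/key>/ { skip = 1 }
  !skip                   { print }
  skip && /<\/data>/      { skip = 0 }
')

printf '%s\n' "$filtered"
```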

  • Chealion

    It might be possible to do this with sed, but it would be difficult. Perl could do this more easily. The guts of the perl script would be:

    undef $/; # Slurp mode: read the whole file in one go
    
    $contents = <>; # Read in contents of file (specified on command line)
    
    # =~ binds the substitution to $contents; /s lets . match newlines, /g removes every match
    $contents =~ s{<key>Thumbnail</key>.*?</data>}{}gs;
    
    print $contents;
    

    If you had the above in a perl script called change.plx and you had your data in a file called keyfile, then you could fix up that file by doing:

    $ perl change.plx keyfile > /tmp/$$ && cat /tmp/$$ > keyfile && rm /tmp/$$
    

    Of course, make sure you have a backup of any file that you do this to. It's possible to do all this work on multiple files with a single one line perl program on the command line like this:

    $ perl -p0777i -e 's{<key>Thumbnail</key>.*?</data>}{}gs;' file file file ...
    

    Marnix