regex - Pattern matching gnmap fields with SED

regex sed nmap

25
2013-11

Ovid

I am testing the regex needed for creating field extraction with Splunk for nmap and think I might be close...

Example full line:

Host: 10.0.0.1 (host)   Ports: 21/open|filtered/tcp//ftp///, 22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 23/closed/tcp//telnet///, 80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/,  10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros

I've used underscore "_" as the delimiter because it makes it a little easier to read.

root@host:/# sed -n -e 's_\([0-9]\{1,5\}\/[^/]*\/[^/]*\/\/[^/]*\/\/[^/]*\/.\)_\n\1_pg' filename

The same regex with the escape characters removed:

root@host:/# sed -n -e 's_\([0-9]\{1,5\}/[^/]*/[^/]*//[^/]*//[^/]*/.\)_\n\1_pg' filename

Output:

... ... ...
Host: 10.0.0.1 (host)   Ports: 
21/open|filtered/tcp//ftp///, 
22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, 
23/closed/tcp//telnet///, 
80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/, 
10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros
... ... ...

As you can see, the pattern matching appears to be working - although I am unable to:

1 - match the end of each entry, which is either a comma followed by whitespace or just whitespace (spaces or a tab). The last line contains unwanted text (in this case, the OS and TCP timing info). A boolean "OR" of the two characters (comma and whitespace) does not seem to match:

...(\,|\s)

and

2 - remove the unnecessary data, i.e. print only the matching pattern. It is actually printing the whole line. If I remove the sed -n flag, the remaining file contents are also printed. I can't find a way to print only the matched regex.

i.e. why, when I explicitly tell it not to, is sed printing these lines? =>

Host: 10.0.0.1 (host) Ports:

and

OS: Linux 2.6.32 - 3.2  Seq Index: 257  IP ID Seq: All zeros

Being fairly new to sed and regex, any help or pointers are greatly appreciated!

Answers

bonsaiviking

First, I would encourage you to look into the XML output of Nmap (available with the -oX flag), which is the officially supported machine-readable output format. Greppable (-oG or .gnmap) output is deprecated, and so does not include helpful information from newer features of Nmap such as traceroute and NSE scripts.
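To give a feel for how the XML route looks in practice, here is a minimal Python sketch that lists port, state and service name using only the standard library. The sample document is a hand-written, heavily trimmed stand-in for -oX output, and the element and attribute names it relies on (host, address/addr, port/portid, protocol, state, service/name) are assumptions to check against a real scan file.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed stand-in for `nmap -oX` output (an assumption,
# not real scan output); real files carry many more elements and attributes.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="10.0.0.1" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="5.9p1"/>
      </port>
      <port protocol="tcp" portid="23">
        <state state="closed"/>
        <service name="telnet"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def list_ports(xml_text):
    """Return (address, port, protocol, state, service) tuples."""
    rows = []
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        address = host.find("address").get("addr")
        for port in host.iter("port"):
            state = port.find("state")
            service = port.find("service")
            rows.append((
                address,
                int(port.get("portid")),
                port.get("protocol"),
                # <state> and <service> are treated as optional here
                state.get("state") if state is not None else None,
                service.get("name") if service is not None else None,
            ))
    return rows


for row in list_ports(SAMPLE_XML):
    print(row)
```

No regular expression is involved: each value is looked up by name, so an extra attribute or element does not shift the fields the way an extra slash would in greppable output.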

To answer your questions directly,

Matching either a comma or a space fails because, in sed's basic regular expressions, it is the alternation pipe (|) that must be escaped, along with the grouping parentheses, not the comma. Also, you probably always want to match a whitespace character, but only sometimes the comma. This is how I would do that:
```
,\?\s
```

I'm not using grouping, since there's no alternation ("or" pipe).
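For anyone who wants to try the two separators away from sed, here is a small sketch using Python's re module, where ?, | and parentheses are special without backslashes. The names SEPARATOR and EITHER and the sample tails are made up for illustration.

```python
import re

# Python spelling of the sed expression `,\?\s`: an optional comma, then
# exactly one whitespace character.
SEPARATOR = re.compile(r",?\s")

# Python spelling of the alternation the question attempted. In sed's basic
# syntax the group and the pipe need backslashes: \(,\|\s\)
EITHER = re.compile(r"(,|\s)")

# Text that can follow the final slash of a port entry (illustrative samples)
for tail in [", 22/open", "  OS: Linux", ",22/open"]:
    sep = SEPARATOR.match(tail)
    alt = EITHER.match(tail)
    print(repr(tail),
          "optional-comma:", repr(sep.group(0)) if sep else None,
          "alternation:", repr(alt.group(0)) if alt else None)
```

The alternation consumes only one character, so after a comma it leaves the following space behind, while the optional-comma form takes the comma and the space together.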

sed is not printing "lines" that you don't want; it's printing the pattern space. The sed info page explains how sed works and is a great reference for writing sed scripts. You essentially have two buffers to work with (the pattern space and the hold space), and the p command prints the entire contents of the pattern space. Your substitution only inserts newlines into the line, so p then prints all of it, including the text before and after the matches.

As an example of how you might go about this, here's my take on a sed script to print just the port information from a .gnmap file:

#!/usr/bin/sed -nf

#First, strip the beginning (Host and Ports labels) off
s/.*Ports: //

#Now match a port entry, consuming the optional comma and whitespace
#The comma and whitespace are replaced with a newline
s_\([0-9]\{1,5\}/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/\),\?\s_\1\n_

#If we made a successful substitution, jump to :matched, 
t matched
#otherwise skip to the next input line
d

:matched
#Print the pattern space up to the first newline
P
#Then delete up to the first newline and start over with what's left
D

All together in one line, that would look something like this:

sed -n -e 's/.*Ports: //;s_\([0-9]\{1,5\}/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/[^/]*/\),\?\s_\1\n_;t matched;d;:matched;P;D' file.gnmap
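The same idea (drop everything up to "Ports: ", then emit only the entries that match) can be sketched in Python for comparison. This is a rough equivalent under stated assumptions, not a translation: the function name port_entries is made up, and unlike the sed script it does not require a comma or whitespace after each entry.

```python
import re

# One port entry: 1-5 digits and a slash, then six more '/'-terminated
# fields, the same shape as the sed pattern above.
PORT_ENTRY = re.compile(r"[0-9]{1,5}/(?:[^/]*/){6}")

# The example line from the question.
SAMPLE_LINE = (
    "Host: 10.0.0.1 (host)   Ports: 21/open|filtered/tcp//ftp///, "
    "22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, "
    "23/closed/tcp//telnet///, "
    "80/open/tcp//http//Apache httpd 2.2.22 ((Ubuntu))/,  "
    "10000/closed/tcp//snet-sensor-mgmt///  OS: Linux 2.6.32 - 3.2  "
    "Seq Index: 257  IP ID Seq: All zeros"
)


def port_entries(line):
    """Return the port entries found in one line of greppable output."""
    marker = "Ports: "
    if marker not in line:
        return []
    # Counterpart of s/.*Ports: // (greedy, so cut at the last occurrence)
    rest = line[line.rindex(marker) + len(marker):]
    # findall returns only the matched text, never the surrounding line
    return PORT_ENTRY.findall(rest)


for entry in port_entries(SAMPLE_LINE):
    print(entry)
```

Lines without a "Ports: " label, such as the Status lines in a .gnmap file, produce an empty list, which corresponds to the d branch of the sed script.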

Note also that you can't count on any of the fields in the port specification always being empty. If version detection was done on an RPC service, for instance, the SunRPC info field will be populated.
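One way to avoid depending on which fields happen to be empty is to split an entry on the slash and name all seven positions. The sketch below assumes the field order port, state, protocol, owner, service, SunRPC info, version info; the PortEntry type and parse_entry function are hypothetical names for illustration.

```python
from typing import NamedTuple


class PortEntry(NamedTuple):
    """The seven slash-separated fields of one entry (assumed order)."""
    port: int
    state: str
    protocol: str
    owner: str
    service: str
    sunrpc_info: str
    version_info: str


def parse_entry(entry):
    """Split one port entry into named fields, keeping empty ones."""
    fields = entry.strip().rstrip(",").split("/")
    # Seven fields plus the empty string that follows the final slash
    if len(fields) != 8 or fields[7] != "":
        raise ValueError("unexpected port entry: %r" % entry)
    port, *rest = fields[:7]
    return PortEntry(int(port), *rest)


print(parse_entry("21/open|filtered/tcp//ftp///"))
print(parse_entry(
    "22/open/tcp//ssh//OpenSSH 5.9p1 Debian 5ubuntu1 (protocol 2.0)/, "))
```

Because every position is kept, a populated SunRPC info or owner field simply shows up under its own name instead of breaking the match.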

Related Answers

Chealion

There are two utilities available on the command line specifically devoted to working with preference list files: defaults (in /usr/bin) and PlistBuddy (in /usr/libexec).

Still using sed:

sed does allow a multi-line delete using the D command instead of d.

e.g. sed -e '/<key>Thumbnail<\/key>/, /<\/data>/D' < /PATH/TO/FILE.txt removes all instances of the key Thumbnail and its associated data.

Using defaults:

defaults delete /PATH/TO/PLIST "Thumbnail". Do not include the .plist extension as part of the path. Also, this will only work on root level items in a .plist, so if the Thumbnail key is inside another array or dict it won't work.

Using PlistBuddy:

/usr/libexec/PlistBuddy -c "Delete :Thumbnail" /PATH/TO/PLIST.plist. If the Thumbnail key is nested, you can prepend the path to it if you know it, e.g. PlistBuddy -c "Delete :User:Thumbnail" if the Thumbnail entry is in a User dictionary.

Ignacio Vazquez-Abrams

XMLStarlet is an awesome command line tool for manipulating XML. The main problems with it are 1) it's a very complex tool (since it does very complex jobs), and 2) you'd probably have to build it for OS X yourself.

user31894

use awk, not sed

$ cat file
asdkf
asdklf
            // Other keys and values

            <key>Thumbnail</key>
            <data>
            TU0AKgAAOEi25Pqx3/ip2fak0vOdzPCVxu2RweuPv+mLu+mIt+aGtuaEtOSB

            ...

            dCBBcHBsZSBDb21wdXRlciwgSW5jLiwgMjAwNQAAAAA=
            </data>

            // Other keys and values

ksdf

$ awk 'BEGIN{RS="</data>"} /<key>/{ gsub("<key>.*</key>|<data>.*","") }1' file
asdkf
asdklf
            // Other keys and values





            // Other keys and values

ksdf

The statement says: use </data> as the record separator, then, whenever <key> is found in a record, replace the <key> element and everything from <data> onward with nothing.

Chealion

It might be possible to do this with sed, but it would be difficult. Perl could do this more easily. The guts of the perl script would be:

undef $/; # This allows reading in of all lines in one swoop

$contents = <>; # Read in contents of file (specified on command line)

$contents =~ s{<key>Thumbnail</key>.*?</data>}{}s;

print $contents;

If you had the above in a perl script called change.plx and you had your data in a file called keyfile, then you could fix up that file by doing:

$ perl change.plx keyfile > /tmp/$$ && cat /tmp/$$ > keyfile && rm /tmp/$$

Of course, make sure you have a backup of any file that you do this to. It's possible to do all this work on multiple files with a single one line perl program on the command line like this:

$ perl -p0777i -e 's{<key>Thumbnail</key>.*?</data>}{}s;' file file file ...
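The two pieces that make the Perl substitution work, reading the whole file as one string and letting a non-greedy .*? cross newlines, can be sketched in Python as well. The sample text and the name strip_thumbnail are invented for illustration, and like the Perl s{}{}s without the g flag this removes only the first match.

```python
import re

# re.DOTALL plays the role of Perl's /s flag: '.' also matches newlines.
# '.*?' is non-greedy, so the match stops at the first </data> after the key.
THUMBNAIL_BLOCK = re.compile(r"<key>Thumbnail</key>.*?</data>", re.DOTALL)


def strip_thumbnail(text):
    """Remove the first Thumbnail key together with its <data> element."""
    return THUMBNAIL_BLOCK.sub("", text, count=1)


# Invented sample, shaped like the fragment in the awk answer above
SAMPLE = (
    "<key>Name</key>\n<string>demo</string>\n"
    "<key>Thumbnail</key>\n<data>\nTU0AKgAA\n</data>\n"
    "<key>Other</key>\n<data>\nAAAA\n</data>\n"
)

print(strip_thumbnail(SAMPLE))
```

With a greedy .* the match would run on to the last </data> in the file and take the unrelated Other key with it.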
