regex - Grep with ERE doesn't filter lines with -v option
2014-04
I'm trying to use the extended regex option in grep to filter out lines from files that have the following string format at the beginning of the line:
any-non-space-char: *
I'd assumed that the following command would do the trick; however, it just printed out all the lines from the 2 files picked up by the wildcard.
~/tmp > cat * | grep -v -E "^\S+:.{6}\*"
hi
test1 blah, blah, blah: * blah, blah, blah"
test: * blah, blah, blah: * blah, blah, blah
sd
hi
temp: * blah, blah, blah: * blah, blah, blah"
temp2: blah, blah, blah: * blah, blah, blah
sd
~/tmp >
BTW, I alias grep to 'grep --color=auto', so the command does correctly highlight the strings matching the regex, which are test: * on line 3 and temp: * on line 6 of the above output. Nonetheless, these matching lines get printed on the screen, which I didn't expect.
The contents of the two files:
~/tmp > ls -l
total 8
-rw-rw-r-- 1 pmn ccusers 116 Dec 11 09:22 1
-rw-rw-r-- 1 pmn ccusers 116 Dec 11 09:23 2
~/tmp >
~/tmp > cat 1
hi
test1 blah, blah, blah: * blah, blah, blah"
test: * blah, blah, blah: * blah, blah, blah
sd
~/tmp >
~/tmp > cat 2
hi
temp: * blah, blah, blah: * blah, blah, blah"
temp2: blah, blah, blah: * blah, blah, blah
sd
~/tmp >
BTW, the following is similar to what I expect:
~/tmp > cat * | grep -v -E ":.{6}*"
hi
sd
hi
sd
~/tmp >
It removed the lines
test1 blah, blah, blah: * blah, blah, blah"
test: * blah, blah, blah: * blah, blah, blah
temp: * blah, blah, blah: * blah, blah, blah"
temp2: blah, blah, blah: * blah, blah, blah
(It also removed lines 1 and 4 above, which is not what I want, so this grep command won't work for me.)
I know how to get this to work in Perl; however, for certain reasons I can use only grep, awk or sed.
How do I get this to work?
@PsychoData
Thanks for your response. I'm afraid the command did not do the trick. Your command returned the following:
~/tmp > cat * | grep -v -E "^[^\S]+:.{6}\*"
hi
sd
hi
sd
~/tmp >
which is the same as the output returned by grep -v -E ":.{6}*" in my question, which, however, is not what I wanted. I wanted a command that produces the following output:
hi
test1 blah, blah, blah: * blah, blah, blah"
sd
hi
temp2: blah, blah, blah: * blah, blah, blah
sd
IMHO, yours removed the following lines because ^[^\S]+: does a greedy match, matching as much of the line as possible, which, as you can see, is up to the right-most '*' in the following lines.
test1 blah, blah, blah: * blah, blah, blah"
test: * blah, blah, blah: * blah, blah, blah
temp: * blah, blah, blah: * blah, blah, blah"
temp2: blah, blah, blah: * blah, blah, blah
BTW, please note that there are exactly 6 spaces between each : and * pair. I think the formatting makes this hard to notice.
Try grep -v -E "^[^\S]+:.{6}\*"
Okay. What this does is tell grep that I want every line that does not contain the following pattern, with extended regular expressions enabled:
match the start of a line, then [anything EXCEPT whitespace] at least once, then a colon, then 6 characters, then an asterisk.
Anything that does not match that pattern will be shown.
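One likely explanation for what the asker saw, assuming GNU grep in -E mode: inside a POSIX bracket expression the backslash is taken literally, so [^\S] means "any character except \ and S" rather than "any non-space character", and it happily matches spaces. A small check with made-up one-line inputs:

```shell
# In ERE, [^\S] is a bracket expression that excludes only '\' and 'S',
# so it can run across the space in "test1 blah" and still reach the colon.
out1=$(printf 'test1 blah: *\n' | grep -c -E '^[^\S]+:' || true)

# A line starting with 'S' cannot match, because 'S' is one of the excluded characters.
# (grep -c prints 0 and exits 1 here; '|| true' keeps that from aborting a strict shell.)
out2=$(printf 'S: *\n' | grep -c -E '^[^\S]+:' || true)

echo "$out1 $out2"
```

This should print "1 0", which is consistent with the lines containing spaces before the colon being removed in the output quoted earlier.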
There is no way of doing a non-greedy match in extended regular expressions. You can, however, easily do it with PCREs:
$ grep -hvP "^[^\s]+?:\s+\*" *
hi
test1 blah, blah, blah: * blah, blah, blah"
sd
hi
temp2: blah, blah, blah: * blah, blah, blah
sd
You don't need to cat the files; grep can open them directly. The -h option turns off printing of the file name (necessary when not cat-ing), and -P turns on PCREs. You then search for one or more non-space characters at the beginning of the line (^[^\s]+?), followed by a :, one or more spaces (\s+) and finally a * (you need to escape the *, else it is treated as a quantifier).
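If -P is not available, here is a sketch of the same filter in plain ERE, assuming a grep that supports POSIX character classes (GNU grep does) and a system with mktemp -d (common, but not strictly POSIX). [^[:space:]]+ is the portable spelling of "one or more non-space characters", and because it cannot run past a space, greedy versus non-greedy matching makes no difference here. The script recreates the two sample files with the 6 spaces the question mentions:

```shell
# Assumption: grep with POSIX character classes (e.g. GNU grep) and mktemp -d.
dir=$(mktemp -d)

# Recreate the two sample files, with 6 spaces between ':' and '*'.
printf '%s\n' 'hi' \
  'test1 blah, blah, blah:      * blah, blah, blah"' \
  'test:      * blah, blah, blah:      * blah, blah, blah' \
  'sd' > "$dir/1"
printf '%s\n' 'hi' \
  'temp:      * blah, blah, blah:      * blah, blah, blah"' \
  'temp2: blah, blah, blah:      * blah, blah, blah' \
  'sd' > "$dir/2"

# -v: print lines that do NOT match; -h: no file-name prefixes.
# Pattern: non-space characters from the start of the line, a colon,
# one or more whitespace characters, then a literal '*'.
out=$(grep -h -v -E '^[^[:space:]]+:[[:space:]]+\*' "$dir/1" "$dir/2")
printf '%s\n' "$out"

rm -r "$dir"
```

It should print hi, the test1 line, sd, hi, the temp2 line and sd, i.e. the output the question asks for.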
I tried to find a practical use of ?, e.g. "egrep a? filename", but was not able to find one: it returns all lines.
So please help me understand the actual use of ? in egrep.
If I use 'a?', it returns every line, i.e. lines that have 0 a's, 1 a, 2 a's and so on, so I can't see what it is useful for.
Thanks
Say you wanted to match numeric assignment expressions like this in a script:
x=1234
where some numbers are negative and have a minus sign:
x=-5678
You could use this:
grep -E "x=-?[0-9]+" *
The question mark makes the minus optional.
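A quick way to see the optional minus in action, piping made-up sample lines through printf instead of reading files:

```shell
# Hypothetical input: a positive assignment, a negative one, and a non-numeric one.
out=$(printf 'x=1234\nx=-5678\nx=abc\n' | grep -E 'x=-?[0-9]+')
printf '%s\n' "$out"
```

This prints x=1234 and x=-5678; x=abc is skipped because the pattern still requires at least one digit after x= (with or without the minus).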
(I don't think plain grep supports ? or +, hence -E.)
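It matches zero or one occurrence of the item before it (usually a single character, but it can also follow a bracket expression or a parenthesized group).
Note: with grep -E or egrep a bare ? is the operator; in plain grep (basic regular expressions), GNU grep needs it escaped with a backslash first: \?.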
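Whether ? needs a backslash depends on which regex flavor grep is using. A sketch assuming GNU grep, which accepts \? as an extension in basic regular expressions (other greps may not), counting matches on two made-up lines:

```shell
# ERE (grep -E / egrep): a bare ? is the "zero or one" operator.
ere=$(printf 'color\ncolour\n' | grep -c -E 'colou?r' || true)
# ERE with a backslash: \? is a literal question mark, so nothing matches.
ere_esc=$(printf 'color\ncolour\n' | grep -c -E 'colou\?r' || true)
# BRE (plain grep), GNU extension: \? is the "zero or one" operator.
bre=$(printf 'color\ncolour\n' | grep -c 'colou\?r' || true)
# BRE without the backslash: ? is a literal question mark, so nothing matches.
bre_lit=$(printf 'color\ncolour\n' | grep -c 'colou?r' || true)

echo "$ere $ere_esc $bre $bre_lit"
```

With GNU grep this should print "2 0 2 0".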
In regex speak, a? means 0 or 1 'a's. Every line contains a match for "zero a's" (the empty string), so searching for a? returns everything. A place where it is useful is matching positive integers with an optional leading plus sign:
/^\+?\d+$/
which plays out as
^: beginning of line
\+: + sign
?: 0 or 1 of previous character
\d: digit
+: one or more of previous character
$: end of line
and would match both +123 and 456.
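The pattern above uses Perl-style \d, which grep -E does not understand (grep -P would accept it as written). A sketch of the equivalent ERE uses [0-9] instead, tested on made-up input:

```shell
# \+ is a literal plus sign in ERE; ? makes it optional; [0-9]+ replaces \d+.
out=$(printf '+123\n456\n-7\n12a\n' | grep -E '^\+?[0-9]+$')
printf '%s\n' "$out"
```

This prints +123 and 456; -7 fails at the leading minus and 12a fails at the $ anchor.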
Have a look at regular-expressions.info for more info on using regexes.