regex - Grep with ERE doesn't filter lines with -v option

06
2014-04
  • pmn

    I'm trying to use grep's extended regex option to filter out of some files the lines that begin with a string of the following format.

    any-non-space-char:      *

    I'd assumed that the following command would do the trick; however, it just printed all the lines from the 2 files that are picked up by the wildcard.

    
    ~/tmp > cat * | grep -v -E "^\S+:.{6}\*"
    hi
    test1      blah, blah, blah:      * blah, blah, blah"
    test:      * blah, blah, blah:      * blah, blah, blah
    sd
    hi
    temp:      * blah, blah, blah:      * blah, blah, blah"
    temp2:     blah, blah, blah:      * blah, blah, blah
    sd
    ~/tmp >
    

    BTW, I alias grep to 'grep --color=auto', so the command highlights the strings that match the regex, and it does so correctly: test: * on line 3 and temp: * on line 6 of the above output. Nonetheless, these matching lines are still printed on the screen, which I didn't expect.

    The contents of the two files:

    
    ~/tmp > ls -l
    total 8
    -rw-rw-r-- 1 pmn ccusers 116 Dec 11 09:22 1
    -rw-rw-r-- 1 pmn ccusers 116 Dec 11 09:23 2
    ~/tmp >
    
    ~/tmp > cat 1
    hi
    test1      blah, blah, blah:      * blah, blah, blah"
    test:      * blah, blah, blah:      * blah, blah, blah
    sd
    ~/tmp >
    
    ~/tmp > cat 2
    hi
    temp:      * blah, blah, blah:      * blah, blah, blah"
    temp2:     blah, blah, blah:      * blah, blah, blah
    sd
    ~/tmp >
    

    BTW, the following is similar to what I expect:

    
    ~/tmp > cat * | grep -v -E ":.{6}*"
    hi
    sd
    hi
    sd
    ~/tmp >
    

    Which removed the lines

    
    test1      blah, blah, blah:      * blah, blah, blah"
    test:      * blah, blah, blah:      * blah, blah, blah
    temp:      * blah, blah, blah:      * blah, blah, blah"
    temp2:     blah, blah, blah:      * blah, blah, blah
    

    (it also removed lines 1 and 4 above which is not what I want - hence this grep command won't work for me).

    I know how to get this to work in Perl; however, for certain reasons I can use only grep, awk or sed.

    How do I get this to work?


    @PsychoData

    Thanks for your response. I'm afraid the command did not do the trick. Your command returned the following

    ~/tmp > cat * | grep -v -E "^[^\S]+:.{6}\*"  
    hi  
    sd  
    hi  
    sd  
    ~/tmp >
    

    which is the same as the output returned by grep -v -E ":.{6}*" in my question, which, however, is not what I wanted. I wanted a command to bring the following output:

    hi  
    test1      blah, blah, blah:      * blah, blah, blah"  
    sd  
    hi  
    temp2:     blah, blah, blah:      * blah, blah, blah  
    sd
    

    IMHO, yours removed the following lines because ^[^\S]+: does a greedy-match, matching as much of the line as possible - which as you can see is until the right-most '*' in the following lines.

    test1      blah, blah, blah:      * blah, blah, blah"  
    test:      * blah, blah, blah:      * blah, blah, blah  
    temp:      * blah, blah, blah:      * blah, blah, blah"  
    temp2:     blah, blah, blah:      * blah, blah, blah
    

    BTW, please note that there are exactly 6 spaces between each : and * pair. I think the formatting makes this hard to notice.

  • Answers
  • PsychoData

    Try grep -v -E "^[^\S]+:.{6}\*"

    Okay. What this does is enable extended expressions and ask for every line that does not contain the following pattern:

    match the start of a line, then [anything EXCEPT whitespace] at least once, then a colon, then 6 characters, then an asterisk
    

    Anything that does not match that pattern will be shown.
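One thing worth checking about this pattern: in a POSIX bracket expression the backslash is an ordinary character, so with grep -E the class [^\S] most likely means "any character except a backslash or a capital S" rather than "any non-whitespace character". That would be consistent with the output the asker reports for this command. The following is a minimal sketch, assuming a POSIX-conforming grep; the sample strings are made up for illustration.

```shell
#!/bin/sh
# Count lines that consist only of characters accepted by [^\S].
# A space is accepted, so this class is not "non-whitespace".
with_space=$(printf 'a b\n' | grep -cE '^[^\S]+$' || true)

# A capital S is rejected, because S is one of the two excluded characters.
capital_s=$(printf 'S\n' | grep -cE '^[^\S]+$' || true)

# The POSIX class [^[:space:]] does reject the space.
posix_class=$(printf 'a b\n' | grep -cE '^[^[:space:]]+$' || true)

echo "with_space=$with_space capital_s=$capital_s posix_class=$posix_class"
```

If the first count is 1 and the third is 0, the bracket expression is matching spaces, which would explain why the test1 and temp2 lines were filtered out as well.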

  • terdon

    There is no way of doing a non-greedy match in extended regular expressions. You can, however, easily do it with PCREs:

    $ grep -hvP "^[^\s]+?:\s+\*" *
    hi
    test1      blah, blah, blah:      * blah, blah, blah"
    sd
    hi
    temp2:     blah, blah, blah:      * blah, blah, blah
    sd
    

    You don't need to cat the files; grep can open them directly. The -h option turns off printing of the file name (necessary when not using cat, since grep is given several files) and -P turns on PCREs. The pattern then looks for one or more non-space characters at the beginning of the line (^[^\s]+?), followed by a :, one or more spaces (\s+) and finally a * (you need to escape the *, otherwise it is treated as a quantifier).
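Since the question restricts the tools to grep, awk or sed, here is a sketch of the same filter without -P. It relies on the POSIX class [^[:space:]] instead of a non-greedy quantifier, and on the six spaces the asker describes. The scratch directory and variable names are assumptions made for this example, and sed -E is assumed to be available (GNU and BSD sed both accept it).

```shell
#!/bin/sh
# Recreate the two sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' \
  'hi' \
  'test1      blah, blah, blah:      * blah, blah, blah"' \
  'test:      * blah, blah, blah:      * blah, blah, blah' \
  'sd' > "$dir/1"
printf '%s\n' \
  'hi' \
  'temp:      * blah, blah, blah:      * blah, blah, blah"' \
  'temp2:     blah, blah, blah:      * blah, blah, blah' \
  'sd' > "$dir/2"

# Line start, one or more non-space characters, a colon,
# exactly six spaces, then a literal asterisk.
pattern='^[^[:space:]]+: {6}\*'

# -v inverts the match, -h suppresses the file name prefix.
grep_out=$(grep -hvE "$pattern" "$dir/1" "$dir/2")

# The same filter with sed, deleting the matching lines instead.
sed_out=$(cat "$dir/1" "$dir/2" | sed -E "/$pattern/d")

printf '%s\n' "$grep_out"
rm -rf "$dir"
```

Because the non-space run cannot cross the first space on a line, the colon has to directly follow the first word, so no non-greedy matching is needed for this particular format.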


  • Related Question

    regex - What is the use of ? in grep command.. And practical use
  • RBA

    I tried to find a practical use for ?, e.g. "egrep a? filename", but was not able to find any. It returns all the lines.

    Please help me understand the actual use of ? in egrep.

    If I use 'a?', it returns every line: those with 0 a's, 1 a, 2 a's and so on. So I am not able to see what it is useful for.

    Thanks


  • Related Answers
  • RichieHindle

    Say you wanted to match numeric assignment expressions like this in a script:

    x=1234
    

    where some numbers are negative and have a minus sign:

    x=-5678
    

    You could use this:

    grep -E "x=-?[0-9]+" *
    

    The question mark makes the minus optional.

    (I don't think plain grep supports ? or +, hence -E).
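To see the optional minus sign at work, the pattern can be run against a few sample lines. This is only a sketch: the sample data is hypothetical, and the -o option (print only the matched text) is a GNU/BSD grep feature rather than part of POSIX.

```shell
#!/bin/sh
# Hypothetical sample lines; only the first two are numeric assignments to x.
sample='x=1234
x=-5678
x=abc
y=42'

# -o prints only the matched text, which shows that the minus sign is
# picked up when present and simply skipped when absent.
matches=$(printf '%s\n' "$sample" | grep -oE 'x=-?[0-9]+')
printf '%s\n' "$matches"
```

The lines x=abc and y=42 are not selected, because the digits and the x= prefix are still required; only the minus sign is optional.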

  • chaos

    It's a single character search, matching one or zero of the character before it.

    Note: you need to escape the ? by using a \ first: \?.
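The escaping rule depends on which regex flavour grep is using. The sketch below assumes GNU grep, where \? in a basic regex is an extension that behaves like ? in an extended regex; other grep implementations may treat \? differently. The word list is made up for the example.

```shell
#!/bin/sh
words='color
colour
colr'

# Basic regex (plain grep): \? means "zero or one of the previous item".
# This is a GNU extension, not something POSIX guarantees.
bre_count=$(printf '%s\n' "$words" | grep -c 'colou\?r' || true)

# Extended regex (-E): the same quantifier is written without the backslash.
ere_count=$(printf '%s\n' "$words" | grep -cE 'colou?r' || true)

# In a basic regex an unescaped ? is just a literal question mark.
literal_count=$(printf '%s\n' "$words" | grep -c 'colou?r' || true)

echo "bre=$bre_count ere=$ere_count literal=$literal_count"
```

So the backslash is needed with plain grep, while with grep -E or egrep it has to be left out, otherwise \? would mean a literal question mark.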

  • John Oxley

    In regex speak, a? means 0 or 1 'a's. So if you search for a string that has 0 or 1 a's in it, you'll get everything. A place where it would be useful is matching positive integers:

    /^\+?\d+$/
    

    which plays out as

    ^: beginning of line
    \+: + sign
    ?: 0 or 1 of previous character
    \d: digit
    +: one or more of previous character
    $: end of line
    

    and would match both +123 and 456
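Both points can be checked from the shell. The sketch below uses made-up input lines, and writes the digit class as [0-9] because \d is a Perl-style shorthand that is not part of POSIX extended regular expressions.

```shell
#!/bin/sh
lines='+123
456
banana
-7
12a'

# a? can match the empty string, so every line is selected.
all_count=$(printf '%s\n' "$lines" | grep -cE 'a?' || true)

# Anchored, and with a required part after the optional one, the
# pattern becomes selective: an optional + followed only by digits.
ints=$(printf '%s\n' "$lines" | grep -E '^\+?[0-9]+$')

printf '%s\n' "$ints"
echo "a? selected $all_count of 5 lines"
```

The optional item only becomes useful when the rest of the pattern constrains the match, which is why a? on its own appears to do nothing.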

    Have a look at regular-expressions.info for more info on using regexes.