linux - can sed remove every line that contains 'foo'?

05
2013-09
  • isildur

    I read somewhere that:

    sed -n -e '/foo/d' myinputfile.txt
    

    would remove all occurrences of 'foo' from myinputfile.txt.

    However, this does not seem to work for me. I am a sed noob and cannot work this out. I am trying to run a bash script that calls sed on each line to remove a word from the input file, but nothing happens when I run it.

    Thanks :)

  • Answers
  • Ignacio Vazquez-Abrams

    You read incorrectly.

    However, while the sed expression itself is correct, the flags are not. sed normally prints each line to stdout after processing it, but -n suppresses this. The end result is that no lines are output at all. Remove the -n if you want the proper output. You can then redirect the output into another file and move that file into place.
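
    A minimal sketch of the steps described above, assuming a throwaway directory and made-up sample content (only the file name comes from the question). It also shows the substitute command, on the assumption that the goal is to remove the word itself rather than whole lines:

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'foo bar\nbaz\nqux foo\n' > "$workdir/myinputfile.txt"

# Without -n, sed prints every line it does not delete.
# Write the result to a temporary file, then move it into place.
sed -e '/foo/d' "$workdir/myinputfile.txt" > "$workdir/myinputfile.tmp" &&
  mv "$workdir/myinputfile.tmp" "$workdir/myinputfile.txt"

lines_deleted=$(cat "$workdir/myinputfile.txt")

# To drop only the word and keep the rest of each line, substitute instead.
word_removed=$(printf 'foo bar\nbaz\nqux foo\n' | sed -e 's/foo//g')

rm -rf "$workdir"
```

    GNU sed can also edit the file directly with -i, but that option is not portable across sed implementations, so the redirect-and-move approach is the safer default.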


  • Related Question

    command line - remove words containing non-alpha characters
  • dnkb

    Given a text file with space-separated strings and a tab-separated integer, I'd like to get rid of all words that have non-alpha characters, but keep the words consisting only of alpha characters, plus the tab and the integer afterwards.

    Attempts like the ones below didn't work. What I was trying to express is something like: "replace anything within word boundaries that starts and ends with 0 or more whatever and there is at least one :digits: or :punct: in between".

    sed 's/\b.*[:digits::punct:]+.*\b//g'
    sed 's/\b.*[^:alpha:]+.*\b//g'
    

    What am I missing? See sample input data below.

    Thank you!

    Input:

    asdf 754m   563  
    a2a 754mm   291  
    754n    463  
    754 ppp 1409  
    754pin  4652  
    pin pin 462  
    754pins 652  
    754 ppp </D>    1409  
    <D> 754pin  4652  
    pi$n pin    462  
    754/p ins   652  
    754 pp+p    1409  
    754 p=in    4652  
    

    Desired output:

    asdf    563  
        291  
        463  
    ppp 1409  
        4652  
    pin pin 462  
        652  
     ppp    1409  
        4652  
     pin    462  
     ins    652  
        1409  
        4652  
    

  • Related Answers
  • Dennis Williamson

    Basically this becomes a long list of things to delete:

    sed -r 's/(^[[:digit:]]+\b|\b[[:digit:]]+[[:punct:]]*[[:alpha:]]+\b|\b[[:alpha:]]+[[:digit:]]+[[:alpha:]]+\b|\b[[:alpha:]]+[[:punct:]]+[[:alpha:]]+\b|[[:punct:]]+.*[[:punct:]]+)//g' file
    

    Delete these:

    • digits at the beginning of the line
    • words that start with digits, may include punctuation, and end in alpha characters
    • words that consist of alpha chars, followed by digits, followed by alpha
    • words that consist of alpha, punct, alpha
    • sequences that begin and end with punct chars
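
    An alternative sketch, not the method used in this answer: instead of listing every pattern to delete, split each line on the tab and keep only the words in the first field that are purely alphabetic. The function name filter_alpha_words and the three sample lines are assumptions made for illustration, and [A-Za-z] is used in place of [[:alpha:]] so that older awk implementations also accept it.

```shell
# Keep only all-letter words before the tab; pass the integer through unchanged.
filter_alpha_words() {
  awk -F '\t' '{
    n = split($1, words, " ")        # words of the text field, split on whitespace
    out = ""
    for (i = 1; i <= n; i++) {
      if (words[i] ~ /^[A-Za-z]+$/) {
        out = (out == "" ? words[i] : out " " words[i])
      }
    }
    print out "\t" $2                # rebuild the line: kept words, tab, integer
  }'
}

result=$(printf 'asdf 754m\t563\npi$n pin\t462\n754 pp+p\t1409\n' | filter_alpha_words)
```

    Unlike the desired output in the question, this sketch does not preserve the original spacing between words: kept words are rejoined with single spaces.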
  • Daisetsu

    Wouldn't this best be solved with regular expressions?

    ([A-Z]+tab[0-9]+ ) or something like that

  • Area 51

    So if I understand correctly, you want to keep the words that are either all letters or all digits, and nothing else. If so, something like this should work:

    (^|\s+)([A-Za-z]+|\d+)((?=\s)|(?=$))
    

    (Use with the multiline flag)

    Run over your example input, it matches every token that is either all digits or all letters. Matching the valid tokens is easier than matching every word that is invalid, but it means you use the pattern to extract the data you want rather than to replace the invalid data.