linux - can sed remove every line that contains 'foo'?
2013-09
I read somewhere that:
sed -n -e '/foo/d' myinputfile.txt
would remove all occurrences of 'foo' from myinputfile.txt.
However this does not seem to work for me. I am a sed noob and cannot seem to work this out. I am basically trying to run a bash script that calls sed on each line to remove a word from the input file and nothing happens when I run it.
Thanks :)
You read incorrectly: /foo/d deletes every line that contains 'foo', not just the occurrences of the word. For deleting those lines the sed expression itself is correct, but the flags are not. sed normally prints each line to stdout as it processes it, but -n suppresses this, so the end result is that no lines are output. Remove the -n to get the proper output. You can then redirect this into another file and move that file into place.
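A minimal sketch of that workflow, assuming the file name from the question. The sample contents and the temporary file name are made up for illustration.

```shell
# Sample file with hypothetical contents, for illustration only.
printf 'keep me\nfoo here\nanother line\nmore foo\n' > myinputfile.txt

# No -n: sed prints every line that the d command did not delete.
# The temporary file name is an arbitrary choice.
sed -e '/foo/d' myinputfile.txt > myinputfile.txt.tmp &&
  mv myinputfile.txt.tmp myinputfile.txt

cat myinputfile.txt
# prints:
# keep me
# another line
```

If the goal is to drop only the word and keep the rest of the line, the substitution form sed 's/foo//g' does that instead. GNU sed also has an -i option that edits the file in place, which saves the temporary file and the mv.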
Given a text file where each line has space-separated strings followed by a tab-separated integer, I'd like to get rid of all words that contain non-alpha characters, but keep the words consisting only of alpha characters, as well as the tab and the integer after it.
My attempts, like the ones below, didn't get me anywhere. What I was trying to express is something like: "replace anything within word boundaries that starts and ends with 0 or more of anything and has at least one :digits: or :punct: in between".
sed 's/\b.*[:digits::punct:]+.*\b//g'
sed 's/\b.*[^:alpha:]+.*\b//g'
What am I missing? See sample input data below.
Thank you!
Input:
asdf 754m 563
a2a 754mm 291
754n 463
754 ppp 1409
754pin 4652
pin pin 462
754pins 652
754 ppp </D> 1409
<D> 754pin 4652
pi$n pin 462
754/p ins 652
754 pp+p 1409
754 p=in 4652
Desired output:
asdf 563
291
463
ppp 1409
4652
pin pin 462
652
ppp 1409
4652
pin 462
ins 652
1409
4652
Basically this becomes a long list of things to delete:
sed -r 's/(^[[:digit:]]+\b|\b[[:digit:]]+[[:punct:]]*[[:alpha:]]+\b|\b[[:alpha:]]+[[:digit:]]+[[:alpha:]]+\b|\b[[:alpha:]]+[[:punct:]]+[[:alpha:]]+\b|[[:punct:]]+.*[[:punct:]]+)//g' file
Delete these:
- digits at the beginning of the line
- words that start with digits, may include punctuation, and end in alpha characters
- words that consist of alpha chars, followed by digits, followed by alpha
- words that consist of alpha, punct, alpha
- sequences that begin and end with punct chars
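As for what the attempts in the question were missing: a POSIX class such as [:digit:] only works inside a bracket expression, so the combined form is [[:digit:][:punct:]], and the class is named digit, not digits. A bare + is also only a quantifier with extended regular expressions (-E, or -r in GNU sed), and .* is greedy, so it runs across several words instead of staying inside one. A small sketch on made-up input that only marks the offending characters rather than solving the whole problem:

```shell
# Two POSIX classes inside one bracket expression.
# -E switches to extended regular expressions, so + needs no backslash.
printf 'a2a pin\npi$n 754/p\n' | sed -E 's/[[:digit:][:punct:]]+/#/g'
# prints:
# a#a pin
# pi#n #p
```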
Wouldn't this best be solved with regular expressions?
([A-Z]+tab[0-9]+ ) or something like that
So if I understand correctly, you want to keep words that consist entirely of letters or entirely of digits, and nothing else. If so, something like this should work:
(^|\s+)([A-Za-z]+|\d+)((?=\s)|(?=$))
(Use with the multiline flag)
When run over your example input, it will find every word that is either all digits or all letters. This is easier than finding every word that doesn't match. Note, however, that you use it to extract the valid data rather than to replace the invalid data.
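The pattern above needs a PCRE-style engine, since sed supports neither \d nor lookaheads. The same keep-what-matches idea can be sketched in awk instead, which is a different tool from the one used in the answer. The function name keep_clean_words is hypothetical, the input lines are adapted from the question, and as a simplification the kept fields are joined with single spaces, so the original tab is not preserved.

```shell
# Hypothetical helper: keep only fields made up entirely of letters
# or entirely of digits, and drop every other field.
keep_clean_words() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[A-Za-z]+$/ || $i ~ /^[0-9]+$/) {
        # Join the kept fields with a single space.
        out = (out == "" ? $i : out " " $i)
      }
    }
    print out
  }'
}

printf 'asdf 754m\t563\npi$n pin\t462\n754 pp+p\t1409\n' | keep_clean_words
# prints:
# asdf 563
# pin 462
# 754 1409
```

Like the regex, this keeps an all-digit word such as the leading 754, which the desired output in the question drops. Restricting the digit test to the last field would be one way to handle that.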