linux - How do I remove similar instances of lines using Unix commands?
2013-09
I have a file that contains lines that look like the following:
14|geauxtigers|90
14|geauxtigers|null
I want to remove the lines that have null as the last term, like the second one above. Is there a way to do this with Unix commands?
I was going to read the file in Java, compare adjacent lines, and remove any line that has null as its third term when an adjacent line has the same first two terms. Is there a way to do this with Unix tools instead?
Edit: I don't want to blindly remove every line with null as the third term. I might have an entry such as 15|lsu|null, and I'd like to keep it because it is the only entry for that key. Only when another line with the same first two terms has a non-null third term do I want to drop the null line and keep the non-null value.
I would like to add one more answer, using awk:
awk -F'|' '{if($3!="null"){a=$1;b=$2;print}else{if(a!=$1 || b!=$2)print}}' yourFile
Test:
kent$ echo "14|geauxtigers|90
14|geauxtigers|null
foo|bar|blah
x|y|z
x|y|null"|awk -F'|' '{if($3!="null"){a=$1;b=$2;print}else{if(a!=$1 || b!=$2)print}}'
14|geauxtigers|90
foo|bar|blah
x|y|z
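A minimal sketch of the same awk logic spread over several lines with comments, run on a small hypothetical sample (sample.txt and kept.txt are made-up file names). Like the one-liner, it assumes each null line comes directly after its non-null partner, because it only remembers the key of the most recent non-null line:

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)"
# Hypothetical sample: one shadowed null line and one lone null line.
printf '%s\n' '14|geauxtigers|90' '14|geauxtigers|null' '15|lsu|null' > sample.txt

awk -F'|' '
$3 != "null" {          # non-null line: remember its key, print it, move on
    a = $1; b = $2
    print
    next
}
a != $1 || b != $2 {    # null line whose key differs from the last non-null key
    print
}
' sample.txt > kept.txt

cat kept.txt
```

The lone 15|lsu|null survives because its key differs from the last remembered key, while 14|geauxtigers|null is dropped.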
grep -v '|null$' yourfile.txt > filtered.txt
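A quick check on a hypothetical sample shows that this grep removes every line ending in |null, including a lone entry such as 15|lsu|null that the question's edit wants to keep. It suits the case where all null lines should go:

```shell
# Scratch directory for the hypothetical sample file.
cd "$(mktemp -d)"
printf '%s\n' '14|geauxtigers|90' '14|geauxtigers|null' '15|lsu|null' > yourfile.txt
# In a basic regular expression "|" is a literal character, so this drops
# every line whose last field is exactly "null".
grep -v '|null$' yourfile.txt > filtered.txt
cat filtered.txt
```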
Assuming the lines can appear in any order, scan the file twice: the first pass records the keys of the non-null lines, and the second pass skips any null line whose key was recorded. I assume the "key" is the first two columns:
awk -F '|' '
NR == FNR { if ($NF != "null") notnull[$1 FS $2]; next }
$NF == "null" && $1 FS $2 in notnull {next}
{print}
' filename filename > file.nonulls
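A usage sketch of the two-pass idea on a hypothetical out-of-order sample, where the null line comes before its partner and 15|lsu has no partner at all. Every line of the first pass ends in next, so nothing is printed until the second pass:

```shell
# Scratch directory for the hypothetical sample file.
cd "$(mktemp -d)"
printf '%s\n' '14|geauxtigers|null' '15|lsu|null' '14|geauxtigers|90' > filename

awk -F '|' '
NR == FNR {                                      # first pass over the file
    if ($NF != "null") notnull[$1 FS $2]         # record keys that have a real value
    next                                         # never print during the first pass
}
$NF == "null" && ($1 FS $2) in notnull { next }  # second pass: drop shadowed null lines
{ print }                                        # keep everything else
' filename filename > file.nonulls

cat file.nonulls
```

The original line order is preserved, and the lone 15|lsu|null is kept.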
If the null line always follows its partner:
awk -F '|' '
$NF != "null" {seen[$1 FS $2]}
$NF == "null" && $1 FS $2 in seen {next}
{print}
' filename > file.nonulls
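A usage sketch of the single-pass version on a hypothetical sample. The quotes around "null" matter: an unquoted null in awk is an uninitialized variable, which compares as the empty string, so the test would then match the null lines as well:

```shell
# Scratch directory for the hypothetical sample file.
cd "$(mktemp -d)"
printf '%s\n' '14|geauxtigers|90' '14|geauxtigers|null' '15|lsu|null' > filename

awk -F '|' '
$NF != "null" { seen[$1 FS $2] }                  # remember keys with a real value
$NF == "null" && ($1 FS $2) in seen { next }      # skip null lines after their partner
{ print }
' filename > file.nonulls

cat file.nonulls
```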
cat file | grep -v '|null$' > file2
This pipes the file named file (substitute your own file name after cat) through grep, which filters lines by pattern. The -v option inverts the match, so only the lines that do not match the pattern are printed. Finally, the result is redirected into file2.
grep -Ev 'null' yourfile > newfile.with.nulls.removed
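The pattern 'null' is unanchored, so it removes any line that contains null anywhere, not just in the last field. A small comparison on a hypothetical sample (the key nullable is made up for illustration):

```shell
# Scratch directory for the hypothetical sample file.
cd "$(mktemp -d)"
printf '%s\n' 'nullable|x|5' '14|geauxtigers|null' '15|lsu|7' > yourfile
# Unanchored: drops every line containing "null" anywhere.
grep -v 'null' yourfile > unanchored.txt
# Anchored: drops only lines whose last field is exactly "null".
grep -v '|null$' yourfile > anchored.txt
cat unanchored.txt
cat anchored.txt
```

Only the anchored version keeps nullable|x|5.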
Try using grep -v:
grep -v '|null$' myfile.txt > myfile-fixed.txt
Depending on your linux flavor, you can try something like:
egrep -v '[|]null$' < file.in > file.out
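egrep is a legacy spelling of grep -E; POSIX specifies grep -E, and recent GNU grep releases print a warning that egrep is obsolescent. A sketch of the same filter with grep -E on a hypothetical sample:

```shell
# Scratch directory for the hypothetical input file.
cd "$(mktemp -d)"
printf '%s\n' '14|geauxtigers|90' '14|geauxtigers|null' > file.in
# In an extended regex a bare "|" means alternation, but inside the
# bracket expression [|] it is a literal pipe character.
grep -Ev '[|]null$' < file.in > file.out
cat file.out
```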
I'm new to Linux and grep, and I'm trying to find my way around.
By using find -name *.java I am able to find the names of all of the Java files in a particular directory. Suppose I want to count the number of times foo occurs in these files. How would I do that?
I've been trying things like:
grep -r "foo" *.java
and getting responses like:
grep: *.java: No such file or directory
Any ideas?
find . -name '*.java' | xargs grep <your pattern here>
What about:
grep -irn --include="*\.java" somePhrase *
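The --include approach can also produce a count: -o prints each match on its own line, -w restricts matches to the whole word foo, and wc -l counts the lines. A sketch on a hypothetical directory tree (--include is supported by GNU and BSD grep but is not in POSIX):

```shell
# Hypothetical tree in a scratch directory.
cd "$(mktemp -d)"
mkdir -p src/pkg
printf 'foo bar foo\nfood\n' > src/A.java   # two whole-word matches; "food" does not count
printf 'foo\n' > src/pkg/B.java             # one match in a subdirectory
printf 'foo\n' > notes.txt                  # not a .java file, so it is skipped
# Recurse from ".", search only *.java files, one output line per occurrence.
count=$(grep -rwo --include='*.java' 'foo' . | wc -l)
echo "$count"
```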
find . -type f -name '*.java' -print0 | xargs -0 grep -wo 'foo' | wc -l
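The -print0 / xargs -0 pair (supported by GNU and BSD find and xargs) keeps file names containing spaces intact. Note the difference between counting occurrences with -o and counting matching lines with -c; a sketch on hypothetical files:

```shell
# Hypothetical files in a scratch directory; one name contains a space.
cd "$(mktemp -d)"
printf 'foo foo\nbar\n' > 'My File.java'
printf 'foo\n' > Other.java
# Occurrences: grep -o prints one line per match, even two matches on one line.
occurrences=$(find . -type f -name '*.java' -print0 | xargs -0 grep -wo 'foo' | wc -l)
# Matching lines: grep -c prints "file:count" per file; awk sums the last field.
lines=$(find . -type f -name '*.java' -print0 | xargs -0 grep -wc 'foo' | awk -F: '{ n += $NF } END { print n }')
echo "occurrences=$occurrences lines=$lines"
```

Here foo occurs three times but on only two lines, so the two counts differ.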