linux - How do I remove similar instances of lines using Unix commands?

  • egidra (2013-09-05)

    I have a file that contains lines that look like the following:

    14|geauxtigers|90
    14|geauxtigers|null
    

    I want to remove the lines with null as the last term. Is there a way to do this with Unix commands?

    My fallback plan was to read the file in Java, compare adjacent lines, and delete a line with null as its third term when an adjacent line shares the same first two terms. Can this be done with Unix tools instead?

    Edit: I don't want to blindly remove every line with null as the third term. I might have an entry like 15|lsu|null, and I'd like to keep it since it is the only entry for that key. It's only when another line with the same first two terms has a non-null third term that I want to drop the null line and keep the non-null one.
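
    For example, given:

    14|geauxtigers|90
    14|geauxtigers|null
    15|lsu|null

    the desired output would be:

    14|geauxtigers|90
    15|lsu|null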

  • Answers
  • Kent

    I would like to add one more answer, using awk:

    awk -F'|' '{if($3!="null"){a=$1;b=$2;print}else{if(a!=$1 || b!=$2)print}}' yourFile
    

    test

    kent$  echo "14|geauxtigers|90
    14|geauxtigers|null
    foo|bar|blah
    x|y|z
    x|y|null"|awk -F'|' '{if($3!="null"){a=$1;b=$2;print}else{if(a!=$1 || b!=$2)print}}'    
    14|geauxtigers|90
    foo|bar|blah
    x|y|z
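
    The same logic can also be written as a multi-line awk script with comments; this is just a readability sketch and should behave identically to the one-liner above:

    awk -F'|' '
      # non-null third field: remember its key and print the line
      $3 != "null" { a = $1; b = $2; print; next }
      # null third field: print only if the previous non-null key differs
      a != $1 || b != $2 { print }
    ' yourFile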
    
  • ninjalj
    grep -v '|null$' yourfile.txt > filtered.txt
    
  • glenn jackman

    Assuming the lines might appear in any order, scan the file twice, first collecting the non-null lines. I assume the "key" is the first two columns:

    awk -F '|' '
      NR == FNR { if ($NF != "null") notnull[$1 FS $2]; next }
      $NF == "null" && $1 FS $2 in notnull {next}
      {print}
    ' filename filename > file.nonulls
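
    For example, given sample data where a null line precedes its non-null partner and 15|lsu|null has no partner at all (sample.txt is just an assumed file name; the expected output follows from the logic above):

    $ cat sample.txt
    14|geauxtigers|null
    14|geauxtigers|90
    15|lsu|null
    $ awk -F '|' '
        NR == FNR { if ($NF != "null") notnull[$1 FS $2]; next }
        $NF == "null" && $1 FS $2 in notnull {next}
        {print}
      ' sample.txt sample.txt
    14|geauxtigers|90
    15|lsu|null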
    

    If the null line always follows its partner:

    awk -F '|' '
      $NF != "null" {seen[$1 FS $2]}
      $NF == "null" && $1 FS $2 in seen {next}
      {print}
    ' filename > file.nonulls
    
  • Mnementh
    cat file | grep -v '|null$' > file2
    

    This pipes the file named file (you can fill in another name after the cat) through the grep command, which filters lines by a pattern. The -v option inverts the match, so only lines that do not contain the pattern pass through. Finally, the result is written to file2.

  • Tim
    grep -Ev 'null' yourfile.txt > newfile.with.nulls.removed
    
  • maerics

    Try using grep -v:

    grep -v '|null$' myfile.txt > myfile-fixed.txt
    
  • contact us

    Depending on your Linux flavor, you can try something like:

    egrep -v '[|]null$' < file.in > file.out
    

  • Related Question

    linux - How can I recursively grep particular files in a directory
  • Eric Wilson

    I'm new to linux and grep, and trying to find my way around.

    By using find -name *.java I am able to find the names of all of the java files in a particular directory. Suppose I want to count the number of times foo occurs in these files, how would I do that?

    I've been trying things like:

    grep -r "foo" *.java
    

    and getting responses like:

    grep:  *.java:  No such file or directory
    

    Any ideas?
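
    The error occurs because the shell expands *.java in the current directory; when no .java files live there, the literal string *.java is passed to grep, which reports it as a missing file. Letting grep match file names itself avoids this (a sketch, assuming GNU grep, which supports --include; the answers below show further variations):

    grep -rn --include='*.java' foo .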


  • Related Answers
  • arathorn
    find . -name '*.java' | xargs grep <your pattern here>
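
    To produce the count the question asks for, the placeholder can be filled in and the matching lines totalled, e.g. (a sketch; this counts matching lines rather than individual occurrences, and assumes no file names contain spaces):

    find . -name '*.java' | xargs grep foo | wc -l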
    
  • user4126

    There is a tool specially designed for this type of need: ack.

    ack is a tool like grep, aimed at programmers with large trees of heterogeneous source code

    Also read the "Top 10 reasons to use ack instead of grep" on the ack page.
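
    A minimal invocation for this question might look like the following (a sketch; it assumes ack is installed and that its built-in java file type covers *.java):

    ack --type=java foo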

  • Manatok

    What about:

    grep -irn --include="*\.java" somePhrase *
    
  • wudeng

    find . -type f -name '*.java' -print0 | xargs -0 grep -wo 'foo' | wc -l