regex - How to use sed to replace a pattern at the end of each line in a file with fixed text?

2013-11
  • WilliamKF

    I want to compare two files of around 40 MB of comma separated values with lines like this:

    hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.343618767

    The last entry (4.343618767 in the example above) varies between the two files, but almost all the other fields match exactly.

    I need to diff the two files to locate the few places where the entries other than the last vary between the two files.

    I'm thinking the easiest way to do this is to use sed to normalize the last field in both files: find the number pattern after the seventh comma, replace it with a fixed string such as 9.999999999 on every line, and then run a simple diff.

    However, I'm not sure how to construct a sed command to locate the seventh comma and replace the remaining string to the end of the line with a fixed string. What would such a sed command look like? I imagine I would need to use a regular expression but am not sure how to start the pattern after the seventh comma.

  • Answers
  • choroba

    You do not have to count to the seventh comma. Just match the last field:

    sed 's/,[^,]*$/,9.9999999999/'
    

    Explanation:

    ,    match the comma
    [    beginning of a character group
     ^   negation, i.e. do not match the following characters
     ,   comma
    ]    end of a character group
    *    repeat the preceding thing zero or more times
    $    match the end of line
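
    Putting this together for the comparison described in the question, a minimal sketch. The function name normalize_last_field and the temporary file names are made up for illustration, and the two-line sample files stand in for the real 40 MB files:

```shell
# Replace everything after the last comma with a fixed placeholder.
normalize_last_field() {
    sed 's/,[^,]*$/,9.999999999/'
}

tmp=$(mktemp -d)   # scratch directory for the illustrative files

# Stand-in data: line 1 differs only in the last field,
# line 2 also differs in the seventh field (5 vs 6).
printf '%s\n' \
    'hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.343618767' \
    'hstar,default,"T9883Z ",0d59,c801,7332,5,20120915,2.0' > "$tmp/a.csv"
printf '%s\n' \
    'hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.343618999' \
    'hstar,default,"T9883Z ",0d59,c801,7332,6,20120915,3.1' > "$tmp/b.csv"

normalize_last_field < "$tmp/a.csv" > "$tmp/a.norm"
normalize_last_field < "$tmp/b.csv" > "$tmp/b.norm"

# diff exits with status 1 when the files differ, hence the "|| true".
diff_out=$(diff "$tmp/a.norm" "$tmp/b.norm" || true)
printf '%s\n' "$diff_out"

rm -rf "$tmp"
```

    With this sample data, diff should report only the second line, because that is the only line that differs outside the last field.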
    
  • m4573r

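    sed 's/,[0-9]\+\.[0-9]\+$//' yourfile (GNU sed; the dot is escaped so that it matches only a literal decimal point) removes the last field and will output lines like this: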

    hstar,default,"T9883Z ",0d59,c801,7332,5,20120914
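
    A quick check of this strip-the-last-field approach on the sample line, as a sketch. The pattern is spelled with [0-9][0-9]* and an escaped dot, which is an assumption made here to keep it portable beyond GNU sed and to match only a literal decimal point; the variable names are illustrative:

```shell
line='hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.343618767'

# Remove a trailing ",<digits>.<digits>" field; lines without one are left unchanged.
stripped=$(printf '%s\n' "$line" | sed 's/,[0-9][0-9]*\.[0-9][0-9]*$//')
printf '%s\n' "$stripped"
```

    Note that a last field without a decimal point would not be stripped by this pattern, whereas matching on "everything after the last comma" does not depend on the number format.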


  • Related Question

    text - using sed to remove lines in a file
  • eleven81

    I have a file that looks something like this:

    Heading - 
      - Completed foo
        - More information
        - Still more
      * Need to complete bar
      - Did baz (comment blah blah) ***
    
    Another - 
      * Need to complete foo
      - Completed bar (blah comment blah) ***
      - Done baz
    

    I need to run the text file through sed to remove all of the lines that start with spaces (number varies) and a hyphen, and another space.

    What is the regex or pattern I need to use with sed to make the output look like this below?

    Heading - 
      * Need to complete bar
    
    Another - 
      * Need to complete foo
    

  • Related Answers
  • eleven81

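    I used Phoshi's answer, with help from Dennis Williamson, to come up with sed -E '/^\s+-\s.*/d', which works as expected. (This is for GNU sed: -E enables extended regular expressions so that + means "one or more", and the quotes keep the shell from interpreting the backslashes.)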
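
    A sketch of the same delete command written with POSIX character classes instead of \s and +, on the assumption that it may need to run on a sed other than GNU sed. The sample lines are taken from the question and the variable name result is illustrative:

```shell
# Delete every line that starts with one or more whitespace characters,
# then a hyphen, then another whitespace character.
result=$(printf '%s\n' \
    'Heading - ' \
    '  - Completed foo' \
    '    - More information' \
    '  * Need to complete bar' \
    '  - Did baz (comment blah blah) ***' \
    '' \
    'Another - ' \
    '  * Need to complete foo' \
    '  - Done baz' |
    sed '/^[[:space:]][[:space:]]*-[[:space:]]/d')
printf '%s\n' "$result"
```

    The heading lines survive because the pattern is anchored with ^ and requires at least one leading whitespace character.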

  • Phoshi

    "s/\s*-\s.*//g" should do it, I think.

    That's \s to match a whitespace character, * to match zero or more of the preceding item (the whitespace), a literal hyphen character, then another \s, then .* to match everything after it.
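
    To see how a substitution like this behaves compared with deleting lines, a small sketch. It uses [[:space:]] as a portable stand-in for \s (a GNU extension), and the variable names are illustrative:

```shell
input=$(printf '%s\n' 'Heading - ' '  - Completed foo' '  * Need to complete bar')

# Unanchored substitution: the matched text is emptied but the line itself
# remains, and the trailing " - " of the heading matches as well.
substituted=$(printf '%s\n' "$input" | sed 's/[[:space:]]*-[[:space:]].*//g')

# Anchored delete: only lines starting with whitespace, a hyphen, and
# whitespace are dropped entirely.
deleted=$(printf '%s\n' "$input" | sed '/^[[:space:]][[:space:]]*-[[:space:]]/d')

printf 'substituted:\n%s\n\ndeleted:\n%s\n' "$substituted" "$deleted"
```

    The substitution leaves an empty line where the hyphen item was and trims the heading, while the anchored d command removes the item and leaves the heading untouched.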

  • Ryan Thompson

    You should use egrep or grep for this task: sed is a stream editor, whereas grep is more in line with the line-at-a-time philosophy.

    You need a regex that matches the start of line, whitespace, hyphen, space. Sounds like this would work:

    egrep  -v  '^[ ]+-[ ]' filename
    

    The -v option causes egrep to REMOVE the matching lines -- this is easier than building a regex that rejects the lines.

    Example:

     nobody$ egrep -v  '^[ ]+-[ ]' /tmp/foof
     Heading - 
       * Need to complete bar
    
     Another - 
       * Need to complete foo
     nobody$ cat /tmp/foof
     Heading - 
       - Completed foo
         - More information
         - Still more
       * Need to complete bar
       - Did baz (comment blah blah) ***
    
     Another - 
       * Need to complete foo
       - Completed bar (blah comment blah) ***
       - Done baz
     nobody$ _
    

    Handling tab characters just means adding them to the bracket expressions, but that's hard to show online.
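
    One way to avoid typing a literal tab is the POSIX class [[:blank:]], which matches both spaces and tabs. A sketch, using grep -E in place of egrep (the two are equivalent, and GNU grep now marks egrep as obsolescent); the sample input is made up:

```shell
# printf expands \t into a real tab, so the second input line is tab-indented.
filtered=$(printf '  - space indented\n\t- tab indented\n  * keep me\nHeading - \n' |
    grep -Ev '^[[:blank:]]+-[[:blank:]]')
printf '%s\n' "$filtered"
```

    Both the space-indented and the tab-indented hyphen lines are filtered out, while the bullet and heading lines are kept.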