regex - How do you get SED to replace only the first set of characters in a search expression

2013-11
  • Greg

    Basically I think this question is more one of regex than SED directly, but it’s SED I’m using, so that’s where I’m starting from.

    I’ve got a web.config configuration file for an ASP.NET 2.0 website that I’m automating the build process for. I’m using UNIXUtils to give me some extra power over and above that provided by Windows. Using SED can someone enlighten me as to how I'd accomplish the following;

    Change

    Data Source=RANDOM-SERVER;Initial Catalog=SomeDatabase;Persist Security Info=True;User ID=MyUser;Password=*************;
    

    To

    Data Source=PRODUCTION-SERVER;Initial Catalog=SomeDatabase;Persist Security Info=True;User ID=MyUser;Password=*************;
    

    I’ve tried the following

    sed s"|Data Source=.*;|Data Source=PRODUCTION-SERVER;|g"
    

    but what I receive is

    Data Source=PRODUCTION-SERVER;
    

    i.e. sed is helpfully being ultra greedy and eating all the characters up until the last semicolon, at least that’s what it seems to be doing.

    BTW: Yes I would like to retain the |g greedy operator on the end to ensure that the alteration is applied to all occurrences in the original file, though, in theory there shouldn’t be.

    If someone can show me how to accomplish this I'd be grateful, though I suspect there isn’t an easy way to do it, as I’ve not been able to find anything on the net that matches my requirements. Annoyingly everybody's focused on paths of one form or another.

    If it can’t be done in SED but can be done using something else in the standard UNIXUtils suite, then that’s an acceptable answer, I’m open to suggestions.

  • Answers
  • Thor

    Use a negated character class ([^;], which matches any character except a semicolon) instead of dot (.), for example:

    sed s"|Data Source=[^;]*|Data Source=PRODUCTION-SERVER|"
    

    Note that the global (g) flag at the end means "replace all occurrences on the current line"; it has nothing to do with greediness.
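
    To see the difference on the connection string from the question, here is a minimal sketch, assuming a POSIX-style shell with sed on the PATH (the variable names are only illustrative). It runs the original greedy pattern and the negated-class pattern side by side:

```shell
# Connection string copied from the question.
conn='Data Source=RANDOM-SERVER;Initial Catalog=SomeDatabase;Persist Security Info=True;User ID=MyUser;Password=*************;'

# Greedy: .* runs on to the last semicolon on the line.
greedy=$(printf '%s\n' "$conn" | sed 's|Data Source=.*;|Data Source=PRODUCTION-SERVER;|g')

# Negated class: [^;]* stops in front of the first semicolon.
fixed=$(printf '%s\n' "$conn" | sed 's|Data Source=[^;]*|Data Source=PRODUCTION-SERVER|g')

printf '%s\n' "$greedy" "$fixed"
# Data Source=PRODUCTION-SERVER;
# Data Source=PRODUCTION-SERVER;Initial Catalog=SomeDatabase;Persist Security Info=True;User ID=MyUser;Password=*************;
```

    Because [^;]* never consumes the semicolon, the replacement text does not need to put one back.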

  • terdon

    As far as I know, sed has no way of doing non-greedy matching; its regexes are always greedy. As already pointed out, the g flag means "replace all occurrences on the current line"; sed processes every line of the file by default, with or without it.
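
    A small sketch of what the g flag does and does not change, assuming a POSIX-style shell with sed on the PATH. The two-line sample and the variable names are made up for illustration:

```shell
# Made-up sample: two lines, each containing the key twice.
sample='Data Source=A;x=1;Data Source=B;
Data Source=C;y=2;Data Source=D;'

# Without g: only the first match on each line is replaced,
# but both lines are still processed.
without_g=$(printf '%s\n' "$sample" | sed 's|Data Source=[^;]*|Data Source=P|')

# With g: every match on each line is replaced.
with_g=$(printf '%s\n' "$sample" | sed 's|Data Source=[^;]*|Data Source=P|g')

printf '%s\n' "$without_g" "$with_g"
# Data Source=P;x=1;Data Source=B;
# Data Source=P;y=2;Data Source=D;
# Data Source=P;x=1;Data Source=P;
# Data Source=P;y=2;Data Source=P;
```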

    I would do this either by using a negated character class as suggested by @Thor, or by using a tool that supports non-greedy matches, like Perl:

    perl -pe 's/Data Source=.+?;/Data Source=PRODUCTION-SERVER;/' file
    

    In Perl regex, the construct +? means find at least one (+) but as few as possible (?) matching characters. The -p switch wraps the script given as the argument to -e in a loop that reads the input line by line and prints each line after the script has run. (The -n switch sets up the same loop without the printing; combining it with -p is redundant, because -p overrides it.) Finally, you can also edit the file in place with the -i option:

    perl -i -pe 's/Data Source=.+?;/Data Source=PRODUCTION-SERVER;/' file
    

  • Related Question

    command line - search and replace in sed with multiline pattern
  • Toc

    I have a file whose content is as follows:

    alfa
    [many lines here]
    TAG1
    TAG2
    
    bravo
    TAG3
    
    charlie
    TAG4
    [many lines here]
    

    where TAG1, TAG2, TAG3 and TAG4 are fixed strings and alfa, bravo and charlie change time to time, and I want to extract:

    alfa-bravo-charlie
    

    What is the precise sed command I have to use? I do not know how to work with multi-line pattern. :(

    P.S.: I'm using sed for windows.


  • Related Answers
  • dubiousjim

    This works with GNU sed. I don't think it relies on any GNU-specific extensions, but I'm not certain.

    echo "$yourdata" | sed -ne '1{h;d}; /^TAG1$/ {n; /^TAG2$/{n;N;N; /\nTAG3$/ {s///; H; n;N;N; /\nTAG4$/ {s///; H; g; s/\n\n/-/gp; q; } } } }'
    

    Result: alfa-bravo-charlie

    How does it work? First, the -n option tells sed not to print anything unless we specifically say [p]rint.

    The first block of the sed expression is "1{h;d}". This says when we read line 1, stash that line in the [h]old buffer then [d]elete it from the working buffer so that we'll read the next line and pass it through the sed expression from the start.

    When reading subsequent lines the "1{...}" block will be skipped.

    We don't match anything further until we hit the line TAG1. At this point we execute the long {...} block. This says first read the [n]ext line, overwriting the TAG1 line which was in the buffer. If the buffer now is TAG2, then we execute the next inner {...} block. That first reads the [n]ext line, overwriting what's already in the buffer. The next two commands are "N;N". This means read the next 2 lines but append them to the work buffer, rather than overwriting it.

    If the work buffer now matches /\nTAG3$/, then we execute the next inner {...} block. That says first "s///", in other words substitute the empty string for the most-recently matched expression. This deletes the "\nTAG3" from the end of the working buffer, leaving "\nbravo". Then we do [H], which appends that to the hold buffer. ([h] overwrites the hold buffer, [H] appends to it). So now the hold buffer contains the first line "alfa", then the next line "\nbravo". These are joined by a newline, so we've really got "alfa\n\nbravo". We'll take care of the two newlines later.
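
    The individual commands are easier to follow in isolation. The following is a minimal sketch on made-up input, assuming GNU sed and a POSIX-style shell (the variable names are hypothetical). It shows n versus N, h versus H followed by g, and the empty-pattern s/// reusing the last regex:

```shell
# n replaces the working buffer with the next line: only lines 2 and 4 get printed.
with_n=$(printf '1\n2\n3\n4\n' | sed -n 'n;p')

# N appends the next line after a newline: each pair becomes one buffer.
with_N=$(printf '1\n2\n3\n4\n' | sed -n 'N;s/\n/+/p')

# h overwrites the hold buffer, H appends to it (after a newline),
# and g copies the hold buffer back into the working buffer.
joined=$(printf 'a\nb\nc\n' | sed -n '1h;1!H;${g;s/\n/,/gp;}')

# An empty pattern in s/// reuses the regex that was matched last.
trimmed=$(printf 'bravo\nTAG3\n' | sed -n 'N;/\nTAG3$/{s///;p;}')

printf '%s\n' "$with_n" "$with_N" "$joined" "$trimmed"
# 2
# 4
# 1+2
# 3+4
# a,b,c
# bravo
```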

    We keep going until we've got "alfa\n\nbravo\n\ncharlie" in the hold buffer. Then we say [g]et the hold buffer (overwriting whatever is in the working buffer). We do a "s/\n\n/-/" on this to turn the double newlines into dashes. We add "g" and "p" flags to the end of the [s] command so that the substitution works globally (i.e. doesn't just do one substitution then stop) and that the result after substitution is [p]rinted.

    Then we [q]uit, we don't need to read the rest of the input stream.
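
    For readability, the same script can be laid out one command per line with comments. This is a sketch that assumes GNU sed and a POSIX-style shell; the sample input mirrors the file in the question, blank lines included, and the variable names are hypothetical. As in the one-liner, it relies on "alfa" being the first line of the input:

```shell
# Sample input mirroring the file in the question.
input='alfa
[many lines here]
TAG1
TAG2

bravo
TAG3

charlie
TAG4
[many lines here]'

result=$(printf '%s\n' "$input" | sed -n '
# Line 1: copy it to the hold buffer, then start the next cycle.
1 {
  h
  d
}
/^TAG1$/ {
  # Replace TAG1 with the next line, which should be TAG2.
  n
  /^TAG2$/ {
    # Load the blank line, then append the name and TAG3.
    n
    N
    N
    /\nTAG3$/ {
      # Strip the matched tag and append the rest to the hold buffer.
      s///
      H
      # Same again for the block that ends in TAG4.
      n
      N
      N
      /\nTAG4$/ {
        s///
        H
        # Fetch the collected names and turn double newlines into dashes.
        g
        s/\n\n/-/gp
        q
      }
    }
  }
}')

printf '%s\n' "$result"
# alfa-bravo-charlie
```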

  • larsks

    It's not clear from your example exactly what you're trying to do. It sounds like you're trying to discard the entire contents of the file other than a set of three markers, which you want to join together. You don't need sed for this, you can just type:

    echo alfa-bravo-charlie
    

    And you've accomplished your goal. If you simply want to remove the content between "alfa" and "charlie", you could use a sed script like this:

    /charlie/ a\
    alfa-bravo-charlie
    /alfa/,/charlie/ d
    

    If this isn't what you want to do, it might help if you were to clarify your example.
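
    To make concrete what the sed script above does, here is a sketch that runs it against the sample input from the question, assuming GNU sed and a POSIX-style shell (the variable names are hypothetical). Note that the appended text is a literal string, so this only fits when the three names are known in advance:

```shell
# Sample input mirroring the file in the question.
input='alfa
[many lines here]
TAG1
TAG2

bravo
TAG3

charlie
TAG4
[many lines here]'

# Queue the literal text when "charlie" is seen, then delete every line
# from the first "alfa" match through the "charlie" line.
result=$(printf '%s\n' "$input" | sed '/charlie/ a\
alfa-bravo-charlie
/alfa/,/charlie/ d')

printf '%s\n' "$result"
# alfa-bravo-charlie
# TAG4
# [many lines here]
```

    Lines after "charlie" are outside the deleted range, so they pass through unchanged.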