shell script - how to find number of substrings between delimiter?

07
2014-04
  • Dinesh Dabhi

    I have the following string:

    something(1)^^^something(2)^^^something(3)^^^ ... ^^^something(n)

    How can I find the number of something(s) in the string?

  • Answers
  • MariusMatutiae

    This command will do it for you:

     awk -F " " '{print NF}' filename
    

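    and you can substitute your favorite field separator for the space. A multi-character field separator is treated by awk as a regular expression, and ^ is a regex metacharacter, so if you insist on using ^^^ as a field separator, you should escape each caret: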

      awk -F '\\^\\^\\^' '{print NF}' filename
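
    As a minimal sketch, the same command can be fed a string held in a shell variable instead of a file. The sample value of string is hypothetical (the question does not show the real data), and count is only an illustrative variable name:

```shell
# Hypothetical sample; the question does not show the real data.
string='something(1)^^^something(2)^^^something(3)'

# Split on the literal delimiter ^^^ and print the number of fields.
count=$(printf '%s\n' "$string" | awk -F '\\^\\^\\^' '{print NF}')
echo "$count"
```

    Note that awk prints one count per input line, so a multi-line input yields one number per line.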
    
  • terdon

    I can give you a better answer if you show your real data, but assuming you are looking for the longest run of non-whitespace characters that ends with a non-empty (...) group, you could do this:

    $ string="foo(bar)blah blah bob harry(baz) more stuff this(one) not that one ()"
    $ echo "$string" | grep -Po '[^\s]+\([^\)]+?\)' | wc -l
    3
    

    Explanation

    The -P flag for grep enables Perl-compatible regular expressions (PCRE), and -o makes it print only the matching strings, each on a separate line.

    The regular expression matches:

    • [^\s]+ : one or more non-whitespace characters, as many as possible
    • \( : an opening parenthesis
    • [^\)]+?\) : one or more characters that are not ), up to and including the first closing )

    This would print:

    $ echo "$string" | grep -Po '[^\s]+\([^\)]+?\)'
    foo(bar)
    harry(baz)
    this(one)
    

    You then pass that through wc -l to count the number of lines.
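
    The -P flag is a GNU grep feature and may not be available in every grep. As a hedged alternative, the sketch below expresses the same pattern as an extended regular expression with POSIX character classes. It assumes a grep that supports -o, and matches is only an illustrative variable name:

```shell
string='foo(bar)blah blah bob harry(baz) more stuff this(one) not that one ()'

# [^[:space:]]+ : one or more non-whitespace characters
# \([^)]+\)     : a non-empty parenthesized group
matches=$(printf '%s\n' "$string" | grep -Eo '[^[:space:]]+\([^)]+\)' | wc -l)

# Some wc implementations pad the number with spaces; arithmetic strips them.
matches=$((matches))
echo "$matches"
```

    The lazy +? quantifier is dropped here because [^)] can never match a closing parenthesis, so the greedy form stops at the same place.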


  • Related Question

    bash - how to use grep, sed, and awk to parse tags?
  • mechko

    I want to write a script that finds an open/close tag pair in a text file and prepends a fixed string to each line between the pair. I figure I could use grep to find the tag line numbers and either awk or sed to place the tags. However, I'm not sure exactly how to do it.

    Can someone help?


  • Related Answers
  • mpez0

    In awk:

    # BEGIN (not START) is the awk pattern that runs before any input is read
    BEGIN                  {noprefix="true"}
    /<close tag regex>/    {noprefix="true"}
    noprefix=="false"      {print "prefix", $0}
    noprefix=="true"       {print $0}
    /<open tag regex>/     {noprefix="false"}
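
    A minimal runnable sketch of the same state-machine idea, called from a shell script. The <block> and </block> tags, the PREFIX string, and the sample input are all hypothetical placeholders, and it assumes each tag sits on its own line and tags are not nested:

```shell
# Hypothetical input; tags and contents are placeholders.
input='before
<block>
line one
line two
</block>
after'

result=$(printf '%s\n' "$input" | awk '
BEGIN       { inside = 0 }            # start outside any tag pair
/<\/block>/ { inside = 0 }            # close tag: stop prefixing (tag line itself is not prefixed)
inside      { print "PREFIX " $0 }    # between the tags: prepend the fixed string
!inside     { print }                 # everywhere else: print unchanged
/<block>/   { inside = 1 }            # open tag: start prefixing from the next line
')
printf '%s\n' "$result"
```

    The rule order matters: the close-tag test runs before printing and the open-tag test runs after, so neither tag line receives the prefix.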
    
  • 9tat

    It should be done with one of the traditional syntax-aware tools (yacc etc.). Doing it with grep and the like may be okay for specific cases, but regular expressions are simply not powerful enough to capture the subtleties of HTML.

  • user18151

    You should consider using yacc for it. It is NOT possible to do this with sed, awk or grep without a considerable amount of effort. As for learning yacc, it wouldn't take more time than it did for learning sed/awk/grep. And it will be really easy that way.