regex - sed: replace each occurrence of 4 spaces (at the beginning of a line) with 2 spaces

regex sed

25
2013-11

ifischer

Say I have this (puppet) file with an indentation of 4 spaces (I have a bunch of them that I have to process):

# init.pp
class hardwareid (
    $package_name      = $hardwareid::params::package_name,
    $package_category  = $hardwareid::params::package_category,
    $package_ensure    = $hardwareid::params::package_ensure

) inherits hardwareid::params {

    package { "${package_name}":
        name      => $package_name,
        category  => $package_category,
        ensure    => $package_ensure,
    }
}

I want to use sed to replace each occurrence of 4 spaces at the beginning of a line with 2 spaces, to get this result:

class hardwareid (
  $package_name      = $hardwareid::params::package_name,
  $package_category  = $hardwareid::params::package_category,
  $package_ensure    = $hardwareid::params::package_ensure

) inherits hardwareid::params {

  package { "${package_name}":
    name      => $package_name,
    category  => $package_category,
    ensure    => $package_ensure,
  }
}

All I have come up with so far is this:

sed -i -e 's/^\s\{4\}/  /g' init.pp

but this only replaces the first 4 spaces on each line, so deeper indentation levels are not handled correctly.

Is there a regex which can replace each group of 4 spaces at the beginning of a line with 2 spaces? Is that even possible with simple regexes and sed, or do I have to switch to awk/perl/python/ruby, since I have to count the occurrences to replace them with the same number?

EDIT

This question is stupid (although for a simple case, it all works). I should not format my code without a tool that understands the language of my code (which is Puppet). Even if I have the perfect regex (like the ones provided in the answers), I have the problem that if I accidentally apply the regex more than once, the indentation is broken again. The Puppet guys are working on that issue: http://projects.puppetlabs.com/issues/8031. Until it is solved, I have to be careful when converting files, or write a real formatter myself (which should not be that hard).
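One way to reduce the risk of applying the conversion twice is to check a file before touching it. The following is only a rough sketch of such a guard in Python, not something from the question: the function name `looks_four_space_indented` and the heuristic itself are assumptions made for illustration. It accepts a file only if every indented line starts with a multiple of 4 spaces.

```python
# Hypothetical guard; the name and the heuristic are illustrative assumptions.
def looks_four_space_indented(text: str) -> bool:
    """Return True if every indented, non-blank line starts with a multiple of 4 spaces."""
    saw_indent = False
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        indent = len(line) - len(line.lstrip(" "))
        if indent % 4 != 0:
            return False  # e.g. a 2- or 6-space indent: probably already converted
        if indent:
            saw_indent = True
    return saw_indent
```

The heuristic is not watertight: an already converted file whose indents all happen to be multiples of 4 still passes, and a 4-space file with alignment-style continuation lines is rejected.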

Answers

glenn jackman

perl -pe 's{^((?:    )+)}{substr($&, length($&)/2)}e'
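For readers who would rather use Python, the same idea as the one-liner above can be sketched roughly as follows. The helper name `halve_indent` and its `old`/`new` parameters are assumptions made for this illustration: it matches the run of complete 4-space groups at the start of each line and emits 2 spaces per group.

```python
import re

# Hypothetical helper; the name and the 4 -> 2 defaults are illustrative assumptions.
def halve_indent(text: str, old: int = 4, new: int = 2) -> str:
    """Replace every leading group of `old` spaces with `new` spaces, line by line."""
    # With re.MULTILINE, ^ anchors at the start of every line of the text.
    pattern = re.compile(r"^((?: {%d})+)" % old, flags=re.MULTILINE)

    def shrink(match):
        levels = len(match.group(1)) // old  # number of complete indent levels
        return " " * (levels * new)

    return pattern.sub(shrink, text)


sample = (
    "class hardwareid (\n"
    "    $package_name = $hardwareid::params::package_name\n"
    ") inherits hardwareid::params {\n"
    "\n"
    "    package { \"${package_name}\":\n"
    "        name => $package_name,\n"
    "    }\n"
    "}\n"
)
print(halve_indent(sample), end="")
```

Spaces left over after the last complete group (for example the final 2 of a 6-space indent) are kept as they are, and blank lines are untouched because the pattern matches spaces only.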

AaronM

You might be trying to use the wrong tool. It is possible that you could arrive at a sed solution but there is a tool built for this that will work much more quickly.

http://perltidy.sourceforge.net/

Take a look at perltidy. It is a perl module that is available in most distributions.

If it is installed correctly it can be invoked by typing 'perltidy'. To achieve what you are looking for the following should work.

perltidy -i=2 <filename>

This should create a new file containing the changes, with a .tdy extension. While perltidy was initially written for Perl, it can work on code in some other languages as well. It can easily be invoked from within popular editors and, used together with a shared .perltidyrc, can maintain or enforce a coding standard. There are extensive options available that let you control every aspect of its treatment of the code.

AaronM

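As the perltidy answer did not work on your code, use this instead. It halves the number of leading spaces on each line; the pattern matches spaces only, so empty lines keep their newline:

perl -pe 's{^( +)}{" " x (length($1)/2)}e'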

Pass the name of the file at the end of the line or pipe the file into STDIN. STDOUT will be your modified code.

Related Answers

Peter Boughton

This regex will match what you want:

\r\n(?! )

So to use that with sed:

sed 's/\r\n(?! )/ /g' filename.rtf

Except, it appears that sed doesn't support negative lookahead, and requires backslashed parens, so you can instead use:

sed 's/\r\n\([^ ]\)/ \1/g' filename.rtf
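In a regex engine that does support negative lookahead, the first pattern can be used directly. A minimal Python sketch, assuming the whole text is held in one string with CRLF line endings (the function name `join_wrapped_lines` is made up for this example):

```python
import re

# Sketch only: replaces a CRLF line break with a space unless the next line starts with a space.
def join_wrapped_lines(text: str) -> str:
    return re.sub(r"\r\n(?! )", " ", text)
```

Unlike the backslashed-group version, the lookahead also matches a CRLF at the very end of the text, so a trailing line break is replaced by a space as well.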

Spidey

The solution lay in a tool I hadn't given serious thought to: awk

awk 'BEGIN { FS="\\\\par" } ; /^    / {print "\\par" $1} /^[^ ]/ {print " " $1}'

This will go over the file, with \par as the field separator, and will print a \par before any line that starts with 4 spaces (which marks the beginning of a new paragraph), and remove it (or simply not print it) when the line starts with anything but a space.

Now what we have is a file with \par only where legal line breaks should be. The next step would be to remove all newlines altogether, to get rid of rogue line breaks:

tr -d '\r\n'

And then feed the result to sed to replace \par with \par\r\n, practically adding a newline where a \par is.

sed 's/\\par/\\par\r\n/g'

And done.

The only real issue I've found with this method is that it ruined the RTF header. No problem, I just copied over the header from the original file.

Another smaller issue was that chapter titles were being printed inline with previous paragraphs. This is because chapter titles do not start with a space yet should be considered a paragraph. In my case, chapters were marked like so:

CHAPTER THIRTY-TWO
Chapter's Name

So a quick sed took care of them:

sed 's/\s*\(CHAPTER [[:upper:]-]* \)\(.*\\par\)/\\par\r\n\\par\r\n\\par\r\n\1\\par\r\n\2\\par\r\n/'

I now have my book in proper format, which makes it readable on other devices (such as my iPod).
