linux - Format change data in file. Multiple sed commands?

linux regex sed

25
2013-11

Tommy

I'd like some advice on changing the formatting of some data in a file.

I have a large amount of data in a file. It is the output of a large Fortran program, formatted for a LaTeX table. I want to change the formatting but cannot rerun the Fortran program with different output formatting. I've been playing with sed but have not got very far.

A single line from a table is currently in the format

0.1 & 0.166685D+01 & 0.162768D+01 & 0.139468D+01 & 0.126904D+01 & 0.133247D+01 \\

I wish to change it to

0.1 & $0.16668510^{01}$ & $0.16276810^{01}$ & $0.13946810^{01}$ & $0.12690410^{01}$ & $0.133247 10^{01}$ \\

I currently have

#!/bin/bash

sed -i 's/D\+./ 10^{/g' $1

which gets me as far as

0.1 & 0.166685 10^{01 & 0.162768 10^{01 & 0.139468 10^{01 & 0.126904 10^{01 & 0.133247 10^{01 \\

but I still need to add the closing brace and wrap each number in a pair of `$' symbols.

In an ideal world I would also round the data to 3 d.p., but this is less important.

Are any sed / regex masters able to help, or can anyone suggest another tool that may be better suited to this problem?

Thanks

Tommy

I've just realised that in the example all of the powers in this line are +01. This is chance: they can be anything and vary throughout the file, positive and negative. Here is another example line with some NaNs thrown in.

0.3 & 0.634620D-02 & NaN & NaN & -0.312678D-02 & 0.192654D-03 \\

Answers

barbaz

And here is your sed expression:

sed -e 's/D+\([^ ]*\)/10^{\1}/g' -e 's/ \([^ &]*\) / $\1$ /g' -e 's/^/$/'

which reads as

s/D+\([^ ]*\)/10^{\1}/g

... substitute all occurrences of D+[word with no spaces] by 10^{[word with no spaces]}

s/ \([^ &]*\) / $\1$ /g

... substitute all occurrences of [space][word containing no spaces and &-chars][space] by [space]$[word containing no spaces and &-chars]$[space]

s/^/$/

... and prefix the line with a $-sign (which was not caught by the expression above)
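
The expression above matches only D+ exponents, so the second sample line from the question (D-02, NaN) would need more work. Since the question invites other tools, here is a minimal sketch of the same substitution idea in Python. The names D_NUMBER and to_latex are hypothetical, and the space between the mantissa and 10^{..} is an assumption copied from the asker's own partial output. It only touches cells in Fortran D-notation, so NaN cells and the first column are left alone.

```python
import re

# Matches Fortran D-notation such as 0.166685D+01 or -0.312678D-02.
# Groups: mantissa (with optional minus sign), exponent sign, exponent digits.
D_NUMBER = re.compile(r"(-?\d*\.\d+)D([+-])(\d+)")


def to_latex(line: str) -> str:
    """Wrap each D-notation number in $...$ and rewrite D+nn / D-nn as 10^{nn} / 10^{-nn}."""

    def repl(match):
        mantissa, sign, digits = match.groups()
        # Drop a leading plus sign, keep a minus sign (assumed desired format).
        exponent = digits if sign == "+" else "-" + digits
        return f"${mantissa} 10^{{{exponent}}}$"

    return D_NUMBER.sub(repl, line)


print(to_latex(r"0.3 & 0.634620D-02 & NaN & NaN & -0.312678D-02 & 0.192654D-03 \\"))
# prints: 0.3 & $0.634620 10^{-02}$ & NaN & NaN & $-0.312678 10^{-02}$ & $0.192654 10^{-03}$ \\
```

Matching the whole number in one pattern avoids the separate "prefix the line" step, because the $ signs are placed around each match rather than around space-delimited words.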

Joe Internet

Here's a Perl one-liner that does the substitutions in 2 steps...

perl -pe ' s/D\+01/10\^{01}\$/g; s/\& /\& \$/g; ' < in.txt > out.txt

Edit...

Okay, based on your changed requirements...

perl -pe ' 

s/ \& NaN//g;               # removes <space>&<space>NaN sequences
s/D\+/10\+/g;               # replace D+ with 10+
s/D\-/10\-/g;               # replace D- with 10- 
s/\+/\^{/g;                 # replace +  with ^{ 
s/(?<! )\-/\^{-/g;          # replace -  with ^{- if preceding char is not a <space> 
s/(?<!\.[0-9]) \&/\} \&/g;  # replace <space>& with }<space>& if preceding chars are not .<single-digit> seq. 
s/ \\/\} \\/g;              # replace <space>\ with }<space>\

' < in.txt > out.txt

At this point and beyond, you should probably write a proper script, but this works with the sample data that you provided. You can copy & paste it into bash as is.
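
As one possible shape for such a "proper script", here is a hedged Python sketch that also covers the asker's optional wish for 3 d.p. The function names (round_cell, convert) are made up for illustration, the choice of half-up rounding is an assumption, and NaN cells are kept rather than removed, which differs from the Perl above.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Mantissa (optional minus sign) followed by D and a signed exponent, e.g. 0.166685D+01.
D_NUMBER = re.compile(r"(-?\d*\.\d+)D([+-]\d+)")
THREE_DP = Decimal("0.001")


def round_cell(match: re.Match) -> str:
    """Round the mantissa to 3 decimal places and emit $mantissa 10^{exponent}$."""
    mantissa = Decimal(match.group(1)).quantize(THREE_DP, rounding=ROUND_HALF_UP)
    exponent = match.group(2).lstrip("+")  # "+01" -> "01", "-02" stays "-02"
    return f"${mantissa} 10^{{{exponent}}}$"


def convert(lines):
    """Apply the substitution to every table row; other text passes through unchanged."""
    return [D_NUMBER.sub(round_cell, line) for line in lines]


sample = [
    r"0.1 & 0.166685D+01 & 0.162768D+01 & 0.139468D+01 & 0.126904D+01 & 0.133247D+01 \\",
    r"0.3 & 0.634620D-02 & NaN & NaN & -0.312678D-02 & 0.192654D-03 \\",
]
for row in convert(sample):
    print(row)
```

Decimal is used instead of float so that the rounding of the printed digits does not depend on binary floating-point representation. To process a real file, the list could be replaced by the lines read from it.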

Ярослав Рахматуллин

Piece of cake. When will I have projects with ancient programs that produce latex? :(

$ cat tmp/latex-table 
echo '0.1 & 0.166685D+01 & 0.162768D+01 & 0.139468D+01 & 0.126904D-21 & 0.133247D+01 \\' |\
sed -e 's/&\([^0-9]*\)\([0-9\.]*\)D\([+\-]\)\([0-9]*\)/\&\1$\2 10^{0\3\4}$/g'
raptor: ~
$ bash tmp/latex-table 
0.1 & $0.166685 10^{0+01}$ & $0.162768 10^{0+01}$ & $0.139468 10^{0+01}$ & $0.126904 10^{0-21}$ & $0.133247 10^{0+01}$ \\

Related Answers

Peter Boughton

This regex will match what you want:

\r\n(?! )

So to use that with sed:

sed 's/\r\n(?! )/ /g' filename.rtf

Except, it appears that sed doesn't support negative lookahead, and requires backslashed parens, so you can instead use:

sed 's/\r\n\([^ ]\)/ \1/g' filename.rtf
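
For comparison, a tool whose regex engine does support negative lookahead can use the first pattern as written. The following is a small Python sketch, not part of the original answer, and join_wrapped_lines is a hypothetical name. It works on the whole text at once, which also sidesteps the fact that sed normally processes one line at a time.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Replace each CRLF with a space unless the next character is a space."""
    return re.sub(r"\r\n(?! )", " ", text)


print(repr(join_wrapped_lines("one\r\ntwo\r\n three")))
# prints: 'one two\r\n three'
```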

Spidey

The solution lay in a tool I hadn't given serious thought to: awk

awk 'BEGIN { FS="\\\\par" } ; /^    / {print "\\par" $1} /^[^ ]/ {print " " $1}'

This will go over the file, with \par as the field separator, and will print a \par before any line that starts with 4 spaces (which marks the beginning of a new paragraph), and remove (or simply won't print) it when it starts with anything but a space.

Now what we have is a file with \par only where legal line breaks should be. The next step would be to remove all newlines altogether, to get rid of rogue line breaks:

tr -d '\r\n'

And then feed the result to sed to replace \par with \par\r\n, practically adding a newline where a \par is.

sed 's/\\par/\\par\r\n/g'

And done.
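
The three steps above (awk, tr, sed) can be approximated in a single pass. This is only a sketch in Python under the same assumptions as the pipeline: paragraphs start with four spaces and \par is the paragraph marker. The function name reflow is hypothetical, and blank lines are simply dropped here, which may differ slightly from what awk does with lines that contain only a carriage return.

```python
PAR = "\\par"  # the literal RTF paragraph marker \par


def reflow(text: str) -> str:
    """Keep \\par only at paragraph starts, join all lines, then break after each \\par."""
    pieces = []
    for line in text.splitlines():
        body = line.split(PAR)[0]  # like awk's $1 with \par as field separator
        if line.startswith("    "):
            pieces.append(PAR + body)  # new paragraph: put \par in front
        elif line and not line.startswith(" "):
            pieces.append(" " + body)  # continuation line: join with a space
        # any other line (blank, or 1 to 3 leading spaces) is not printed
    joined = "".join(pieces)  # like tr -d '\r\n'
    return joined.replace(PAR, PAR + "\r\n")  # like the final sed


demo = "    First line\\par\r\ncontinues here\\par\r\n    Second para\\par\r\n"
print(repr(reflow(demo)))
```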

The only real issue I've found with this method is that it ruined the RTF header. No problem, I just copied over the header from the original file.

Another smaller issue was that chapter titles were being printed inline with previous paragraphs. This is because chapter titles do not start with a space yet should be considered a paragraph. In my case, chapters were marked like so:

CHAPTER THIRTY-TWO
Chapter's Name

So a quick sed took care of them:

sed 's/\s*\(CHAPTER [[:upper:]-]* \)\(.*\\par\)/\\par\r\n\\par\r\n\\par\r\n\1\\par\r\n\2\\par\r\n/'

I now have my book in proper format, which makes it readable on other devices (such as my iPod).
