linux - Edit first line of large text file
2013-09
I have a huge text file, far too big for the whole thing to be paged into memory. All I need to do with this text file is edit the first line (it's a CSV file and I need to alter the column titles).
Is there a simple way I can do this in bash?
You can use less to see what you want to edit and use sed to make the changes. This way you edit without loading the entire file.
Another way is to split the file, edit the first piece, and join the pieces again:
split -b 10000k <file>
and to join:
cat xa* > <file>
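As a rough sketch of the split-and-join idea on a small demo file (the file names, the piece size and the new header below are made up for illustration): this variant splits by lines with -l rather than by bytes with -b, so every piece ends on a line boundary and the first piece can be edited safely. It still rewrites the whole file and temporarily needs about twice the disk space.

```shell
# Hypothetical demo data; in practice "$csv" would be the huge file.
demo_dir=$(mktemp -d)
csv="$demo_dir/data.csv"
printf 'id,name,amount\n1,alice,10\n2,bob,22\n3,carol,5\n' > "$csv"

# Split into pieces of 2 lines each: chunk_aa, chunk_ab, ...
split -l 2 "$csv" "$demo_dir/chunk_"

# Only the first piece contains the header, so only that piece is edited.
sed '1s/.*/ident,full_name,total/' "$demo_dir/chunk_aa" > "$demo_dir/chunk_aa.tmp"
mv "$demo_dir/chunk_aa.tmp" "$demo_dir/chunk_aa"

# Join the pieces again (the shell expands the glob in sorted order).
cat "$demo_dir"/chunk_* > "$csv"
rm "$demo_dir"/chunk_*
```

For a real file the piece size would be much larger, for example -l 1000000.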
If your modification changes the length of the line, the whole file needs to be rewritten; see, for example, this discussion on SO. You should probably consider saving the data to a database.
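Conversely, when the new header has exactly the same length as the old one, the first bytes of the file can be overwritten in place. A minimal sketch using dd with conv=notrunc, assuming a small demo file, made-up header names and single-byte (ASCII) characters, since ${#var} counts characters rather than bytes:

```shell
# Hypothetical demo file; in practice this would be the huge CSV.
demo_dir=$(mktemp -d)
csv="$demo_dir/data.csv"
printf 'id,name,amount\n1,alice,10\n2,bob,22\n' > "$csv"

new_header='ID,NAME,AMOUNT'
old_header=$(head -n 1 "$csv")

# Overwriting in place is only safe when both headers have the same length.
if [ "${#new_header}" -eq "${#old_header}" ]; then
    # conv=notrunc keeps the rest of the file; only the leading bytes change.
    printf '%s' "$new_header" | dd of="$csv" conv=notrunc 2>/dev/null
else
    echo "header length differs; the file has to be rewritten instead" >&2
fi
```

A shorter header could be padded with trailing spaces to reach the old length, if whatever reads the CSV tolerates that. When the new header is longer, this shortcut does not apply.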
Keeping that in mind, you can stream edit the file with sed. To replace the first line, do something like this (GNU sed):
< oldfile sed '1c\new_heading' > newfile
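If GNU sed is not available, the same streaming rewrite can be sketched with printf and tail, which only need to hold one buffer of the file in memory at a time. The demo file and its contents below are placeholders; oldfile, newfile and new_heading follow the names used above.

```shell
# Hypothetical demo data standing in for the real file.
demo_dir=$(mktemp -d)
oldfile="$demo_dir/oldfile"
newfile="$demo_dir/newfile"
printf 'old_heading\nrow 1\nrow 2\n' > "$oldfile"

new_heading='new_heading'

# Write the new first line, then stream everything from line 2 onwards.
{
    printf '%s\n' "$new_heading"
    tail -n +2 "$oldfile"
} > "$newfile"
```

Once newfile has been checked, it can replace the original with mv newfile oldfile.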
I have huge text files with two fields: the first is a string, the second is an integer. The files are sorted by the first field. What I'd like to get in the output is one line per unique string, with the sum of the numbers for the identical strings. Some strings appear only once while others appear multiple times. For example, given the sample data below, for the string glehnia I'd like to get 10+22=32 in the result.
Any suggestions how to do this either with gnuwin32 command line tools or in linux shell?
Thanks!
glehnia 10
glehnia 22
glehniae 343
glehnii 923
glei 1171
glei 2283
glei 3466
gleib 914
gleiber 652
gleiberg 495
gleiberg 709
In AWK, you could do something like this:
awk '($1 == last) || (last == "") {sum += $2} ($1 != last) && (last != "") {print last " " sum; sum = $2} {last = $1} END {print last " " sum}' huge_text_file.txt
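The same idea can be sketched in a slightly more spread-out form and run against the sample data from the question. Because the input is sorted, only the current key and its running sum are kept, so memory use stays constant regardless of file size. The temporary file and the variable names here are illustrative.

```shell
# Recreate the sample data from the question in a temporary file.
demo_dir=$(mktemp -d)
input="$demo_dir/huge_text_file.txt"
printf '%s\n' 'glehnia 10' 'glehnia 22' 'glehniae 343' 'glehnii 923' \
    'glei 1171' 'glei 2283' 'glei 3466' 'gleib 914' 'gleiber 652' \
    'gleiberg 495' 'gleiberg 709' > "$input"

result=$(awk '
    # Key changed: emit the finished group and reset the running sum.
    NR > 1 && $1 != last { print last, sum; sum = 0 }
    # Every line adds to the current group.
    { sum += $2; last = $1 }
    # Flush the final group (skipped for empty input).
    END { if (NR > 0) print last, sum }
' "$input")
printf '%s\n' "$result"
```

For the sample this prints glehnia 32, glehniae 343, glehnii 923, glei 6920, gleib 914, gleiber 652 and gleiberg 1204, one per line.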
You could use a few lines of Lua to achieve this. Lua is available on a wide range of platforms including Windows and Linux.
-- Quick and dirty - no error checking, unsorted output
io.input('huge_text_file.txt')
results = {}
for line in io.lines() do
  for text, number in string.gmatch(line, '(%w+)%s+(%d+)') do
    results[text] = (results[text] or 0) + number
  end
end
for text, number in pairs(results) do
  print(text, number)
end
You can sort the output using any sort utility or a few more lines of Lua.
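As an illustration of the sort step, assuming the script's output has been saved to a file: Lua's print separates its arguments with a tab, so the result is a two-column, tab-separated list. The file name and the three sample rows below are made up.

```shell
# Hypothetical unsorted, tab-separated output from the script above.
demo_dir=$(mktemp -d)
unsorted="$demo_dir/sums.txt"
printf 'glei\t6920\nglehnia\t32\ngleib\t914\n' > "$unsorted"

# Sort by the string (byte-wise order, independent of the locale).
by_name=$(LC_ALL=C sort "$unsorted")

# Sort numerically by the sum in the second, tab-separated column.
tab=$(printf '\t')
by_sum=$(sort -t "$tab" -k2,2n "$unsorted")

printf '%s\n' "$by_name"
printf '%s\n' "$by_sum"
```

In practice the script's output could be piped straight into sort instead of going through a file.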