linux - Edit first line of large text file

06
2013-09
  • lynks

    I have a huge text file, far too big to be read into memory in one go. All I need to do with it is edit the first line (it's a CSV file and I need to change the column titles).

    Is there a simple way I can do this in bash?

  • Answers
  • laurent

    You can use less to look at the line you want to edit and sed to make the change. Both read the file as a stream, so the whole file is never loaded into memory at once.
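
    As a rough sketch of that idea (the file names and column titles below are made up for illustration), head shows the current heading without reading the rest of the file, and sed rewrites only line 1 while streaming everything else through unchanged:

```shell
# Small stand-in for the huge CSV; names and contents are hypothetical.
workdir=$(mktemp -d)
printf 'id,name,value\n1,foo,10\n2,bar,20\n' > "$workdir/old.csv"

# Peek at the current heading; head stops reading after the first line.
head -n 1 "$workdir/old.csv"

# Replace the whole of line 1 and pass every other line through untouched.
sed '1s/.*/ident,label,amount/' "$workdir/old.csv" > "$workdir/new.csv"

new_first=$(head -n 1 "$workdir/new.csv")
new_rest=$(tail -n +2 "$workdir/new.csv")
printf '%s\n' "$new_first"
```

    The result goes to a new file, so you need enough free disk space for a second copy.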

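    Another way is to split the file, edit the first chunk, and join the chunks again. Give split an explicit prefix (part_ here): with the default names a huge file produces chunks beyond xaz, which a cat xa* glob would silently skip.

    split -b 10000k <file> part_
    
    and to join:
    
    cat part_* > <file>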
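
    A small self-contained sketch of the split/edit/join round trip, assuming the whole first line fits inside the first chunk. The tiny 16-byte chunk size, the part_ prefix and the sample data are only there to make the demo runnable. tail -n +2 is used for the edit so that the chunk's last, possibly incomplete, line is copied byte for byte:

```shell
# Work in a scratch directory with a tiny stand-in for the huge file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id,name,value\n1,foo,10\n2,bar,20\n3,baz,30\n' > data.csv

# Split into 16-byte chunks named part_aa, part_ab, ...
split -b 16 data.csv part_

# Rewrite the first chunk: new heading, then everything after the old line 1.
{ printf '%s\n' 'ident,label,amount'; tail -n +2 part_aa; } > part_aa.new
mv part_aa.new part_aa

# Join the chunks back together; the glob expands in alphabetical order.
cat part_* > joined.csv
joined=$(cat joined.csv)
printf '%s\n' "$joined"
```

    Keep in mind that this needs roughly as much extra disk space as the file itself, since the chunks are a full second copy.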
    
  • Thor

    If your modification changes the length of the line, the whole file needs to be rewritten; see, for example, this discussion on SO. You should probably consider saving the data to a database.

    Keeping that in mind, you can stream edit the file with sed. To replace the first line, do something like this (GNU sed):

    < oldfile sed '1c\new_heading' > newfile
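
    The flip side of the point about line length: if the new heading happens to be exactly as long as the old one, its bytes can be overwritten in place without rewriting the file. A minimal sketch, assuming plain ASCII headings (so character count equals byte count) and LF line endings; the file and column names are invented for the demo:

```shell
# Small stand-in for the huge CSV; names and contents are hypothetical.
workdir=$(mktemp -d)
csv="$workdir/data.csv"
printf 'id,name,value\n1,foo,10\n2,bar,20\n' > "$csv"

old_heading=$(head -n 1 "$csv")
new_heading='ID,NAME,VALUE'   # same length as the old heading

# Only overwrite in place when the lengths match; otherwise the first data
# row would be clobbered or leftover bytes of the old heading would remain.
if [ "${#new_heading}" -eq "${#old_heading}" ]; then
    # conv=notrunc tells dd not to truncate the file after the written bytes.
    printf '%s' "$new_heading" | dd of="$csv" conv=notrunc 2>/dev/null
else
    echo "length differs: rewrite the file with sed instead" >&2
fi

result=$(cat "$csv")
printf '%s\n' "$result"
```

    Since this modifies the original file directly, try it on a copy or a small sample first.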
    

  • Related Question

    linux - SQL like group by and sum for text files in command line?
  • dnkb

    I have huge text files with two fields: the first is a string, the second an integer. The files are sorted by the first field. What I'd like to get in the output is one line per unique string, with the sum of the numbers for identical strings. Some strings appear only once while others appear multiple times. For example, given the sample data below, for the string glehnia I'd like to get 10+22=32 in the result.

    Any suggestions how to do this either with gnuwin32 command line tools or in linux shell?

    Thanks!

    glehnia 10
    glehnia 22
    glehniae 343
    glehnii 923
    glei 1171
    glei 2283
    glei 3466
    gleib 914
    gleiber 652
    gleiberg 495
    gleiberg 709


  • Related Answers
  • Jukka Matilainen

    In AWK, you could do something like this:

    awk '($1 == last) || (last == "") {sum += $2} ($1 != last) && (last != "") {print last " " sum; sum = $2} {last = $1} END {print last " " sum}' huge_text_file.txt
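
    For what it's worth, here is the same idea spelled out over several lines and run against the sample data from the question. Treat it as a sketch: it relies on the input being sorted by the first field, keeps only one running sum in memory, and the temporary file name is made up:

```shell
# Write the sample data from the question to a scratch file.
workdir=$(mktemp -d)
printf '%s\n' \
    'glehnia 10' 'glehnia 22' 'glehniae 343' 'glehnii 923' \
    'glei 1171' 'glei 2283' 'glei 3466' 'gleib 914' \
    'gleiber 652' 'gleiberg 495' 'gleiberg 709' > "$workdir/sample.txt"

sums=$(awk '
    # Key changed: emit the finished group and start a new sum.
    NR > 1 && $1 != last { print last, sum; sum = 0 }
    # Every line: add to the running sum and remember the key.
    { sum += $2; last = $1 }
    # Flush the final group, unless the input was empty.
    END { if (NR > 0) print last, sum }
' "$workdir/sample.txt")

printf '%s\n' "$sums"
```

    For the sample this prints glehnia 32, glehniae 343, glehnii 923, glei 6920, gleib 914, gleiber 652 and gleiberg 1204, one per line.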
    
  • Mike Fitzpatrick

    You could use a few lines of Lua to achieve this. Lua is available on a wide range of platforms including Windows and Linux.

    -- Quick and dirty - no error checking, unsorted output
    
    io.input('huge_text_file.txt')
    
    results = {}
    
    for line in io.lines() do
        for text, number in string.gmatch(line, '(%w+)%s+(%d+)') do
            results[text] = (results[text] or 0) + number
        end
    end
    
    for text, number in pairs(results) do
        print(text, number)
    end
    

    You can sort the output using any sort utility or a few more lines of Lua.
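
    As an illustration of the sorting step with the standard sort utility: Lua's print separates its arguments with a tab, so the sketch below uses tab as the field separator. The printf line is only a stand-in for the script's unsorted output, with sums taken from the sample data:

```shell
# Stand-in for the unsorted, tab-separated output of the Lua script.
unsorted=$(printf '%s\t%s\n' glei 6920 glehnia 32 gleiberg 1204 gleib 914)
tab=$(printf '\t')

# Sort by the string in the first field (byte order, independent of locale).
by_name=$(printf '%s\n' "$unsorted" | LC_ALL=C sort -t "$tab" -k1,1)

# Sort numerically by the sum in the second field, largest first.
by_sum=$(printf '%s\n' "$unsorted" | sort -t "$tab" -k2,2nr)

printf '%s\n' "$by_name"
printf '%s\n' "$by_sum"
```

    In practice you would pipe the script's output straight into sort instead of going through a variable.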