linux tar -T - does not work on the fly

07
2014-07
  • Znik

    I found a problem with GNU tar on Linux. When I use the option

    -T -  (for file list from stdin) or
    -T named_pipe_file    ,
    

    it doesn't work on the fly. For example, take this simple interactive script:

    while read -r x; do echo "$x"; done | \
    tar cvf tar.tar -T -
    

    tar starts archiving only when I press ^D to mark EOF on the input. The same thing happens when I use a named pipe:

    mkfifo named_pipe
    # run tar in the background (or in a second terminal) so the loop below can start
    tar cvf tar.tar -T named_pipe &
    while read -r x; do echo "$x"; done >named_pipe
    

    It seems tar does some buffering, but how much? I have to repack a lot of files into a tar archive and have little disk space, so I must do it on the fly. I want to use tar's --remove-files option for this, but without interactive handling of the -T option that is impossible. The plan is for the "while" part of the code to unpack files one at a time, wait for tar to archive and remove each one, and then move on to the next file. Thanks for any ideas :)

    my tar version: tar (GNU tar) 1.26 (C) 2011 FSF

  • Answers
  • wingedsubmariner

    tar is able to append to already existing archives, so you could do:

    touch tarfile.tar
    command_that_produces_file_list | xargs tar rf tarfile.tar
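
A per-file variant of the append approach can be sketched as below. This is only a sketch, assuming GNU tar and one path per input line; the function name `append_stream` is hypothetical. Each file is appended as soon as its name arrives and deleted once it has been stored, so only one unpacked file needs to exist on disk at a time.

```shell
# Hypothetical helper: append each path read from stdin to the archive
# as soon as it arrives, deleting the original after it has been stored.
append_stream() {
  archive=$1
  while IFS= read -r path; do
    # -r appends to the archive; --remove-files deletes the file once added.
    tar -rf "$archive" --remove-files "$path" || return 1
  done
}
```

Usage would look like `command_that_produces_file_list | append_stream tarfile.tar`. Starting one tar process per file is slower than a single run, which is the price paid for the immediate removal.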
    

    Unfortunately, this doesn't work with on the fly compression. Luckily, the tar format is simple enough we can do some hacking:

    command_that_produces_file_list | {
      xargs -I {} sh -c 'tar c {} | head -c $(( ($(stat --printf="%s" {}) + 511) / 512 * 512 + 512 ))';
      dd if=/dev/zero bs=512 count=2 2>/dev/null;
    } | compression_utility
    

    For each file, tar output consists of a 512-byte header followed by enough 512-byte blocks to hold the file data. At the end of the archive tar appends at least two 512-byte blocks of zeros. This code captures the output of each tar, cuts off the trailing blocks of zeros, concatenates the output of the multiple tar invocations, and then adds the terminating blocks of zeros once at the end. The result is sent down the pipe to the compression utility, which runs concurrently with the tars.
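
The byte count passed to `head -c` can be checked in isolation. The sketch below repeats the same arithmetic in a hypothetical helper named `tar_member_size`; it assumes a regular file whose name fits in a single header block (very long names take extra blocks in GNU tar, which this formula does not cover).

```shell
# Hypothetical helper: bytes one regular file occupies in a tar stream,
# i.e. a 512-byte header plus the data rounded up to whole 512-byte blocks.
tar_member_size() {
  size=$1
  echo $(( (size + 511) / 512 * 512 + 512 ))
}
```

For a 1000-byte file this gives 1536: one header block plus two data blocks.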

  • Znik

    Good news. I got an answer to my bug report on the bug-tar mailing list. Quoting:

    From: Sergey Poznyakoff
    Date: Thu, 05 Sep 2013 08:40:40 +0300
    Subject: Re: [Bug-tar] gnu tar, option -T from stdin or named pipe is not interactive

    Hi Grzegorz,

    This has been fixed in the git HEAD (starting from commit 1fe0c83d).

    Regards, Sergey

    Now I'm waiting for this fix to reach the Linux distros :)

  • Ярослав Рахматуллин

    Read this explanation (first answer): In what order do piped commands run?

    What you see is tar blocking until the input list is complete before it starts processing. Arguably, processing the entries one by one, in parallel with the input, could be useful, but I don't think GNU tar supports that.

    I can only guess that waiting for the whole list is done to avoid complexity in the "internal procedures" of handling command line arguments - such as how to deal with "--append and --remove-files". I think most people would prefer to remove all files in bulk after the archive is done, and not on the fly as is desirable in this case.

    The GNU people are usually very friendly. You could ask why this is not a feature, how to do it with other tools, and even request that it become part of tar in the future:

    https://lists.gnu.org/mailman/listinfo/help-tar


  • Related Question

    linux - How to tar directory and then remove originals including the directory?
  • Nicholas

    I'm trying to tar a collection of files in a directory called 'my_directory' and remove the originals by using the command:

    tar -cvf files.tar my_directory --remove-files
    

    However it is only removing the individual files inside the directory and not the directory itself (which is what I specified in the command). What am I missing here?

    EDIT:

    Yes, I suppose the 'remove-files' option is fairly literal, although I too found the man page unclear on that point. (In Linux I tend not to distinguish much between directories and files, and sometimes forget that they are not the same thing.) It looks like the consensus is that it doesn't remove directories.

    However, my main reason for asking this question stems from tar's handling of absolute paths. Because you must specify a relative path to the file(s) to be archived, you have to change to the parent directory to tar it properly. As I see it, using any kind of follow-on 'rm' command is potentially dangerous in that situation. Thus I was hoping to simplify things by making tar itself do the removal.

    For example, imagine a backup script where the directory to back up (i.e. tar) is passed in as a shell variable. If the value of that variable was entered badly, the result could be files deleted from whatever directory you happened to be in last.


  • Related Answers
  • nik

    You are missing the part which says the --remove-files option removes files after adding them to the archive.

    You could follow the archive and file-removal operation with a command like,

    find /path/to/be/archived/ -depth -type d -empty -exec rmdir {} \;


    Update: You may be interested in reading this short Debian discussion on,
    Bug 424692: --remove-files complains that directories "changed as we read it".

  • pavium

    Since the --remove-files option only removes files, you could try

    tar -cvf files.tar my_directory && rm -R my_directory
    

    so that the directory is removed only if the tar returns an exit status of 0
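
One way to reduce the risk raised in the question (a badly entered variable followed by `rm`) is sketched below. It is only an illustration, assuming GNU tar; the function name `archive_and_remove` is hypothetical. It refuses an empty argument, lets tar switch to the parent directory itself with `-C`, and removes the directory only if tar succeeded.

```shell
# Hypothetical helper: archive a directory next to itself, then remove it
# only if tar exited successfully.
archive_and_remove() {
  dir=${1:?directory argument is required}  # stop if the argument is empty or unset
  [ -d "$dir" ] || return 1                 # refuse anything that is not a directory
  parent=$(dirname -- "$dir")
  name=$(basename -- "$dir")
  # -C makes tar change into the parent, so the caller's working
  # directory never decides what gets archived or deleted.
  tar -cf "$parent/$name.tar" -C "$parent" "$name" && rm -r -- "$dir"
}
```

Called as `archive_and_remove /some/path/my_directory`, this would leave `/some/path/my_directory.tar` with member names relative to the parent.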

  • 8088
    source={directory argument}
    

    e.g.

    source={FULL ABSOLUTE PATH}/my_directory
    

     

    parent={parent directory of argument}
    

    e.g.

    parent={FULL ABSOLUTE PATH}/
    

     

    logFile={path to a run log that captures status messages}
    

    Then you could execute something along the lines of:

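    # stop if the parent directory cannot be entered
    cd "${parent}" || exit 1
    # timestamped archive name, e.g. Tar_File.20140707_153000
    tar cvf "Tar_File.$(date +%Y%m%d_%H%M%S)" "${source}"
    if [ $? -ne 0 ]
    then
     echo "Backup FAILED for ${source} at $(date)" >> "${logFile}"
    else
     echo "Backup SUCCESS for ${source} at $(date)" >> "${logFile}"
     # remove the originals only after a successful backup
     rm -rf "${source}"
    fi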
    
  • mit

    This was probably a bug.

    Also, the word "file" is ambiguous in this case. But because this is a command line switch, I would expect it to include directories too, because in Unix/Linux everything is a file, including a directory. (The other interpretation is of course also valid, but it makes no sense to keep the directories in such a case. I would consider it unexpected and confusing behavior.)

    But I have found that on some distributions GNU tar actually removes the directory tree. That is another indication that keeping the tree was a bug, or at least that some workaround was in place until they fixed it.

    This is what I tried out on an ubuntu 10.04 console:

    mit:/var/tmp$ mkdir tree1                                                                                               
    mit:/var/tmp$ mkdir tree1/sub1                                                                                          
    mit:/var/tmp$ > tree1/sub1/file1                                                                                        
    
    mit:/var/tmp$ ls -la                                                                                                    
    drwxrwxrwt  4 root root 4096 2011-11-14 15:40 .                                                                              
    drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
    drwxr-xr-x  3 mit  mit  4096 2011-11-14 15:40 tree1
    
    mit:/var/tmp$ tar -czf tree1.tar.gz tree1/ --remove-files
    
    # AS YOU CAN SEE THE TREE IS GONE NOW:
    
    mit:/var/tmp$ ls -la
    drwxrwxrwt  3 root root 4096 2011-11-14 15:41 .
    drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
    -rw-r--r--  1 mit   mit    159 2011-11-14 15:41 tree1.tar.gz                                                                   
    
    
    mit:/var/tmp$ tar --version                                                                                             
    tar (GNU tar) 1.22                                                                                                           
    Copyright © 2009 Free Software Foundation, Inc.
    
    

    If you want to see it on your machine, paste this into a console at your own risk:

    tar --version                                                                                             
    cd /var/tmp
    mkdir -p tree1/sub1                                                                                          
    > tree1/sub1/file1                                                                                        
    tar -czf tree1.tar.gz tree1/ --remove-files
    ls -la