linux tar -T - does not work on the fly
2014-07
I found a problem with GNU tar on Linux. When I use the option
-T - (file list from stdin) or
-T named_pipe_file,
it doesn't work on the fly. For example, take this simple interactive script:
while read x; do echo $x; done|\
tar cvf tar.tar -T -
tar starts archiving only when I press ^D to mark EOF on the input. The same thing happens when I use a named pipe:
mkfifo named_pipe
tar cvf tar.tar -T named_pipe
while read x; do echo $x; done >named_pipe
It seems tar does some buffering, but how much? I have to repack a lot of files into a tar archive but have little disk space, so I must do this on the fly. I want to use tar's --remove-files option for this, but without an interactive -T option it's impossible. The plan is for the "while" part of the code to unpack the files one at a time, wait for tar to archive and remove each one, and then move on to the next file. Thanks for any ideas :)
my tar version: tar (GNU tar) 1.26 (C) 2011 FSF
tar is able to append to already existing archives, so you could do:
touch tarfile.tar
command_that_produces_file_list | xargs tar rf tarfile.tar
Unfortunately, this doesn't work with on the fly compression. Luckily, the tar
format is simple enough we can do some hacking:
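command_that_produces_file_list | {
# per file: keep the 512-byte header plus the data rounded up to whole 512-byte blocks
xargs -I {} sh -c 'tar c {} | head -c $(( ($(stat --printf="%s" {}) + 511) / 512 * 512 + 512 ))';
# terminate the combined archive with two blocks of zeros
dd if=/dev/zero bs=512 count=2 2>/dev/null;
} | compression_utility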
tar output consists of, for each file, a 512-byte header followed by enough 512-byte blocks to hold the file data. It then appends at least two 512-byte blocks of zeros. What this code does is capture the output of tar and remove the extra blocks of zeros, combine the output from the multiple invocations of tar together, and then stick on the terminating blocks of zeros. The output is sent down the pipe to the compression utility, which runs concurrently with the tar processes.
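The block arithmetic described above can be checked on a couple of small files. This is only a sketch under some assumptions: the file names (a.txt, b.bin, joined.tar) and the `member` helper are made up for illustration, and the names are short enough that tar writes a single 512-byte header per file.

```shell
# Scratch directory with two files of known size.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'hello\n' > a.txt           # 6 bytes    -> 1 data block
head -c 1000 /dev/zero > b.bin     # 1000 bytes -> 2 data blocks

# Hypothetical helper: emit one archive member (header + padded data),
# cutting off the end-of-archive padding that tar appends.
member() {
    size=$(wc -c < "$1")
    tar cf - "$1" | head -c $(( (size + 511) / 512 * 512 + 512 ))
}

{
    member a.txt
    member b.bin
    head -c 1024 /dev/zero         # end-of-archive marker: two zero blocks
} > joined.tar

# The joined stream should list both members like an ordinary archive.
tar -tf joined.tar
```

The resulting file is 3584 bytes: 1024 for a.txt, 1536 for b.bin, and 1024 for the terminating zero blocks.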
Good news. I got an answer to my bug report to [email protected]. Quoting:
From: Sergey Poznyakoff
Date: Thu, 05 Sep 2013 08:40:40 +0300
Subject: Re: [Bug-tar] gnu tar, option -T from stdin or named pipe is not interactive
Hi Grzegorz,
This has been fixed in the git HEAD (starting from commit 1fe0c83d).
Regards, Sergey
So now I'm waiting for this to be fixed in Linux distros :)
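Until a fixed tar reaches your distribution, one possible stopgap builds on the append approach from the earlier answer: call tar once per file name, so each file is archived and removed before the next name is read. This is a minimal sketch, not a tested production script. The function name `archive_and_remove` and the demo file names are hypothetical, and names beginning with `-` are not handled.

```shell
# Hypothetical helper: read file names from stdin and append each one to the
# archive given as $1, removing the file once it has been archived.
archive_and_remove() {
    archive=$1
    touch "$archive"               # make sure there is an archive to append to
    while IFS= read -r name; do
        tar -rf "$archive" --remove-files "$name" || return 1
    done
}

# Demo in a scratch directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo one > one.txt
echo two > two.txt
printf '%s\n' one.txt two.txt | archive_and_remove demo.tar
tar -tf demo.tar
```

Each append has to locate the end of the existing archive first, so this is likely to get slower as the archive grows, and like plain `tar rf` it does not work with compressed archives.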
Read this explanation (first answer): In what order do piped commands run?
What you see is tar blocking for the completion of the input list before it starts processing. Arguably, doing the processing in parallel with the input, one-by-one could be useful, but I don't think GNU Tar supports that.
I can only guess that waiting for the whole list is done to avoid complexity in the "internal procedures" of handling command line arguments - such as how to deal with "--append and --remove-files". I think most people would prefer to remove all files in bulk after the archive is done, and not on the fly as is desirable in this case.
The GNU people are usually very friendly. You could ask why this is not a feature, how to do it with other tools, and even request that it become part of tar in the future.
I'm trying to tar a collection of files in a directory called 'my_directory' and remove the originals by using the command:
tar -cvf files.tar my_directory --remove-files
However it is only removing the individual files inside the directory and not the directory itself (which is what I specified in the command). What am I missing here?
EDIT:
Yes, I suppose the 'remove-files' option is fairly literal, although I too found the man page unclear on that point. (In Linux I tend not to distinguish much between directories and files, and sometimes forget that they are not the same thing.) It looks like the consensus is that it doesn't remove directories.
However, my main reason for asking this question stems from tar's handling of absolute paths. Because you must specify a relative path to the file(s) to be archived, you have to change to the parent directory to tar them properly. As I see it, using any kind of follow-on 'rm' command is potentially dangerous in that situation. Thus I was hoping to simplify things by making tar itself do the removal.
For example, imagine a backup script where the directory to back up (i.e. tar) is passed in a shell variable. If that variable's value was entered incorrectly, the result could be files deleted from whatever directory you happened to be in last.
You are missing the part which says the --remove-files option removes files after adding them to the archive.
You could follow the archive and file-removal operation with a command like:
find /path/to/be/archived/ -depth -type d -empty -exec rmdir {} \;
Update: You may be interested in reading this short Debian discussion on,
Bug 424692: --remove-files complains that directories "changed as we read it".
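The two steps can be combined into one small function. This is a sketch only: `archive_tree` and the scratch file names are hypothetical, and the `[ -d ... ]` check is there because whether tar leaves the directories behind appears to vary between versions.

```shell
# Hypothetical helper: archive a directory with --remove-files, then prune
# any empty directories that tar left behind.
archive_tree() {
    archive=$1
    dir=$2
    tar -cf "$archive" --remove-files "$dir" || return 1
    if [ -d "$dir" ]; then
        find "$dir" -depth -type d -empty -exec rmdir {} \;
    fi
}

# Demo in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p tree1/sub1
: > tree1/sub1/file1
archive_tree tree1.tar tree1
tar -tf tree1.tar
```

Because `find` only removes directories that are already empty, anything tar failed to archive and remove stays on disk.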
Since the --remove-files option only removes files, you could try
tar -cvf files.tar my_directory && rm -R my_directory
so that the directory is removed only if tar returns an exit status of 0.
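To address the worry about running rm from the wrong directory, the same idea can be written without any `cd`, using tar's -C option and refusing to run on an empty or invalid argument. This is a rough sketch: the function name `backup_dir` and the paths in the demo are made up for illustration.

```shell
# Hypothetical helper: archive directory $1 into archive $2 with paths stored
# relative to the directory's parent, then remove it only if tar succeeded.
backup_dir() {
    src=$1
    out=$2
    if [ -z "$src" ] || [ ! -d "$src" ]; then
        echo "backup_dir: not a directory: '$src'" >&2
        return 1
    fi
    parent=$(cd "$(dirname "$src")" && pwd) || return 1
    name=$(basename "$src")
    # rm is given the full path, so the current directory does not matter
    tar -C "$parent" -cf "$out" "$name" && rm -R "$parent/$name"
}

# Demo in a scratch directory.
base=$(mktemp -d)
mkdir -p "$base/my_directory"
echo data > "$base/my_directory/file"
backup_dir "$base/my_directory" "$base/files.tar"
backup_dir "" "$base/never.tar" || echo "empty argument refused"
```

The second call shows the guard: with an empty argument nothing is archived and nothing is deleted.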
source={directory argument}
e.g.
source={FULL ABSOLUTE PATH}/my_directory
parent={parent directory of argument}
e.g.
parent={ABSOLUTE PATH of the parent of 'my_directory'}/
logFile={path to a run log that captures status messages}
Then you could execute something along the lines of:
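cd "${parent}" || exit 1
# archive by relative name so the stored paths are relative to ${parent}
tar cvf "Tar_File.$(date +%Y%m%d_%H%M%S)" "$(basename "${source}")"
if [ $? -ne 0 ]
then
echo "Backup FAILED for ${source} at $(date)" >> "${logFile}"
else
echo "Backup SUCCESS for ${source} at $(date)" >> "${logFile}"
rm -rf "${source}"
fi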
This was probably a bug.
Also, the word "file" is ambiguous in this case. Because this is a command line switch, I would expect it to cover directories too, since in Unix/Linux everything is a file, including a directory. (The other interpretation is of course also valid, but it makes no sense to keep the directories in such a case. I would consider that unexpected and confusing behavior.)
I have found, however, that GNU tar on some distributions actually removes the directory tree. That is another indication that keeping the tree was a bug, or at least that it was a workaround until they fixed it.
This is what I tried out on an ubuntu 10.04 console:
mit:/var/tmp$ mkdir tree1
mit:/var/tmp$ mkdir tree1/sub1
mit:/var/tmp$ > tree1/sub1/file1
mit:/var/tmp$ ls -la
drwxrwxrwt 4 root root 4096 2011-11-14 15:40 .
drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
drwxr-xr-x 3 mit mit 4096 2011-11-14 15:40 tree1
mit:/var/tmp$ tar -czf tree1.tar.gz tree1/ --remove-files
# AS YOU CAN SEE THE TREE IS GONE NOW:
mit:/var/tmp$ ls -la
drwxrwxrwt 3 root root 4096 2011-11-14 15:41 .
drwxr-xr-x 16 root root 4096 2011-02-25 03:15 ..
-rw-r--r-- 1 mit mit 159 2011-11-14 15:41 tree1.tar.gz
mit:/var/tmp$ tar --version
tar (GNU tar) 1.22
Copyright © 2009 Free Software Foundation, Inc.
If you want to see it on your machine, paste this into a console at your own risk:
tar --version
cd /var/tmp
mkdir -p tree1/sub1
> tree1/sub1/file1
tar -czf tree1.tar.gz tree1/ --remove-files
ls -la