unix - How to get the actual directory size (out of du)?
2013-08
How do I get the actual directory size, using UNIX/Linux standard tools?
Alternative question: How do I get du to show me the actual directory size?
Since people seem to have different definitions of the term "size": My definition of "directory size" is the sum of all regular files within that directory.
I do NOT care about the size of the directory inode or whatever (blocks * block size) the files take up on the respective file system. A directory with 3 files, 1 byte each, has a directory size of 3 bytes (by my definition).
Calculating the directory size seems to be very unreliable.
For example, an empty directory is usually 4096 bytes in size, according to du -b. This would be the block size (tune2fs -l reports a block size of 4096 for the file system I used in this case) - I expect zero, especially because the -b option includes the --apparent-size option, which is supposed to "print [the] apparent sizes, rather than disk usage".
A directory with a 1 byte file (echo -n "a" >foo) is 4097 bytes in size, so it looks like du adds the size of the directory inode itself to the total.
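To make the definition above concrete, here is a minimal sketch that sums only the sizes of regular files, so directory entries contribute nothing. The function name sum_regular_files is made up for illustration; the sketch assumes a POSIX find, ls and awk, and that field 5 of ls -ln is the size in bytes.

```shell
#!/bin/sh
# sum_regular_files DIR: print the total byte size of all regular files below DIR.
# (Hypothetical helper name. Directories, symlinks and special files are not counted.)
sum_regular_files() {
    # -q keeps unusual file names on a single line; field 5 of `ls -ln` is the size.
    find "${1:-.}" -type f -exec ls -lnq {} + |
        awk '{ total += $5 } END { printf "%.0f\n", total }'
}
```

For the one-byte example above, sum_regular_files should print 1 where du -b reports 4097.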
More importantly, I've seen extreme differences with large directories.
There is a directory with a total size of 314086500373 B (about 293 GiB). I got this number using a cheap shell script, which basically finds (find -type f) all the files and gets their size based on the output of ls.
Since parsing the output of ls is error-prone, I also wrote a simple C++ program which calculates the directory size without the help of any other tools and it confirmed the directory size of 314086500373 B (it really just contains a lot of regular files in a few subdirectories, no links or other special files).
Here is the problem:
I moved this directory to another file system on my Linux box and I simply want to compare the directory size before and after (and during the process), which turned out to be difficult.
In the old location, du -hs reported a size of "181G". That's just wrong. du -b (no -s) gives "314086500847" (B!?) for this directory, which is already (+474 B) more than the actual total directory size in bytes.
In the new location, du -hs now reports a size of "586G". The directory is still the same, I have verified the MD5 sums of all files in this directory (before and after) and they match. (The files are supposed to use more disk space in the new location, but that's not what I want to know from du.) du -bs shows "314086500847" (at least like before). Both the shell script and my program calculate the old/correct directory size of 314086500373 B (same as in the old location).
An error of -112 G / +293 G is unacceptable. Simply calculating the total (logical) size of a directory should be the easiest thing in the world.
So what (tool/option) has to be used to get the actual directory size (maybe I'm just using du wrong)?
Preferably using standard tools, but if there is a "better version of du" that'd work too.
Update:
Several answers have already been posted. This works on Linux:
find "$DIR" -type f -print0 | du -scb --files0-from=- | tail -n 1
This also works on FreeBSD, where the command above fails with "du: illegal option -- b":
ls -lnR "$DIR" | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'
It's a bit to type but it calculates the correct directory size.
My second question (a human-readable result which is NOT off by over 100 GB) has been answered with a nice awk script.
Here is a script displaying a human readable directory size using Unix standard tools (POSIX).
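#!/bin/sh
# Sum field 5 of ls -ln (size in bytes) for every regular file under $1 (default: .)
find "${1:-.}" -type f -exec ls -lnq {} + | awk '
function pp() {
u="+Ki+Mi+Gi+Ti";
split(u,unit,"+");   # unit[1]="", unit[2]="Ki", ..., unit[5]="Ti"
v=sum;
for(i=1;i<5;i++) {
if(v<1024) break;
v/=1024;             # awk division is floating point, so v keeps its fraction
}
# v already contains the fractional part; adding the remainder again would count it twice
printf("%.3f %sB\n",v,unit[i]);
}
{sum+=$5}
END{pp()}'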
eg:
$ ds ~
72.891 GiB
Assuming you have du from GNU coreutils, this command should calculate the total apparent size of an arbitrary number of regular files inside a directory, without any arbitrary limits on the number of files:
find . -type f -print0 | du -scb --files0-from=- | tail -n 1
Add the -l option to du if there are some hardlinked files inside, and you want to count each hardlink separately (by default du counts multiple hardlinks only once).
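The effect of -l can be checked with a small sketch like the following. The function name apparent_total is made up for illustration, and the sketch assumes GNU find and GNU du, since -b and --files0-from are GNU extensions.

```shell
#!/bin/sh
# apparent_total DIR [DU_OPTION...]: total apparent size of regular files below DIR.
# (Hypothetical helper name. Requires GNU du for -b and --files0-from.)
apparent_total() {
    dir=$1
    shift
    # Any extra arguments (for example -l) are passed straight to du.
    find "$dir" -type f -print0 |
        du -scb "$@" --files0-from=- |
        tail -n 1 | cut -f1
}
```

With a 5-byte file and one additional hard link to it in the same directory, apparent_total DIR should print 5 and apparent_total DIR -l should print 10.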
The most important difference from plain du -sb is that recursive du also counts the sizes of directories, which are reported differently by different filesystems; to avoid this, the find command is used to pass only regular files to du. Another difference is that symlinks are ignored (if they should be counted, the find command should be adjusted).
This command will also consume more memory than plain du -sb, because using the --files0-from=FILE option makes du store device and inode numbers of all processed files, as opposed to the default behavior of remembering only files with more than one hard link. (This is not an issue if the -l option is used to count hardlinks multiple times, because the only reason to store device and inode numbers is to skip hardlinked files which had been already processed.)
If you want to get a human-readable representation of the total size, just add the -h option (this works because du is invoked only once and calculates the total size itself, unlike some other suggested answers):
find . -type f -print0 | du -scbh --files0-from=- | tail -n 1
or (if you are worried that some effects of -b are then overridden by -h)
find . -type f -print0 | du -sc --apparent-size -h --files0-from=- | tail -n 1
If all you want is the size of the files, excluding the space the directories take up, you could do something like
find . -type f -print0 | xargs -0 du -scb | tail -n 1
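@SergeyVlasov pointed out that this will give a wrong result if you have more files than fit into a single command line (ARG_MAX): xargs then runs du several times, and tail -n 1 only shows the total of the last run. To avoid that you could use something like: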
find . -type f -exec du -sb '{}' \; | gawk '{k+=$1}END{print k}'
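Running du once per file (\;) starts one process for every file. A possible variation, sketched here under the assumption of GNU du, lets find batch the file names with + and sums the per-file lines in awk. Since no per-batch total is used, it does not matter how many du invocations find ends up making. The -l option is included so that hard links are counted the same way no matter how the batches fall. The function name is made up for illustration, and file names containing newlines could confuse the awk step.

```shell
#!/bin/sh
# batched_total DIR: sum the apparent sizes of all regular files below DIR.
# (Hypothetical helper name. Requires GNU du for -b.)
batched_total() {
    # `{} +` passes many file names to each du process instead of one.
    # du prints one "SIZE<TAB>NAME" line per file; awk adds up the first field.
    find "${1:-.}" -type f -exec du -sbl {} + |
        awk '{ k += $1 } END { printf "%.0f\n", k }'
}
```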
Just an alternative, using ls:
ls -nR | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'
ls -nR: -n is like -l, but lists numeric UIDs and GIDs, and -R lists subdirectories recursively.
grep -v: Invert the sense of matching, to select non-matching lines (-v is specified by POSIX). '^d' will exclude the directories.
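Note that grep -v '^d' only filters out directory lines, so symlink entries (lines starting with l) still contribute their own length, and ls without -A skips hidden files. A possible refinement, sketched here, keeps only lines starting with -, which ls uses for regular files, and adds -A. The function name is made up, and as with any ls parsing this assumes file names without embedded newlines.

```shell
#!/bin/sh
# ls_regular_total DIR: sum the sizes of regular files below DIR by parsing ls output.
# (Hypothetical helper name. Assumes no newlines in file names.)
ls_regular_total() {
    # -A includes hidden files; lines starting with "-" are regular files,
    # and field 5 of the long listing is the size in bytes.
    ls -nRA "${1:-.}" |
        awk '/^-/ { total += $5 } END { printf "%.0f\n", total }'
}
```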
Ls command: http://linux.about.com/od/commands/l/blcmdl1_ls.htm
Man Grep: http://linux.die.net/man/1/grep
EDIT:
Edited following the suggestion from @SergeyVlasov.
Some versions of du support the argument --apparent-size to show apparent size instead of disk usage. So your command would be:
du -hs --apparent-size
From the man pages for du included with Ubuntu 12.04 LTS:
--apparent-size
print apparent sizes, rather than disk usage; although the
apparent size is usually smaller, it may be larger due to holes
in (`sparse') files, internal fragmentation, indirect blocks,
and the like
I am writing a geektool 3 script to show the size of a particular VMware Fusion virtual machine. Such a .vmwarevm "file" is really a packaged directory.
Get Info in Finder says the file is "52.91 GB". I run the following du command to get the file size:
> du -hs ~/Documents/Virtual\ Machines.localized/MY-PRECIOUS-7.vmwarevm | awk '{print $1}'
This du -hs command returns the file size as "49G". What accounts for the difference from what Finder reports?
Alternatively, I have tried replacing the -s option with the -d option like so:
du -hd ~/Documents/Virtual\ Machines.localized/MY-PRECIOUS-7.vmwarevm | awk '{print $1}'
This du -hd command returns the file size as "59G". What accounts for the difference between Finder, du -hd, and du -hs?
Also, this du -hd command produces no output in geektool 3. What gives?
Not a Mac OS X user, but I read somewhere that Finder uses base-10 nowadays.
Could the difference be that du still uses base-2?
49.0 GiB * (1024^3 bytes / GiB) = 52,613,349,376 bytes = 52.6 GB
(the small difference is because du is rounding to the nearest GiB)
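The same conversion can be reproduced on the command line. This is just a sketch of the arithmetic above; the function name gib_to_gb is made up for illustration.

```shell
#!/bin/sh
# gib_to_gb N: convert N GiB (base 2) to GB (base 10), rounded to two decimals.
# (Hypothetical helper name.)
gib_to_gb() {
    # 1 GiB = 1024 * 1024 * 1024 bytes, 1 GB = 10^9 bytes.
    # LC_ALL=C keeps the decimal point independent of the user's locale.
    LC_ALL=C awk -v gib="$1" 'BEGIN { printf "%.2f\n", gib * 1024 * 1024 * 1024 / 1e9 }'
}

gib_to_gb 49    # prints 52.61
```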
Also note that "du" returns the actual disk space used, whereas other tools will return the allocated space. Where you have sparse files these 2 values can be quite different - I have a file that has 2 TB allocated to it but it is only occupying about 200 MB of disk space.
Nifle is correct. du returns base-2, Finder returns base-10, and 49GiB ≈ 52.91GB
The -d argument requires you to specify a depth. From experimenting just now, it appears that if you give something besides a number for depth, it eats the argument you specified and behaves as if you did -d 0. The 59G you're getting is the size of the current directory, not the size of the vmwarevm file.
Here is the experiment I ran:
[~]$ ls -d Code
Code/
[~]$ du -h -s Code
8.3M Code
[~]$ du -h -d 0 Code
8.3M Code
[~]$ du -h -d Code
38G . <-- this is the size of ~/
[~]$ du -h -d 0
38G .
You might want to read the Mac OS X du(1) manpage for more information.
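Based on the experiment above, a corrected version of the original command would pass an explicit depth to -d. This is a sketch only: the function name dir_usage is made up, and it assumes a du that accepts -d N (the BSD du on Mac OS X does, as do recent GNU versions).

```shell
#!/bin/sh
# dir_usage PATH: print the human-readable disk usage of PATH, without the path column.
# (Hypothetical helper name.)
dir_usage() {
    # -d 0 limits the output to PATH itself, which matches what -s prints.
    du -h -d 0 "$1" | awk '{ print $1 }'
}

# Example call, using the path from the question:
# dir_usage ~/Documents/Virtual\ Machines.localized/MY-PRECIOUS-7.vmwarevm
```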