unix - How to get the actual directory size (out of du)?

2013-08
  • basic6

    How do I get the actual directory size, using UNIX/Linux standard tools?

    Alternative question: How do I get du to show me the actual directory size?

    Since people seem to have different definitions of the term "size": my definition of "directory size" is the sum of the sizes of all regular files within that directory.

    I do NOT care about the size of the directory inode, or about the space (blocks * block size) the files take up on the respective file system. A directory with 3 files, 1 byte each, has a directory size of 3 bytes (by my definition).

    Calculating the directory size with du seems to be very unreliable.
    For example, an empty directory is usually 4096 bytes in size, according to du -b. This would be the block size (tune2fs -l reports a block size of 4096 for the file system I used in this case). I expect zero, especially because the -b option includes the --apparent-size option, which is supposed to "print [the] apparent sizes, rather than disk usage".
    A directory with a 1-byte file (echo -n "a" >foo) is 4097 bytes in size, so it looks like du adds the size of the directory inode itself to the total.
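    A minimal sketch reproducing this, assuming GNU coreutils (du -b, --files0-from): it compares du -sb on a directory holding one 1-byte file with the sum over its regular files only. The variable names are illustrative, and the directory's own apparent size depends on the file system (typically 4096 on ext4, as in the question; other values elsewhere).

    ```shell
    demo=$(mktemp -d)
    printf a > "$demo/foo"        # one 1-byte file, like echo -n "a" >foo
    # du -sb also adds the apparent size of the directory itself
    with_dir=$(du -sb "$demo" | cut -f1)
    # Passing only regular files to du leaves the directory out of the total
    files_only=$(find "$demo" -type f -print0 | du -scb --files0-from=- | tail -n 1 | cut -f1)
    echo "du -sb: $with_dir  regular files only: $files_only"
    rm -rf "$demo"
    ```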

    More importantly, I've seen extreme differences with large directories.
    There is a directory with a total size of 314086500373 B (about 293 GiB). I got this number using a cheap shell script, which basically finds all the files (find -type f) and gets their sizes from the output of ls.
    Since parsing the output of ls is error-prone, I also wrote a simple C++ program which calculates the directory size without the help of any other tools and it confirmed the directory size of 314086500373 B (it really just contains a lot of regular files in a few subdirectories, no links or other special files).
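    As a quick check of the "about 293 GiB" figure (1 GiB = 1024^3 bytes), the conversion can be done with awk, which uses floating point and so handles this byte count without overflow:

    ```shell
    bytes=314086500373
    # Divide by 1024^3 and keep two decimals
    gib=$(awk -v b="$bytes" 'BEGIN { printf "%.2f", b / (1024 * 1024 * 1024) }')
    echo "$gib GiB"
    ```

    This prints 292.52 GiB, which rounds to the 293 GiB quoted above.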

    Here is the problem:
    I moved this directory to another file system on my Linux box and I simply want to compare the directory size before and after (and during the process), which turned out to be difficult.
    In the old location, du -hs reported a size of "181G". That's just wrong. du -b (no -s) gives "314086500847" (bytes!?) for this directory, which is already 474 B more than the actual total directory size in bytes.
    In the new location, du -hs now reports a size of "586G". The directory is still the same; I have verified the MD5 sums of all files in this directory (before and after) and they match. (The files are supposed to use more disk space in the new location, but that's not what I want to know from du.) du -bs shows "314086500847" (at least the same as before). Both the shell script and my program calculate the old, correct directory size of 314086500373 B (same as in the old location).
    An error of -112 G / +293 G is unacceptable. Simply calculating the total (logical) size of a directory should be the easiest thing in the world.

    So what (tool/option) has to be used to get the actual directory size (maybe I'm just using du wrong)?
    Preferably using standard tools, but if there is a "better version of du" that'd work too.


    Update:
    Several answers have already been posted. This works on Linux:

    find "$DIR" -type f -print0 | du -scb --files0-from=- | tail -n 1
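    A sketch wrapping this one-liner in a function, assuming GNU du; the name dirsize is hypothetical. It prints only the byte count, which makes comparing a directory before and after a move a simple string comparison.

    ```shell
    # Print the total apparent size in bytes of all regular files under $1
    dirsize() {
      find "${1:-.}" -type f -print0 |
        du -scb --files0-from=- |
        tail -n 1 |
        cut -f1            # du separates the size and the name with a tab
    }
    ```

    Usage: [ "$(dirsize /old/dir)" = "$(dirsize /new/dir)" ] && echo same size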
    

    This also works on FreeBSD, where du has no -b option ("du: illegal option -- b"):

    ls -lnR "$DIR" | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'
    

    It's a bit more to type, but it calculates the correct directory size.
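    Another POSIX-only variant, not taken from the answers here and offered only as a sketch, avoids parsing ls: wc -c prints each regular file's byte count, and find's batching adds "N total" summary lines that must be skipped. The function name dirsize_wc is hypothetical; some wc implementations may read each file to count its bytes, and file names containing newlines can still confuse it.

    ```shell
    dirsize_wc() {
      # Every path find prints below a directory contains "/", while wc's
      # summary lines ("N total") do not, so only per-file lines are summed.
      find "${1:-.}" -type f -exec wc -c {} + |
        awk '/\// { sum += $1 } END { print sum + 0 }'
    }
    ```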

    My second question (a human-readable result which is NOT off by over 100 GB) has been answered with a nice awk script.

  • Answers
  • jlliagre

    Here is a script displaying a human readable directory size using Unix standard tools (POSIX).

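    #!/bin/sh
    # Sum the sizes of all regular files under $1 (default: .) and print the
    # total in binary units. ls -n prints the size in field 5; -q shows
    # nonprintable characters in names as ? so each file stays on one line.
    find "${1:-.}" -type f -exec ls -lnq {} + | awk '
    function pp() {
      u="+Ki+Mi+Gi+Ti";
      split(u,unit,"+");       # unit[1]="", unit[2]="Ki", ..., unit[5]="Ti"
      v=sum;
      for(i=1;i<5;i++) {
        if(v<1024) break;
        v/=1024;               # floating-point division keeps the fraction
      }
      printf("%.3f %sB\n",v,unit[i]);
    }
    {sum+=$5}
    END{pp()}'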
    

    For example, with the script saved as an executable named ds:

    $ ds ~
    72.891 GiB
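    A stand-alone sketch of the same unit scaling as pp(), taking a byte count as an argument so the formatting can be checked without scanning a directory; the name human is hypothetical. It prints v directly: awk division is floating point, so v already carries the fractional part after each step.

    ```shell
    human() {
      awk -v sum="$1" 'BEGIN {
        split("+Ki+Mi+Gi+Ti", unit, "+")   # unit[1]="", unit[2]="Ki", ..., unit[5]="Ti"
        v = sum
        # Divide by 1024 until the value is below 1024 or the largest unit is reached
        for (i = 1; i < 5 && v >= 1024; i++)
          v /= 1024
        printf("%.3f %sB\n", v, unit[i])
      }'
    }
    human 1536          # prints 1.500 KiB
    ```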
    
  • Sergey Vlasov

    Assuming you have du from GNU coreutils, this command should calculate the total apparent size of an arbitrary number of regular files inside a directory, without any limit on the number of files:

    find . -type f -print0 | du -scb --files0-from=- | tail -n 1
    

    Add the -l option to du if there are some hardlinked files inside, and you want to count each hardlink separately (by default du counts multiple hardlinks only once).
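    A small sketch of that hardlink behaviour, assuming GNU du and a file system that supports hard links: one 5-byte file plus a second hard link to it, summed with and without -l. The variable names are illustrative.

    ```shell
    h=$(mktemp -d)
    printf hello > "$h/a"
    ln "$h/a" "$h/b"              # second name for the same inode
    # Default: the shared inode is counted once
    once=$(find "$h" -type f -print0 | du -scb --files0-from=- | tail -n 1 | cut -f1)
    # -l (--count-links): every link is counted
    each=$(find "$h" -type f -print0 | du -scbl --files0-from=- | tail -n 1 | cut -f1)
    echo "default: $once  with -l: $each"
    rm -rf "$h"
    ```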

    The most important difference from plain du -sb is that recursive du also counts the sizes of directories, which are reported differently by different filesystems; to avoid this, the find command is used to pass only regular files to du. Another difference is that symlinks are ignored (if they should be counted, the find command must be adjusted).

    This command will also consume more memory than plain du -sb, because using --files0-from=FILE makes du store the device and inode numbers of all processed files, as opposed to the default behavior of remembering only files with more than one hard link. (This is not an issue if the -l option is used to count hardlinks multiple times, because the only reason to store device and inode numbers is to skip hardlinked files which have already been processed.)

    If you want to get a human-readable representation of the total size, just add the -h option (this works because du is invoked only once and calculates the total size itself, unlike some other suggested answers):

    find . -type f -print0 | du -scbh --files0-from=- | tail -n 1
    

    or (if you are worried that some effects of -b are then overridden by -h)

    find . -type f -print0 | du -sc --apparent-size -h --files0-from=- | tail -n 1
    
  • terdon

    If all you want is the size of the files, excluding the space the directories take up, you could do something like

    find . -type f -print0 | xargs -0 du -scb | tail -n 1
    

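    @SergeyVlasov pointed out that this gives a wrong total if you have more files than fit on one command line (ARG_MAX): xargs then runs du several times, and tail -n 1 only shows the last partial total. To avoid that, you could use something like: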

    find . -type f -exec du -sb '{}' \; | gawk '{k+=$1}END{print k}'
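    A sketch of a faster variant (my adjustment, not from the answer): ending -exec with + instead of \; lets find pass many files to each du run instead of starting one du per file. Without -c, du prints one line per file, so summing every line stays correct even when find splits the list into several batches. One caveat: within a batch, GNU du counts hard-linked files only once. The name sum_files is hypothetical, and plain awk replaces gawk.

    ```shell
    sum_files() {
      find "${1:-.}" -type f -exec du -b {} + | awk '{ k += $1 } END { print k + 0 }'
    }
    ```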
    
  • Tiago CA

    Just an alternative, using ls:

    ls -nR | grep -v '^d' | awk '{total += $5} END {print total, "Total"}'
    

    ls -nR: -n is like -l, but lists numeric UIDs and GIDs; -R lists subdirectories recursively.

    grep -v: invert the sense of matching, to select non-matching lines (-v is specified by POSIX). '^d' excludes the directories.
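    One refinement to consider, sketched under the assumption that only regular files should count: grep -v '^d' still keeps symlink lines (mode starting with "l"), whose field 5 is the length of the link target. Keeping only lines that start with "-" drops symlinks, the "total" lines, and the "dir:" headers that -R prints. The name ls_total is hypothetical.

    ```shell
    ls_total() {
      # Keep only regular-file lines, then sum the size column (field 5)
      ls -nR "${1:-.}" | grep '^-' | awk '{ total += $5 } END { print total + 0 }'
    }
    ```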

    ls command: http://linux.about.com/od/commands/l/blcmdl1_ls.htm

    grep man page: http://linux.die.net/man/1/grep

    EDIT:

    Edited following @Sergey Vlasov's suggestion.

  • Brian

    Some versions of du support the argument --apparent-size to show apparent size instead of disk usage. So your command would be:

    du -hs --apparent-size
    

    From the man pages for du included with Ubuntu 12.04 LTS:

    --apparent-size
          print apparent sizes,  rather  than  disk  usage;  although  the
          apparent  size is usually smaller, it may be larger due to holes
          in (`sparse') files, internal  fragmentation,  indirect  blocks,
          and the like
    

  • Related Question

    osx - Why do Finder and du report different file size?
  • flipdoubt

    I am writing a geektool 3 script to show the size of a particular VMware Fusion virtual machine. Such a .vmwarevm "file" is really a package, i.e. a directory that Finder displays as a single file.

    Get Info in Finder says the file is "52.91 GB". I run the following du command to get the file size:

    > du -hs ~/Documents/Virtual\ Machines.localized/MY-PRECIOUS-7.vmwarevm | awk '{print $1}'
    

    This du -hs command returns the file size as "49G". What accounts for the difference from what Finder reports?

    Alternatively, I have tried replacing the -s option with the -d option like so:

    du -hd ~/Documents/Virtual\ Machines.localized/MY-PRECIOUS-7.vmwarevm | awk '{print $1}'
    

    This du -hd command returns the file size as "59G". What accounts for the difference between Finder, du -hd, and du -hs?

    Also, this du -hd command produces no output in geektool 3. What gives?


  • Related Answers
  • Stephen Jennings

    Not a Mac OS X user, but I read somewhere that Finder uses base-10 nowadays.

    Could the difference be that du still uses base-2?

    49.0 GiB * (1024^3 bytes / GiB) = 52,613,349,376 bytes = 52.6 GB
    

    (the small remaining difference is because du rounds its -h output to a whole number of GiB)
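    The arithmetic can be reproduced in the shell: first 49 GiB in bytes and decimal GB, then Finder's 52.91 GB expressed in GiB, which lands just above 49 and is consistent with du -h showing "49G":

    ```shell
    bytes=$((49 * 1024 * 1024 * 1024))                          # 49 GiB in bytes
    gb=$(awk -v b="$bytes" 'BEGIN { printf "%.1f", b / 1e9 }')   # decimal GB
    fgib=$(awk 'BEGIN { printf "%.2f", 52.91e9 / (1024 * 1024 * 1024) }')
    echo "$bytes bytes = $gb GB; 52.91 GB = $fgib GiB"
    ```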

  • Cry Havok

    Also note that du reports the disk space actually used, whereas other tools may report the apparent (logical) size. Where you have sparse files, these two values can be quite different: I have a file with an apparent size of 2 TB that occupies only about 200 MB of disk space.
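    A sketch of that sparse-file effect, assuming GNU coreutils (truncate, du -b, du -B1) and a file system that supports holes: truncate extends a new file to 100 MiB without writing any data, so its apparent size is far larger than the space it occupies. The variable names are illustrative.

    ```shell
    s=$(mktemp -d)
    truncate -s 100M "$s/sparse.img"               # 100 MiB of holes, no data blocks
    apparent=$(du -b "$s/sparse.img" | cut -f1)    # logical size in bytes
    on_disk=$(du -B1 "$s/sparse.img" | cut -f1)    # allocated space in bytes
    echo "apparent: $apparent  on disk: $on_disk"
    rm -rf "$s"
    ```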

  • Stephen Jennings

    Nifle is correct. du returns base-2, Finder returns base-10, and 49GiB ≈ 52.91GB

    The -d argument requires you to specify a depth. From experimenting just now, it appears that if you give something other than a number for the depth, it eats the argument you specified and behaves as if you had used -d 0. The 59G you're getting is the size of the current directory, not the size of the vmwarevm file.

    Here is the experiment I ran:

    [~]$ ls -d Code
    Code/
    
    [~]$ du -h -s Code
    8.3M    Code
    
    [~]$ du -h -d 0 Code
    8.3M    Code
    
    [~]$ du -h -d Code
     38G    .           <-- this is the size of ~/
    
    [~]$ du -h -d 0
     38G    .
    

    You might want to read the Mac OS X du(1) manpage for more information.