windows 7 - How can I check the actual size used in an NTFS directory with many hardlinks?

2014-04
  • kbyrd

    On a Win7 NTFS volume, I'm using cwrsync which supports --link-dest correctly to create "snapshot" type backups. So I have:

    z:\backups\2010-11-28\cygdrive\c\Users\...
    z:\backups\2010-12-02\cygdrive\c\Users\...
    

    The content of 2010-12-02 is mostly hardlinks back to files in the 2010-11-28 directory, but there are a few new or changed files only in 2010-12-02. On Linux, the 'du' utility will tell me the actual size taken by each incremental snapshot. On Windows, Explorer and du under Cygwin are both fooled by hardlinks and show 2010-12-02 taking up a little more space than 2010-11-28.

    Is there a Windows utility that will show the correct space actually used?

  • Answers
  • DMA57361

    Try Sysinternals Disk Usage (otherwise known as du). With the -u and -v flags it counts only unique occurrences and shows the usage of each folder as it goes along.

    As far as I know the file system doesn't show the difference between the original file and a hard link (that is really the point of a hard link) so you can't discount them on a folder-by-folder basis, but need to do this comparatively.

    To test, I created a random folder with 6 files in it and cloned the whole thing. I then created several hard and soft links inside the first folder that reference other files in the first folder, and also some in the second.

    Running du -u -v testFld results in (note the values next to the folders are in KiB):

           104  <path>\testFld\A
            54  <path>\testFld\B
           149  <path>\testFld
    
    Totals:
    Files:        12
    Directories:  2
    Size:         162,794 bytes
    Size on disk: 162,794 bytes
    

    Running du -u -v testFld\a results in:

    104  <path>\testFld\a
    ...
    

    Running du -u -v testFld\b results in:

    74   <path>\testFld\b
    ...
    

    Notice the mismatch?
    The hard links in A that refer to files in B are counted only against A during the "full" run, so B returns only 54 (even though the files were originally in B and hard-linked from A). When you measure B separately (or if you don't use the -u unique flag), it reports its "full" measure of 74.
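
    The order-dependent attribution described above can be reproduced with a short Python sketch. This is not how Sysinternals du is implemented; it is a minimal illustration that assumes Python's os.lstat reports a usable (st_dev, st_ino) pair for each file, which the Python documentation describes as the inode number on Unix and the file index on Windows. The function names file_key and unique_usage are made up for this example, and the sizes are logical file sizes, not allocated clusters.

```python
import os


def file_key(path):
    """Return an identity shared by all hard links to the same data, plus its size."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino), st.st_size


def unique_usage(roots):
    """Sum file sizes under each root, counting each file identity once.

    Roots are scanned in the order given, so a file that is hard-linked
    from several roots is charged to the first root that reaches it.
    """
    seen = set()
    usage = {}
    for root in roots:
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # symbolic links are not counted in this sketch
                key, size = file_key(path)
                if key not in seen:
                    seen.add(key)
                    total += size
        usage[root] = total
    return usage
```

    Calling unique_usage with the two snapshot directories from the question, oldest first, charges shared files to the older snapshot and leaves only the new or changed files against the newer one. Scanning a single directory on its own gives its "full" figure, which is the mismatch shown above.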

  • matt wilkie

    TreeSize Professional (~$55, 30 day trial) claims to distinguish NTFS hardlink disk space. A quick trial seems to bear this out.

    Hardlink support is not turned on out of the box: go to Tools > Options > Scan, re-scan, then use Ctrl-1 and Ctrl-2 to switch between Size and Allocated space. Allocated is actual space used, while Size is the statistic normally reported by other programs.

    There is a performance penalty for turning on hardlink support (and symlinks and mounts too if you want that also). The colour palette is garish for my taste, but that seems to be par for the course in this genre. Also be careful when clicking around in the box chart area -- it's easy to accidentally move a folder with a mistaken drag-n-drop when you only meant to expand it.

  • harrymc

    I think some facts need to be set right here.

    Windows cannot "detect" hardlinks, since every file is actually a hardlink to a bunch of bytes on the disk.

    The du tool detects duplicates, but that is false too, since if folder A contains files and B only contains hardlinks to the files in A, then du of A and du of B will return the same answer - the size of the files coming originally from A, but these files are now also in B.

    This is actually correct, since for example if you deleted A then its files will not be deleted on the disk, because they are still referenced by B. With hard-links, which file is the source and which one is the hard-link is quite arbitrary and meaningless.

    Products such as du will list a directory while discounting duplicates. This will only work if all files and hard-links are contained in one directory. Many folder-list products do that.

    Conclusion: With hard-links, the question of "the actual size used in an NTFS directory" is meaningless.
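
    The point about deletion can be made concrete with a small Python sketch. It estimates how many bytes would actually be released if a directory tree were deleted, by counting only files whose every hard link lives inside that tree. The name exclusive_size is hypothetical, and the sketch assumes os.lstat reports a reliable st_nlink and st_ino on the platform in use; it sums logical file sizes rather than allocated clusters.

```python
import os
import stat
from collections import defaultdict


def exclusive_size(root):
    """Return the bytes held only by `root`: files with no links outside it."""
    links_inside = defaultdict(int)  # file identity -> links found under root
    info = {}                        # file identity -> (size, total link count)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue             # skip symlinks and other non-regular files
            key = (st.st_dev, st.st_ino)
            links_inside[key] += 1
            info[key] = (st.st_size, st.st_nlink)
    # A file is freed by deleting root only if all of its links are inside root.
    return sum(size for key, (size, nlink) in info.items()
               if links_inside[key] >= nlink)
```

    Under this definition a snapshot that consists entirely of hard links to files in another snapshot has an exclusive size of zero, which matches the observation that deleting A frees nothing while B still references the same files.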


  • Related Question

    windows xp - How can one undo many hard links?
  • Questioner

    I foolishly used Dupemerge to change all my duplicate files into hard links. Now Windows XP is not running right; for example, Explorer won't start.

    Is there a utility which would traverse the filesystem looking for hard links, copy the file, delete the original link, and rename the copy, keeping the original attributes and name?


  • Related Answers
  • Bender

    I doubt that there's a utility for undoing what was done. You can search for duplicates again, check their link counts and attributes (or maybe Dupemerge can help identify hard links to the same files) and do the copying by hand. This may at least help you find out whether hard links are the cause of problems.
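
    The "check the link count, then copy by hand" step could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested recovery tool: break_hard_link is a made-up name, it relies on Python's shutil.copy2 and os.replace, and shutil's documentation warns that not all file metadata can be copied, so attributes such as ownership and ACLs may not survive. Files that are in use may also refuse to be replaced.

```python
import os
import shutil
import stat
import tempfile


def break_hard_link(path):
    """Replace `path` with an independent copy if it shares data with other names.

    Returns True if the link was broken, False if nothing needed doing.
    """
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode) or st.st_nlink < 2:
        return False                     # not a regular file, or already unshared
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # temp file on the same volume
    os.close(fd)
    try:
        shutil.copy2(path, tmp)          # copy data plus timestamps and mode bits
        os.replace(tmp, path)            # swap the copy in under the original name
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)               # do not leave a stray temp file behind
        raise
    return True
```

    Running this over every file that reports a link count above one would leave each name pointing at its own copy of the data. Taking a backup or restore point first is prudent.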

  • Redandwhite
    • Since you've converted them into hard links, you might be in luck and they might still show up as duplicates using something like DoubleKiller.

    • Either way, I doubt there's a utility for this exact task.

    If all else fails I recommend a re-install...

  • Tom Wijsman

    To fix the operating system, use the System File Checker:

    Insert the Windows XP installation CD.

    Press CTRL + ALT + DEL to bring up the Task Manager, go to File > Run (New Task), type sfc /scannow and click OK.

    Note: this will only restore the system files, but it will get you going again. As for other affected software, you'll have to re-install or repair install where necessary.

  • Mokubai

    SameFiles Assistant 3.1 might work:

    Same Files Assistant is the hard links managing utility.

    Specifically one feature it has:

    • You can roll back hard links to the regular files at any time.
  • harrymc

    Try Hard Link Magic, it might help.

    Also Microsoft's Junction has the ability to recursively traverse directories and list/delete junction-points.

    Just be careful to create a system restore point before you do these manipulations.

  • Peter John Acklam

    I have written a Perl script that identifies all regular files that are hard links to the same data. The script works fine on UNIX and Cygwin. I haven’t tested it with Strawberry Perl or any other Windows port of Perl, but I thought I’d share it anyhow. On Windows (Cygwin) I would open a terminal and do ./list-dup-hard-links /cygdrive/c/.

    #!/usr/bin/perl
    #
    # NAME
    #
    #   list-dup-hard-links - list regular file names pointing to the same inode
    #
    # SYNOPSIS
    #
    #   list-dup-hard-links DIRECTORY
    #
    # DESCRIPTION
    #
    #   For each inode that is referred to by more than one regular file, print
    #   the inode number and the list of corresponding files.
    #
    # AUTHOR
    #
    #   Peter John Acklam <[email protected]>
    
    use strict;                             # restrict unsafe constructs
    use warnings;                           # control optional warnings
    use File::Find;                         # traverse a file tree
    
    if (@ARGV != 1) {
        die "Usage: $0 DIRECTORY\n";
    }
    
    my $start_dir = shift;                  # starting directory
    my $start_dev = (stat $start_dir)[0];   # device number of where we start
    my %inum2files;                         # map each inode number to file(s)
    
    sub wanted {
        return if -l;                       # skip symlinks
        my @fileinfo = stat();              # get file info
    
        if (-d _) {                         # if a directory
            my $this_dev = $fileinfo[0];    # get device number
            if ($this_dev != $start_dev) {  # if we crossed a device boundary
                $File::Find::prune = 1;     #   mark directory for pruning
                return;                     #   and return
            }
        }
    
        return unless -f _;                 # continue only if a regular file
    
        my $inum = $fileinfo[1];            # get inode number
        push @{ $inum2files{$inum} },       # append this file to the list of
          $File::Find::name;                #   all files with this inode number
    }
    
    find(\&wanted, $start_dir);             # traverse the file tree
    
    while (my ($inum, $files) = each %inum2files) {
        next if @$files < 2;                # skip non-duplicates
    
        print "\nInode number: $inum\n\n"   # print header
          or die "$0: print failed: $!\n";
    
        for my $file (@$files) {
            print "    $file\n"             # print file name
              or die "$0: print failed: $!\n";
        }
    }
    
  • Carlos Almiro

    On Unix, all hard links to a file refer to the same inode number. The "stat" function returns file properties such as size, mode, timestamps and inode number, but on Windows it returns inode number 0 for every file. Use the Perl module Win32::IdentifyFile (CPAN) to get a file's on-disk identifier instead. Hard links to the same data share the same identifier.
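
    For comparison, the same grouping idea can be sketched in Python, whose documentation describes st_ino as the file index on Windows, so no extra module should be needed there. This is an illustrative rewrite of the approach, not a port of the Perl script above; the function name find_hard_link_groups is an assumption, and the behaviour on Windows has not been verified here.

```python
import os
import stat
from collections import defaultdict


def find_hard_link_groups(start_dir):
    """Map each shared file identity under start_dir to the names that link to it."""
    start_dev = os.stat(start_dir).st_dev
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(start_dir):
        # Prune subdirectories that live on a different device.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == start_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with more than one link are of interest.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only identities seen under at least two names in this tree.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    import sys
    for (_dev, ino), paths in find_hard_link_groups(sys.argv[1]).items():
        print(f"\nFile identity: {ino}\n")
        for p in paths:
            print(f"    {p}")
```

    Note that os.lstat is used deliberately: the entries returned by os.scandir do not carry inode or link-count information on Windows, according to the Python documentation.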