Delete huge number of files on Windows

2014-07
  • Jackey Cheung

    I have a directory that contains millions of sub-directories and trillions of files, and now I have to clear it. By trillions I'm not talking about file size, but the number of files.

    I've tried deleting it with del /s and with Windows Explorer. Neither can complete the task. I've been deleting some of the sub-directories one by one, and that took me days. The problem I met was that every time, whether using del or Explorer, I could see in Task Manager that the explorer instance consumes sky-high memory and gradually pushes my system to crash.

    There are still some hundred million files to be deleted. Is there any way to achieve this with one (or just a few) commands / actions?


    [EDITED]

    I've tried doing it with Cygwin rm -fr, and it yielded the same result. Summarized:

    1. No matter whether using Windows Explorer, DEL from the command prompt, or the Cygwin rm command, the system memory gradually drops to zero, and the box will eventually crash.

    2. If at any point before the system fails the process is stopped (by Ctrl+C or whatever else), the box will continue to work as normal. However, all the used memory will NOT be freed. Say I stopped the process when system memory usage reached 91%; Task Manager tells me: 4 GB RAM in total, Cache is 329 MB, and Available is 335 MB. The memory usage then stays around this level until I reboot the machine. If I stop the explorer instance in Task Manager, the screen goes blank with the HDD light on all the time, and it never comes back. Normally, when I stop the explorer instance in Task Manager, I can re-invoke it by pressing Win+E, or it is restarted automatically.

    Well, really nice memory management!


    [EDIT AGAIN] It seems that some of the used memory did get freed after a long while, but not all of it. Some of the Cached & Available memory did come back in Task Manager. I haven't waited any longer, so I'm not sure what will happen after that.

  • Answers
  • soandos

    Deleting all the folders will take a long time, and there is not a whole lot you can do about it. What you can do is save your data and format your drive. It is not optimal, but it will work (and quickly).

    Another option is perhaps to use some Linux distro on a live CD that can read from an NTFS partition. I know from personal experience that rm -rf folderName can run for at least 2 days without crashing a system with 2 GB of RAM. It will take a while, but at least it will finish.
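
    For example, a rough sketch of the live-CD route, assuming the NTFS volume shows up as /dev/sda2 and the live environment ships ntfs-3g (adjust the device and paths to your setup):

        sudo mkdir -p /mnt/windows
        sudo mount -t ntfs-3g /dev/sda2 /mnt/windows    # mount the NTFS partition read-write
        rm -rf /mnt/windows/path/to/huge_folder         # delete the tree; expect this to run for a long time
        sudo umount /mnt/windows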

  • Bob

    Erm.. I don't want to know how you created so many.

    What's happening is that Explorer is trying to enumerate every single file and store the information in memory before it starts deleting. And there are obviously way too many.

    Have you tried the command rmdir /s? As long as it actually deletes the files as they are found rather than waiting on every single one to be enumerated, it may work.
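
    For example, a minimal sketch (the path is just a placeholder); /q suppresses the confirmation prompt:

        rmdir /s /q C:\path\to\huge_folder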

    How many levels of subdirectories are there? If there's only one, or some other low number, then a quick batch file that manually recurses through might work.

    Any method will take a while, though.

  • Tom Wijsman

    Shift+Delete skips the Recycle Bin, and might significantly speed up things.

    If that doesn't work (extreme cases), try Fast Folder Eraser and/or Mass Directory Eraser.

  • Ben Voigt

    It's probably your antivirus/antimalware consuming all the memory and then crashing the system.

    Windows itself doesn't have a problem deleting huge numbers of files, although it certainly is slower than a similar operation on most non-Microsoft filesystems.

  • ITGabs

    I had similar problems a while ago with just 10 million files, but on a Server 2003 box. To delete the files I used an FTP server/client and left the client deleting the files and folders. It's a slow solution, but it works perfectly.

    You will probably also have a second problem with the MFT in NTFS, which has no real solution. The MFT is an array that, in Win 2003 (I am not sure whether Microsoft has a solution after Win 2003), stores all the files in an incremental way, so with trillions of files its size will be crazy. In my case the MFT had 17 million records and was around 19 GB in size with just 45,000 files; I tested on other systems, and it looks like for 1 million records the MFT will be around 1 GB.

    You can check the status of the MFT with this command:

    defrag c: /a /v

    where c: is the drive letter, /a means analyze, and /v means verbose.

    Another tricky solution: since there is no tool that can shrink the MFT (the tools just fill the file names and properties with zeros, nothing more), you can use VMware Converter or another kind of P2V tool and create a virtual machine based on your server. That way you will fix all the problems related to the MFT. I never tested the conversion from V2P, as I now work only in virtual environments, but I saw a lot of info about it on the internet.

    That Win 2003 box is working perfectly now; the size of the MFT is 40 MB and everything is OK. If you want, I can tell you more about backups, defrags, or other tasks related to millions of tiny files.

  • Harry Johnston

    One possible cause of an issue like this is thin provisioning, typically found in SAN environments. Some solid-state drives might exhibit the same issue. If this is the case, this configuration change might solve your problem:

    fsutil behavior set DisableDeleteNotify 1
    

    Note that this change may impact performance on solid state drives, and may prevent automatic and/or manual rethinning of SAN drives.
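
    If you want to check the current value before (and after) changing it, the matching query command should show it; 0 means delete notifications (TRIM/unmap) are enabled, 1 means they are disabled:

        fsutil behavior query DisableDeleteNotify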

  • Geoff

    Per this answer on Stack Overflow, use a combination of del and rmdir:

    del /f/s/q foldername > nul
    rmdir /s/q foldername
    
  • Isaac Rabinovitch

    Since deleting the files all at once uses too much memory, you need a way to delete them one at a time, but with the process automated. This sort of thing is a lot easier to do in a Unix-style shell, so let's use Cygwin. The following command generates a list of ordinary files, transforms that list into a sequence of rm commands, then feeds the resulting script to a shell.

     find dir \! -type d | sed 's/^/rm /' | sh
    

    The script is being executed even as it is being generated, and there are no loops, so the shell does not (hopefully) have to create any big temp files. It will certainly take a while, since the script is millions of lines long. You might have to tweak the rm command (perhaps I should have used -f? but you understand your files better than I do) to get it to work.
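
    As a side note (my addition, not part of the original recipe): the sed-to-sh pipeline will trip over file names containing spaces or quotes. A variant that sidesteps that, assuming Cygwin's GNU find, lets find run rm directly:

        find dir \! -type d -exec rm -f {} +

    The -exec ... + form batches many file names into each rm invocation, so it still avoids building one giant list in memory.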

    Now you have nothing left but directories. Here's where things get dicey. Maybe you've deleted enough files that you can do rm -rf without running out of memory (and it will probably be faster than another script). If not, we can adapt this Stack Overflow answer:

     find dir | perl -lne 'print tr:/::, " $_"' | sort -rn | cut -d' ' -f2- | sed 's/^/rmdir /' | sh
    

    Again, tweaking may be necessary, this time with sort, to avoid creating huge temp files.
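
    Alternatively (my sketch, under the same Cygwin/GNU find assumption), the sort can be skipped entirely by having find emit directories depth-first, children before parents:

        find dir -depth -type d -exec rmdir {} \;

    Since rmdir only succeeds on an empty directory, anything that still contains files is simply reported and left alone.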

  • Synetech

    Technical Explanation

    The reason that most methods are causing problems is that Windows tries to enumerate the files and folders. This isn’t much of a problem with a few hundred—or even thousand—files/folders a few levels deep, but when you have trillions of files in millions of folders going dozens of levels deep, then that will definitely bog the system down.

    Let’s say you have “only” 100,000,000 files, and Windows uses a simple structure like this to store each file along with its path (that way, you avoid storing each directory separately, thus saving some overhead):

    struct FILELIST {                   // Total size is 264 to 528 bytes:
      TCHAR         name[MAX_PATH];     // MAX_PATH=260; TCHAR=1 or 2 bytes
      FILELIST*     nextfile;           // Pointers are 4 bytes for 32-bit and 8 for 64-bit
    };
    

    Depending on whether it uses 8-bit characters or Unicode characters (it uses Unicode) and whether your system is 32-bit or 64-bit, it will need between 25 GB and 49 GB of memory to store the list (and this is a very simplified structure).
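
    The arithmetic behind those figures, using the smallest and largest cases of the structure above:

        100,000,000 files × (260 × 1 + 4) bytes = 26.4 GB ≈ 24.6 GiB   (8-bit chars, 32-bit pointers)
        100,000,000 files × (260 × 2 + 8) bytes = 52.8 GB ≈ 49.2 GiB   (Unicode chars, 64-bit pointers)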

    The reason why Windows tries to enumerate the files and folders before deleting them varies depending on the method you are using to delete them, but both Explorer and the command-interpreter do it (you can see a delay when you initiate the command). You can also see the disk activity (HDD LED) flash as it reads the directory tree from the drive.

    Solution

    Your best bet to deal with this sort of situation is to use a delete tool that deletes the files and folders individually, one at a time. I don’t know if there are any ready-made tools to do it, but it should be possible to accomplish with a simple batch-file.

    @echo off
    rem Descend into the directory given as an argument (if any); popd undoes this on the way back out.
    if not [%1]==[] pushd %1
    del /q *
    for /d %%i in (*) do call %0 "%%i"
    if not [%1]==[] popd
    

    What this does is check whether an argument was passed. If so, it changes (via pushd) to the directory specified (you can run it without an argument to start in the current directory, or specify a directory, even on a different drive, to have it start there).

    Next, it deletes all files in the current directory. In this mode, it should not enumerate anything and simply delete the files without sucking up much, if any, memory.

    Then it enumerates the folders in the current directory and calls itself, passing each folder to it(self) to recurse downward; when each call returns, popd drops back to the parent folder so the loop continues where it left off.

    Analysis

    The reason that this should work is because it does not enumerate every single file and folder in the entire tree. It does not enumerate any files at all, and only enumerates the folders in the current directory (plus the remaining ones in the parent directories). Assuming there are only a few hundred sub-directories in any given folder, then this should not be too bad, and certainly requires much less memory than other methods that enumerate the entire tree.

    You may wonder about using the /r switch (on the for command) instead of (manual) recursion. That would not work, because while the /r switch does the recursion for you, it pre-enumerates the entire directory tree, which is exactly what we want to avoid; we want to delete as we go without keeping track.

    Comparison

    Let’s compare this method to the full-enumeration method(s).

    You had said that you had “millions of directories”; let’s say 100 million. If the tree is approximately balanced, and assuming an average of about 100 sub-directories per folder, then the deepest nested directory would be about four levels down—actually, there would be 101,010,100 sub-folders in the whole tree. (Amusing how 100M can break down to just 100 and 4.)

    Since we are not enumerating files, we only need to keep track of at most 100 directory names per level, for a maximum of 4 × 100 = 400 directories at any given time.

    Therefore the memory requirement should be roughly 400 × 528 bytes ≈ 206.25 KB, well within the limits of any modern (or otherwise) system.

    Test

    Unfortunately(?) I don’t have a system with trillions of files in millions of folders, so I am not able to test it (I believe at last count I had about 800K files), so someone else will have to try it.

    Caveat

    Of course memory isn’t the only limitation. The drive will be a big bottleneck too because for every file and folder you delete, the system has to mark it as free. Thankfully, many of these disk operations will be bundled together (cached) and written out in chunks instead of individually (at least for hard-drives, not for removable media), but it will still cause quite a bit of thrashing as the system reads and writes the data.

  • Kelly

    A problem you might be running into is that the directory does not get compacted when you delete a file/folder, so if you have a folder with 1 million files in it and delete the first 500K of them, there are a ton of blocks at the beginning of the directory that are, for all intents and purposes, blank.

    BUT, Explorer and a command prompt still have to look through those blocks just in case there is a file there. Something that might help is to "move" a folder from someplace down the tree to a new folder off the base of the drive, then delete that new folder. Moving the folder only moves the pointer to the folder, so it should go quickly and not actually move all the files under it to new space on the drive.
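
    In command-prompt terms that might look something like this (a sketch; the paths are placeholders, and the move must stay on the same NTFS volume so it is just a rename):

        move C:\some\deep\path\huge_folder C:\ToDelete
        rmdir /s /q C:\ToDelete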

    Another thing you may try is to use a 3rd party tool like "PerfectDisk" to compact folders after deleting a bunch of files.


  • Related Question

    command line - Recursively delete empty directories in Windows
  • mohlsen

    I have a directory on my Windows 7 machine that has hundreds if not thousands of sub-directories. Some of them have files, some do not. I want to delete all the empty directories.

    Looking at the del and rmdir DOS commands, it does not look like you can do this recursively without deleting all the files. Is there a way to do this from the command line? Or is there a tool that would do it for me?


  • Related Answers
  • 8088

    You can use this utility: Remove Empty Directories

    Alternatively you can use this one-liner batch file (dir /s /b /ad lists every sub-directory, sort /r reverses the list so children come before their parents, and rd only removes the ones that are empty):

    for /f "delims=" %%d in ('dir /s /b /ad ^| sort /r') do rd "%%d"
    

    One-liner taken from DownloadSquad, an excellent site to add to your RSS feeds. :)

  • 8088

    If you get the error:

    "%%d was unexpected at this time.
    

    it is likely you are running directly from the command line. In that case, change the double %% to a single %:

    for /f "delims=" %d in ('dir /s /b /ad ^| sort /r') do rd "%d"
    
  • Rob Kam

    The free utility EmptyFolderNuker does this fine, from a base folder of your choice. It also removes those directories only containing empty sub-directories.

  • 8088

    Since Cygwin comes with GNU find, you can do this:

    find . -type d -empty -delete
    

    Or to avoid the noise when a folder no longer exists:

    find . -type d -empty -execdir rmdir {} +
    
  • outsideblasts

    The excellent Glary Utilities has this and a bunch of other great features.

  • Nighthawk

    If you have Cygwin installed, you could do this:

    find -type d -exec rmdir {} \;
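
    Note that without -depth a parent only becomes empty after a later pass has removed its children, and rmdir will complain about non-empty folders along the way. A variant under the same Cygwin/GNU find assumption that handles both in one pass:

        find . -depth -type d -empty -exec rmdir {} \;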