linux - What causes files to use up more disk space than the file size suggests?

06
2014-04
  • pythonic metaphor

    du was showing some drives with much less space than I was expecting, and ls -alh also showed the sum at the top to be a factor of three more than the sum of the individual files. Following this answer, I checked with ls -s, and sure enough, most of the files are using three times as much disk space as their size. What causes this and can I do anything to get the disk usage down?

    Edit

    I'm seeing output like this from ls -alhs:

     50K -rw-------   1 xxx xxx 9.0K Jan 29 20:34 20120103.gz
    242K -rw-------   1 xxx xxx  67K Jan 29 20:53 20121130.gz
    

    so the problem isn't that my file sizes are much less than 4k.
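
    The comparison that ls -s makes can be reproduced as a minimal sketch, assuming Linux, where st_blocks is counted in 512-byte units. The helper name disk_usage is only illustrative.

```python
import os
import sys


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units (true on Linux),
    # independent of the file system's own block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Print size, allocated space and their ratio for each file named
    # on the command line.
    for name in sys.argv[1:]:
        apparent, allocated = disk_usage(name)
        ratio = allocated / apparent if apparent else float("nan")
        print(f"{name}: size={apparent} allocated={allocated} ratio={ratio:.2f}")
```

    A ratio well above 1 on files much larger than one block, as in the listing above, suggests that something other than plain rounding is involved.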

  • Answers
  • cybernard

    I don't know what file system you are using or cluster size but here is some generic information that should help.

    The file system allocates space in fixed-size groups, which some file systems call clusters. The cluster size varies, but in many cases it is a power of 2 and at least 512 bytes. The 512 bytes corresponds to the physical sector size of all but the newest hard drives, which have 4096-byte sectors.

    Each file uses at least one cluster, and in most cases the last cluster is not fully used. The remaining space in that last cluster cannot be allocated to another file. With FAT, FAT32, and NTFS it is not possible to go higher than 64 KB per cluster, but the same is not true for Linux.
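
    The rounding described above can be written out as a small sketch. It follows the simplified model of this answer (every file takes at least one cluster), and the 4096-byte cluster size is an assumption, not something known about the asker's file system. Real file systems differ in details such as empty files or inline data.

```python
def allocated_size(size_bytes, cluster_bytes=4096):
    """Space taken by a file when rounded up to whole clusters."""
    # Integer ceiling division; at least one cluster per file in this model.
    clusters = max(1, (size_bytes + cluster_bytes - 1) // cluster_bytes)
    return clusters * cluster_bytes


def slack(size_bytes, cluster_bytes=4096):
    """Bytes in the last cluster that the file does not use."""
    return allocated_size(size_bytes, cluster_bytes) - size_bytes


# A file of 9216 bytes (roughly the 9.0K file in the question):
print(allocated_size(9216))  # 12288
print(slack(9216))           # 3072
```

    Under these assumptions, rounding alone accounts for about 12K, not the 50K that ls -s reports for that file.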

    ls -alhs
    

    How big are the entries at the top of the list, . and ..?

    So if you have a lot of files wasting tiny amounts of space it all adds up to a large amount of wasted space.
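
    To see how much this adds up to on a given directory tree, something like the following sketch could be used. It again assumes Linux (st_blocks in 512-byte units), the function name tree_usage is hypothetical, and it only counts regular files, not the space taken by the directories themselves.

```python
import os
import stat


def tree_usage(root):
    """Return (file_count, apparent_bytes, allocated_bytes) under root."""
    files = apparent = allocated = 0
    seen = set()  # (device, inode) pairs, so hard links are counted once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, device nodes
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            files += 1
            apparent += st.st_size
            # Assumption: st_blocks is in 512-byte units, as on Linux.
            allocated += st.st_blocks * 512
    return files, apparent, allocated
```

    Calling tree_usage(".") and comparing the last two numbers shows the total overhead for the tree; dividing the difference by the file count gives the average overhead per file.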

    You would have to look into the exact details of your file system to find this out. Changing file systems can have a major impact on the overhead. I tried BTRFS and it wasted tons of space. I did a fresh install and ran updates, and the disk usage was 2x or more that of other file systems.

    Ext4 also does poorly with a large number of small files. A perfect example is a single copy of the kernel source code, which has tens of thousands of small files.

    It is entirely possible that your file system is responsible for the wasted space and the only way to change that is changing the file system.

    In addition, some file systems support snapshots, which allow backup copies of the same file to be stored in the file system. The distro controls how the feature is configured and whether it is on by default. Every file you change or delete could be in a snapshot and not actually deleted. There is a command to remove the old snapshots, but I don't recall what the command is.


  • Related Question

    Why does Windows occupy more disk space, compared to Linux?
  • hmp

    Why does a Windows install take up so much space compared to most Linux distributions, despite being capable of much less? For example, a standard Ubuntu installation takes about 4 GB and can actually be sufficient for everyday work, while Windows 7 requires 15 GB of disk space from the start and doesn't offer nearly as much functionality without external programs.

    So what is it - drivers? Configuration GUIs? DRM? Just poor space management?

    EDIT: I don't want to imply that any of the systems is better. It's just my general impression that Linux distributions are able to fit much more in smaller amount of disk space.


  • Related Answers
  • jtimberman

    Windows has lots of legacy code for backwards compatibility with heaps of third-party vendor software and platforms. It also includes full third-party drivers for heaps of software. Windows software in general has a history and reputation for being bloated, which is largely due to compatibility reasons. Windows also has the capability to play a variety of games across many DirectX versions, and a variety of proprietary multimedia formats. Compatibility and universal usage for any task are Microsoft's goals so they can maintain their position in the desktop market.

    Linux drivers are often more universal, using a common driver API across various hardware models. This is good and bad. For example, some hardware doesn't work at all, some works perfectly, and some has missing features. Software on Linux often follows the Unix philosophy - each component or tool should do one thing and do it very well, and software developers aren't afraid to break backwards compatibility to remove cruft and bad code.

    Both operating systems have their strengths and weaknesses. These days, when 500 GB hard drives are cheap, the disk size of the installation should be the least of your concerns. A bigger concern is how much of the system's resources are consumed by running programs.

    Either Windows or Linux is inefficient about resource usage depending on what you're doing. They have different design goals, different target markets, and different philosophies driving their development.

  • William Hilsum

    It is a very difficult thing to say...

    There isn't really a one-size-fits-all answer. For Ubuntu, it is mainly because it installs a subset of tools plus the everyday ones. Anything extra you want is downloaded when you need it (such as frameworks for other programs)...

    Windows Vista and 7, on the other hand, copy the whole contents of the DVD to the drive, so any Windows components you want to install at a later date do not require the disc to be put in.

    Again, this is a very awkward question... I am not exactly sure what to say! It can be also said that you can get string from two different companies with different widths, yet they can both tie knots...

    If you are just curious, I would download Vlite so you can take out components and just have a play to see how small you can make Windows.

  • detj

    Linux and other Unix operating systems are better designed architecture-wise, and disk size is treated as an issue. Windows developers, on the other hand, concentrate more on the plug-and-play nature of Windows, not to forget backward compatibility, hence so much legacy code and so many device drivers, which gobble up so much real estate. And as the system gets older and more programs/features are installed/removed, its disk usage gets bigger and bigger.

    One primary reason for such huge space allocation, especially in Windows Vista and 7 editions, is the C:\Windows\winsxs folder. It takes up most of the space in the Windows folder. Read more about winsxs here.

    If you want to reduce the size of winsxs, there is a tool called winsxslite. But use it at your own risk.

    Space requirement was never such a serious issue with the Windows makers, given the ever expanding hard drives. It only recently became a concern because of solid state drives and people buying netbooks.

  • geek

    Nobody knows the answer to your question for sure, except the guys at Microsoft, because Windows is a closed-source product.

    You'd be better off going to support.microsoft.com and asking there, I bet they enjoy receiving such questions :-)

  • harrymc

    A Windows installation contains almost all possible software. To the point that even some "uninstalled" software packages are present on disk, so that "installing" these packages doesn't even require the installation CD.

    On the other hand, a standard Linux installation is much more "lean and mean", where packages can easily be added via web repositories. Windows doesn't for the moment have this flexibility (although it's starting), and requires the installation CD.

    So, while the Windows installation does include superfluous components, the difference isn't as large as you might think. If you went ahead and installed almost every possible Linux package, this would also require a lot of disk space. If you then went ahead and cut Windows down to the bone by uninstalling every unneeded option or executable, you would end up with a much smaller footprint.

    Conclusion: Linux starts small and builds up. Windows starts large and shrinks down (however, with modern disk-space, nobody bothers).

  • andynormancx

    Windows has a lot more graphical config and admin tools than the average Linux distribution, which adds to the bulk. Linux is catching up in that respect, but still has a lot less in this area than Windows.

    Another area of extra bulk comes from the fact that Windows keeps a second copy of many key files, so that it can restore them if they get corrupted.

  • matt wilkie

    In two words: package management.

    I think a stock Windows install is larger than a stock Linux distribution install because Linux can store most everything not needed immediately "somewhere out there" on a package repository mirror, in the cloud, or wherever. Need a new driver or app not installed yet? Simply apt-get install foo and a few minutes later you have it and are ready to go (substitute apt-get with the package manager of choice for your distribution).

    Windows on the other hand needs to have a lot more 3rd party compatibility stuff at hand, right now, because there is no coherent and capable package management system. Windows Update is okay for some drivers and security patches, but that's about it. There's very limited user side control for picking and choosing and even less ability for installing applications to request dependency X. So Windows needs to have as much as it possibly can ready to go out of the box.

    Yes, there are a host of useful things like Flash and popular media codecs which aren't so easy to install on Linux. That doesn't detract from the central point though: Linux is smaller on the local machine because it can more easily pull what it needs from elsewhere and Windows can't.

    I surmise that if you took a stock Linux distribution and added all the backwards compatibility stuff in the standard Windows install, there wouldn't be as big a gap in terms of occupied storage space.