filesystems - How do I convert a Linux disk image into a sparse file?

06
2013-08
  • endolith

    I have a bunch of disk images, made with ddrescue, on an EXT partition, and I want to reduce their size without losing data, while still being mountable.

    How can I fill the empty space in the image's filesystem with zeros, and then convert the file into a sparse file so this empty space is not actually stored on disk?

    For example:

    > du -s --si --apparent-size Jimage.image 
    120G Jimage.image
    > du -s --si Jimage.image 
    121G Jimage.image
    

    This actually only has 50G of real data on it, though, so the second measurement should be much smaller.
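    To see how far apart the two measurements are in bytes, a small helper can print both numbers side by side. This is only a sketch: `sparse_report` is a made-up name, and it assumes GNU `stat` (coreutils), whose `%s`, `%b` and `%B` format sequences give the apparent size, the allocated block count and the block unit.

```shell
# sparse_report is a hypothetical helper name, not an existing tool.
# Assumes GNU stat: %s = apparent size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
sparse_report() {
    file=$1
    set -- $(stat -c '%s %b %B' "$file")
    echo "apparent=$1 allocated=$(( $2 * $3 )) $file"
}

# Demo on a throwaway 1 MiB file that consists of a single hole.
demo=$(mktemp)
truncate -s 1M "$demo"
report=$(sparse_report "$demo")
echo "$report"
rm -f "$demo"
```

    Run against the image from the question (`sparse_report Jimage.image`), the `allocated` figure is the one that should drop once the file has been made sparse.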

    This supposedly will fill empty space with zeros:

    cat /dev/zero > zero.file
    rm zero.file
    

    But if sparse files are handled transparently, it might actually create a sparse file without writing anything to the virtual disk, ironically preventing me from turning the virtual disk image into a sparse file itself. :) Does it?

    Note: For some reason, sudo dd if=/dev/zero of=./zero.file works when cat does not on a mounted disk image.

  • Answers
  • mihi

    First of all, sparse files are only handled transparently if you seek, not if you write zeroes.

    To make this clearer, the example from Wikipedia

    dd if=/dev/zero of=sparse-file bs=1k count=0 seek=5120
    

    does not write any zeroes; it opens the output file, seeks (jumps over) 5 MB and then writes zero zeroes (i.e. nothing at all). This command (not from Wikipedia)

    dd if=/dev/zero of=sparse-file bs=1k count=5120
    

    will write 5MB of zeroes and will not create a sparse file!

    As a consequence, a file that is already non-sparse will not magically become sparse later.
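    The difference between the two commands can be checked by comparing allocated blocks. The sketch below assumes GNU `dd` and `stat` and a filesystem that supports holes and does not compress zeroes away; the scratch directory and the name `dense-file` are made up for the demonstration, and `conv=fsync` is added only so the blocks are on disk before measuring.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# Seek past 5 MiB and write nothing: the result is one big hole.
dd if=/dev/zero of="$workdir/sparse-file" bs=1k count=0 seek=5120 2>/dev/null

# Really write 5 MiB of zero bytes; conv=fsync flushes them to disk.
dd if=/dev/zero of="$workdir/dense-file" bs=1k count=5120 conv=fsync 2>/dev/null

# Apparent sizes (bytes) and allocated block counts of both files.
sparse_apparent=$(stat -c %s "$workdir/sparse-file")
dense_apparent=$(stat -c %s "$workdir/dense-file")
sparse_blocks=$(stat -c %b "$workdir/sparse-file")
dense_blocks=$(stat -c %b "$workdir/dense-file")

echo "sparse-file: $sparse_apparent bytes apparent, $sparse_blocks blocks allocated"
echo "dense-file:  $dense_apparent bytes apparent, $dense_blocks blocks allocated"
```

    Both files should report the same apparent size, while only the second one should occupy a significant number of blocks.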

    Second, to make a file with lots of zeroes sparse, you have to copy it with cp:

    cp --sparse=always original sparsefile
    

    or you can use tar's or rsync's --sparse option as well.
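    As a rough check that the copy is both smaller on disk and byte-for-byte identical, something like the following can be tried on a small stand-in file first. The file contents here are invented (1 MiB of random data followed by 7 MiB of written zeroes, imitating an image whose free space has been zero-filled), and GNU `cp`, `stat` and `cmp` are assumed.

```shell
workdir=$(mktemp -d)

# Stand-in for a zero-filled image: 1 MiB of data, then 7 MiB of real zeroes.
head -c 1048576 /dev/urandom > "$workdir/original"
head -c 7340032 /dev/zero >> "$workdir/original"
sync   # make sure the zeroes are actually allocated on disk

# Copy, turning runs of zero bytes into holes.
cp --sparse=always "$workdir/original" "$workdir/sparsefile"

# The contents must be identical; only the on-disk allocation should differ.
if cmp -s "$workdir/original" "$workdir/sparsefile"; then
    same=yes
else
    same=no
fi
orig_blocks=$(stat -c %b "$workdir/original")
copy_blocks=$(stat -c %b "$workdir/sparsefile")
echo "identical=$same original_blocks=$orig_blocks sparse_blocks=$copy_blocks"
```

    Once the copy has been verified, the original can be deleted and the sparse copy renamed into its place.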

  • Janne Pikkarainen

    Do you mean that your ddrescue-created image is, say, 50 GB, and in reality something much smaller would suffice?

    If that's the case, couldn't you just first create a new image with dd:

    dd if=/dev/zero of=some_image.img bs=1M count=20000
    

    and then create a filesystem in it:

    mkfsofyourchoice some_image.img
    

    then just mount the image, and copy everything from the old image to new one? Would that work for you?

  • Grumbel

    PartImage can create disk images that store only the used blocks of a filesystem, thus drastically reducing the required space by ignoring unused blocks. I don't think you can directly mount the resulting images, but going:

    image -> partimage -> image -> cp --sparse=always
    

    should produce what you want (it might even be possible to skip the last step; I haven't tried).

  • endolith

    There's now a tool called virt-sparsify which will do this. It fills up the empty space with zeros and then copies the image to a sparse file. It requires installing a lot of dependencies, though.

  • hotei

    I suspect you'll require a custom program written to that spec if that's REALLY what you want to do. But is it...?

    If you've actually got lots of all-zero areas, then any good compression tool will shrink it significantly. Writing sparse files won't help in all cases, though. If I recall correctly, a sparse file still takes up at least one block of storage for every input block that contains ANY non-zero bits. For instance, a file that averaged even one non-zero bit per 512-byte block couldn't be written "sparsely" at all. By the way, you're not going to lose data if you compress the file with zip, bzip, bzip2 or p7zip; unlike MPEG or JPEG compression, they are lossless.

    On the other hand, if you need to do random seek reads into the file then compression might be more trouble than it's worth and you're back to the sparse write. A competent C or C++ programmer should be able to write something like that in an hour or less.
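    A quick way to convince yourself of the lossless claim is a round trip on a small test file. This sketch uses `gzip` simply because it is widely installed (the answer names other tools); the file layout is invented, with 1 MiB of random data followed by 7 MiB of zeroes.

```shell
workdir=$(mktemp -d)

# Test file: 1 MiB that will not compress, then 7 MiB of zeroes that will.
head -c 1048576 /dev/urandom > "$workdir/image"
head -c 7340032 /dev/zero >> "$workdir/image"

# Compress to a separate file, keeping the original for comparison.
gzip -c "$workdir/image" > "$workdir/image.gz"

# Decompress to stdout and compare byte-for-byte with the original.
if gzip -dc "$workdir/image.gz" | cmp -s - "$workdir/image"; then
    roundtrip=identical
else
    roundtrip=different
fi
orig_size=$(stat -c %s "$workdir/image")
gz_size=$(stat -c %s "$workdir/image.gz")
echo "roundtrip=$roundtrip original=$orig_size compressed=$gz_size"
```

    The compressed file cannot be mounted directly, which is the trade-off against the sparse-file approach mentioned above.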


  • Related Question

    Linux File system for a 1 TB External Hard drive
  • letronje

    I have a 1 TB external Seagate hard drive that I use with my Dell Inspiron 1525 laptop running Ubuntu 9.04. I don't intend to share the drive with anyone, so my laptop is the only computer I will plug it into. It's formatted as NTFS right now; it auto-mounts as soon as I plug it in, and I can transfer data without worrying about permissions.

    But ntfs-3g takes up a lot of CPU and other resources, particularly for long-running data transfers of over 20 GB. So I want to switch to a native Linux filesystem for the drive (probably ext4 or ext3?).

    1) Will there be a significant difference in data transfer rate (<= 26 MB/s currently), CPU usage and system responsiveness with a native filesystem as opposed to NTFS?

    2) Will auto-mounting work, or do I have to mount it manually every time?

    3) What do I have to do to mask filesystem permissions, or otherwise make sure that permissions don't get in the way, as is the case with NTFS?

    EDIT:

    I don't intend to ever use the drive in Windows.


  • Related Answers
  • user7963

    I run exactly this configuration with the XFS filesystem, and I am happy with it. Numerous comparisons of different filesystems for Linux can be found on the Net, but I personally like its low CPU utilization when copying or deleting a large number of files.

    I'm not sure the transfer speed with XFS will be much better than with NTFS, though: 25 MB/s is a typical limit for some USB controllers.

  • Peltier

    You will have problems with file permissions with ext4, except if your user has the same UID on all the machines you use, or if you manage to set all files to 777 somehow.

    I had the same problem a while ago, and the solution was to use the UDF filesystem. It works fine, even though the filesystem management tools don't seem to be updated much (the last version is from 2004).

  • Johan

    You forget that ext3 and ext4 are journalling filesystems, so your chances of not losing data if your computer crashes (or similar) are better.

    And if the disk will never again touch a Windows computer, then switch.