osx - Fix corrupted Unicode file names in a zip archive

2014-07-08
  • Nathaniel

    A colleague gave me a zip archive of some data I need to analyse, but unfortunately the filenames have been corrupted somewhere along the way, either when creating the archive or when extracting them on my machine.

    The archive contains 3000 files whose filenames contain Japanese characters. He zipped it on a Windows machine, and I'm using a Mac. If I double-click on the archive then I get file names that look like this:

    0001_rt_ñºéå-ïÅí ñºéå-àÍî _ÉAÅ[ÉãÉeÉBÅ[.dat
    

    On the other hand, if I use 'unzip' at the command line the same file comes out as

    0001_rt_%FB+%C4%EE-%F2%FC%C6-%FB+%C4%EE-%EA%DB%F6-_%E2A%FC[%E2%EF%E2e%E2B%FC[.dat
    

    The content of the files is fine (they don't contain any Japanese characters, only numbers), but I need to get at the original file names.

    Is there some way I can restore the correct file names without having access to the original files, which are on another computer in another city? I'm up for writing a quick Python script if that's a possibility, but I don't know much about character encodings, so I'm not sure how to go about it.

  • Answers
  • slhck

    The Unarchiver for OS X is a free and open-source app that can deal with this, and it will prompt you for the file name encoding when it cannot detect it properly.

    This will override the default program for unzipping files in OS X, but I find it much more powerful than the built-in one.
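    Since the question mentions a Python script, here is a minimal sketch of that route. It relies on the fact that Python's `zipfile` decodes a member name as CP437 when the archive does not flag the name as UTF-8, so the original bytes can be recovered by encoding back to CP437 and decoding with the real encoding. The real encoding is an assumption here: `cp932` (Windows Shift-JIS) is a guess based on the archive coming from a Japanese Windows machine, and the Finder output above looks consistent with Shift-JIS bytes displayed as Mac Roman. The names `fix_name` and `extract_with_fixed_names` are made up for this example.

```python
import os
import shutil
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: member name is stored as UTF-8


def fix_name(name, source_encoding="cp932"):
    """Undo zipfile's CP437 fallback and re-decode the raw name bytes.

    source_encoding is an assumption (Windows Shift-JIS); change it if the
    archive was created under a different code page.
    """
    try:
        return name.encode("cp437").decode(source_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable or not valid in the assumed encoding: keep as-is.
        return name


def extract_with_fixed_names(archive_path, dest_dir, source_encoding="cp932"):
    """Extract every member, writing it under its re-decoded name."""
    dest_root = os.path.abspath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                name = fix_name(name, source_encoding)
            target = os.path.abspath(os.path.join(dest_root, name))
            if not target.startswith(dest_root + os.sep):
                continue  # skip names that would land outside dest_dir
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

    Calling `extract_with_fixed_names("data.zip", "extracted")` (both paths are placeholders) would write the files under the re-decoded names. On Python 3.11 or later, `zipfile.ZipFile(path, metadata_encoding="cp932")` should give the same decoded names without the manual round trip.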


  • Related Question

    osx - Mac zip archives on windows
  • Questioner

    What is the best way to open zip archives that were made on a Mac? Frequently when I open such an archive the data files are 0 bytes, while the __MACOSX folder is filled with the correct data files.

    It's really irritating, so is there anybody who can help me with this problem?


  • Related Answers
  • Wuffers

    The problem here is that the files you are referring to have no equivalent in the Windows world.

    2009-11-24 14:38        Folder        Folder  AG Book Rounded map
    2009-11-24 14:38             0             0  AG Book Rounded map\AG Book Rounded
    2009-11-24 14:38        Folder        Folder  __MACOSX
    2009-11-24 14:38        Folder        Folder  __MACOSX\AG Book Rounded map
    2009-11-24 14:38        134598         69721  __MACOSX\AG Book Rounded map\._AG Book Rounded
    2009-11-24 14:38             0             0  AG Book Rounded map\AGBooRouBol
    2009-11-24 14:38         34659         33343  __MACOSX\AG Book Rounded map\._AGBooRouBol
    2009-11-24 14:38             0             0  AG Book Rounded map\AGBooRouReg
    2009-11-24 14:38         31172         29835  __MACOSX\AG Book Rounded map\._AGBooRouReg
    

    On the Macintosh OS, files can have resource forks and other hidden areas. The user doesn't see these areas, but applications can store data there. These fonts store their font data in a resource fork, and Windows doesn't understand what a resource fork is. That is why the visible files are 0 bytes while the `._` entries under __MACOSX hold the data.

    So, if you want a cross-platform font, use a TrueType font. The Mac can read just about any type of font, so you don't have to limit yourself. If there is any doubt about a replacement font, just test it in Font Book.
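    To check whether an archive is affected in this way, the listing above can be reproduced with a short script. This is only a sketch: it assumes the usual Mac layout in which the companion of `dir/name` is stored as `__MACOSX/dir/._name`, and the function names `pair_entries` and `find_resource_fork_only` are made up for this example. It reports entries whose visible file is empty while the companion holds data.

```python
import zipfile


def pair_entries(sizes):
    """Given {member name: uncompressed size}, return (name, companion_size)
    for empty files whose __MACOSX companion entry is non-empty."""
    results = []
    for name, size in sizes.items():
        if name.startswith("__MACOSX/") or name.endswith("/"):
            continue  # skip companion entries and directories
        head, _, base = name.rpartition("/")
        prefix = head + "/" if head else ""
        companion = "__MACOSX/" + prefix + "._" + base
        if size == 0 and sizes.get(companion, 0) > 0:
            results.append((name, sizes[companion]))
    return results


def find_resource_fork_only(archive):
    """archive may be a path or a file-like object."""
    with zipfile.ZipFile(archive) as zf:
        sizes = {info.filename: info.file_size for info in zf.infolist()}
    return pair_entries(sizes)
```

    Any file this reports has its content only in the Mac-specific companion entry, so extracting it on Windows yields an empty file regardless of the unzip tool.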

  • A. Dorton

    7-Zip knows how to deal with the resource forks in a Mac zip.