Converting iTunes m3u file to be readable by mpd (special characters and encoding issue)

2014-07
  • Theo13

    I generated an iTunes m3u file because it is easier to build specific playlists there (using smart playlists). It needed a little conversion to replace the ^M characters with newlines, but now I have another problem: mpd doesn't recognize the encoding of the file when a path contains special characters.

    Here is what my file contains:

    cat -e test/1
    extern/chanson francM-LM-'aise/Mickey 3D/La TreM-LM-^Bve/10 L' Homme Qui Suivait Les Nuages.mp3$
    

    And the encoding that mpd can read for the same mp3 file:

    cat -e test/2
    extern/chanson franM-CM-'aise/Mickey 3D/La TrM-CM-*ve/10 L' Homme Qui Suivait Les Nuages.mp3$
    

    I've tested various iconv encodings but I can't find the correct one to create a file that mpd can read. Does anybody know how to do it? Thanks!

  • Answers
  • Theo13

    The solution was to use iconv on a Mac.

    Like this:

    iconv -f utf8-mac -t utf-8 file > file2
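
    A note on why this works, offered as a sketch rather than a definitive account: file names on macOS are typically stored in decomposed form (a base letter followed by a combining mark), while the path mpd matched uses the precomposed character. The utf8-mac encoding in macOS's iconv converts between the two forms. The byte sequences below are the ones visible in the cat -e output in the question; cat -v is used here only to display them.

```shell
# Decomposed form, as found in the iTunes export (test/1):
# "c" followed by U+0327 COMBINING CEDILLA (bytes 0xCC 0xA7).
printf 'franc\314\247aise\n' | cat -v    # prints: francM-LM-'aise

# Precomposed form, as read by mpd (test/2):
# U+00E7 LATIN SMALL LETTER C WITH CEDILLA (bytes 0xC3 0xA7).
printf 'fran\303\247aise\n' | cat -v     # prints: franM-CM-'aise
```

    Both strings are valid UTF-8, which is presumably why trying other source encodings with iconv did not help: the difference is in normalization, not in the character set.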
    

  • Related Question

    windows - Batch-convert files for encoding
  • desolat

    How can I batch-convert the encoding of files in a directory (e.g. ANSI->UTF-8) with a command or tool?

    For single files an editor helps, but how do I handle many files at once?


  • Related Answers
  • elbekko

    Cygwin or GnuWin32 provide Unix tools like iconv and dos2unix (and unix2dos). Under Unix/Linux/Cygwin, you'll want to use "windows-1252" as the encoding instead of ANSI (see below). (Unless you know your system is using a codepage other than 1252 as its default codepage, in which case you'll need to tell iconv the right codepage to translate from.)

    Convert from one (-f) to the other (-t) with:

    $ iconv -f windows-1252 -t utf-8 infile > outfile
    

    Or in a find-all-and-conquer form:

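    ## this will clobber the original files!
    ## (a redirect cannot be passed to -exec directly, so run it through sh and write to a temporary file first)
    $ find . -name '*.txt' -exec sh -c 'iconv --verbose -f windows-1252 -t utf-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"' _ {} \;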
    

    Alternatively:

    ## this will clobber the original files!
    $ find . -name '*.txt' -exec iconv --verbose -f windows-1252 -t utf-8 -o {} {} \;
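
    Both find forms above convert every matching file unconditionally, so running them twice would double-encode the text. A minimal sketch of a guard, assuming the source files are windows-1252: is_utf8 and to_utf8_once are hypothetical helper names, not part of iconv. The check relies on iconv exiting with a non-zero status when the input is not valid in the declared source encoding.

```shell
# is_utf8 FILE: hypothetical helper; succeeds if FILE decodes cleanly as UTF-8.
is_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

# to_utf8_once FILE: hypothetical helper; converts FILE in place from
# windows-1252 (an assumption) unless it already is valid UTF-8.
to_utf8_once() {
    if is_utf8 "$1"; then
        echo "skip (already valid UTF-8): $1"
    else
        # Write to a temporary file first so a failed conversion
        # does not destroy the original.
        iconv -f windows-1252 -t utf-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}
```

    It can be combined with find, for example by calling to_utf8_once in a loop over the files find prints. The check is a heuristic: pure ASCII files pass it (harmlessly), and a windows-1252 file could in rare cases happen to be valid UTF-8 as well.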
    

    This question has been asked many times on this site, so here's some additional information about "ANSI". In an answer to a related question, CesarB mentions:

    There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.

    The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding it corresponds to usually depends on your Windows system language. The most common is CP 1252 (also known as Windows-1252). So, when your editor says ANSI, it is meaning "whatever the API functions use as the default ANSI encoding", which is the default non-Unicode encoding used in your system (and thus usually the one which is used for text files).

    The page he links to gives this historical tidbit (quoted from a Microsoft PDF) on the origins of CP 1252 and ISO-8859-1, another oft-used encoding:

    [...] this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1. However, in adding code points to the range reserved for control codes in the ISO standard, the Windows code page 1252 and subsequent Windows code pages originally based on the ISO 8859-x series deviated from ISO. To this day, it is not uncommon to have the development community, both within and outside of Microsoft, confuse the 8859-1 code page with Windows 1252, as well as see "ANSI" or "A" used to signify Windows code page support.
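
    To make the difference between the two code pages concrete, here is a small sketch using iconv and od. Byte 0x80 lies in the range that ISO-8859-1 reserves for control codes and that Windows-1252 fills with printable characters; byte 0xE9 lies outside that range.

```shell
# Byte 0x80: EURO SIGN in Windows-1252, a C1 control code in ISO-8859-1.
printf '\200' | iconv -f windows-1252 -t utf-8 | od -An -tx1   # e2 82 ac (U+20AC)
printf '\200' | iconv -f iso-8859-1   -t utf-8 | od -An -tx1   # c2 80    (U+0080)

# Byte 0xE9 (e with acute accent) converts identically from both.
printf '\351' | iconv -f windows-1252 -t utf-8 | od -An -tx1   # c3 a9
printf '\351' | iconv -f iso-8859-1   -t utf-8 | od -An -tx1   # c3 a9
```

    So picking ISO-8859-1 for a file that is really Windows-1252 will not fail, but characters such as the euro sign or curly quotes come out as invisible control codes.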

  • Community

    With PowerShell you can do something like this:

    %  get-content IN.txt | out-file -encoding ENC -filepath OUT.txt
    

    where ENC is something like unicode, ascii, utf8 or utf32. See 'help out-file' for the full list.

    To convert all the *.txt files in a directory to utf8, do something like this:

    % foreach($i in ls -name DIR/*.txt) {
           get-content DIR/$i |
           out-file -encoding utf8 -filepath DIR2/$i
      }
    

    which creates a converted version of each .txt file in DIR2.

    EDIT: To replace the files in all subdirectories use:

    % foreach($i in ls -recurse -filter "*.java") {
        $temp = get-content $i.fullname
        out-file -filepath $i.fullname -inputobject $temp -encoding utf8 -force
    }
    
  • nagul

    The Wikipedia page on newlines has a section on conversion utilities.

    This seems to be your best bet for converting line endings using only tools that ship with Windows:

    TYPE unix_file | FIND "" /V > dos_file
    
  • 8088

    UTFCast is a Unicode converter for Windows which supports batch mode. I'm using the paid version and am quite comfortable with it.

    UTFCast is a Unicode converter that lets you batch convert all text files to UTF encodings with just a click of your mouse. You can use it to convert a directory full of text files to UTF encodings including UTF-8, UTF-16 and UTF-32 to an output directory, while maintaining the directory structure of the original files. It doesn't even matter if your text file has a different extension, UTFCast can automatically detect text files and convert them.

  • nik

    There is dos2unix on unix.
    There was another similar tool for Windows (another ref here).

    How do I convert between Unix and Windows text files? has some more tricks
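
    Where dos2unix is not installed, the same line-ending conversions can be sketched with standard tools. The three function names below are hypothetical helpers; each reads standard input and writes standard output. The last one covers the CR-only line endings (the ^M characters) mentioned in the original question.

```shell
# CRLF (Windows) -> LF (Unix): drop every carriage return.
crlf_to_lf() { tr -d '\r'; }

# LF (Unix) -> CRLF (Windows): re-emit each line with CR LF appended.
# Assumes the input does not already contain carriage returns.
lf_to_crlf() { awk '{ printf "%s\r\n", $0 }'; }

# CR only (shown as ^M) -> LF: translate each carriage return to a newline.
cr_to_lf() { tr '\r' '\n'; }
```

    Usage looks like: crlf_to_lf < dos_file > unix_file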

  • user1055927

    You can use EncodingMaster. It's free, it has Windows, Linux and Mac OS X versions, and it works really well.

  • Matthew Williams

    iconv -f original_charset -t utf-8 originalfile > newfile

    Run the above command in a for loop.
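
    A minimal sketch of such a loop, assuming the files to convert are *.txt files in one directory and that converted copies should go to a separate directory so the originals stay untouched. convert_dir and its arguments are hypothetical names chosen for illustration.

```shell
# convert_dir SRC_DIR DST_DIR CHARSET
# Hypothetical helper: converts every *.txt file in SRC_DIR from CHARSET
# to UTF-8 and writes the result under the same name in DST_DIR.
convert_dir() {
    src=$1
    dst=$2
    charset=$3
    mkdir -p "$dst" || return 1
    for f in "$src"/*.txt; do
        [ -e "$f" ] || continue    # the glob matched nothing
        iconv -f "$charset" -t utf-8 "$f" > "$dst/$(basename "$f")" || return 1
    done
}
```

    For example: convert_dir ./in ./out windows-1252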