Converting iTunes m3u file to be readable by mpd (special characters and encoding issue)

2014-07

Theo13

I generated an iTunes m3u file because it is easier to create specific playlists there (with smart playlists). It needed a little conversion to replace the ^M characters with newlines, but now I have another problem: mpd doesn't recognize the encoding of the file when it contains special characters.

Here is what my file contains:

cat -e test/1
extern/chanson francM-LM-'aise/Mickey 3D/La TreM-LM-^Bve/10 L' Homme Qui Suivait Les Nuages.mp3$

And the encoding that mpd can read for the same mp3 file:

cat -e test/2
extern/chanson franM-CM-'aise/Mickey 3D/La TrM-CM-*ve/10 L' Homme Qui Suivait Les Nuages.mp3$

I've tested various iconv encodings but I can't find the correct one to create a file that mpd can read. Does anybody know how to do it? Thanks!

Answers

Theo13

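The solution was to run iconv on a Mac, where the utf8-mac encoding is available. The iTunes file stores accented characters in decomposed form (a base letter followed by a combining accent, the M-LM-' bytes above), while mpd expects the precomposed form (M-CM-').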

Like this:

iconv -f utf8-mac -t utf-8 file > file2
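Where the utf8-mac encoding is not available (it is not part of every iconv build), roughly the same result can be obtained with Unicode normalization. The following is a minimal sketch, assuming the playlist is valid UTF-8 and that plain NFC normalization is close enough to what utf8-mac does for ordinary accented file names; the function names are made up for illustration.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Merge base letter + combining accent pairs into single characters (NFC)."""
    return unicodedata.normalize("NFC", text)


def convert_playlist(src: str, dst: str) -> None:
    """Rewrite a UTF-8 playlist so every path uses precomposed characters."""
    # newline="" keeps the line endings exactly as they are in the source file
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(to_precomposed(text))


# "c" followed by U+0327 (combining cedilla), as in the iTunes export
decomposed = "franc\u0327aise"
print(decomposed.encode("utf-8"))                  # b'franc\xcc\xa7aise'
print(to_precomposed(decomposed).encode("utf-8"))  # b'fran\xc3\xa7aise'
```

The two printed byte strings correspond to the M-LM-' and M-CM-' sequences that cat -e shows in the question.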

Related Answers

elbekko

Cygwin or GnuWin32 provide Unix tools like iconv and dos2unix (and unix2dos). Under Unix/Linux/Cygwin, you'll want to use "windows-1252" as the encoding instead of ANSI (see below). (Unless you know your system is using a codepage other than 1252 as its default codepage, in which case you'll need to tell iconv the right codepage to translate from.)

Convert from one (-f) to the other (-t) with:

$ iconv -f windows-1252 -t utf-8 infile > outfile

Or in a find-all-and-conquer form:

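## this will clobber the original files!
## (a shell is needed for the redirection; writing to a temp file first avoids truncating the input)
$ find . -name '*.txt' -exec sh -c 'iconv -f windows-1252 -t utf-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"' sh {} \;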

Alternatively:

## this will clobber the original files! (with some iconv versions, using the same file as input and output may truncate it)
$ find . -name '*.txt' -exec iconv --verbose -f windows-1252 -t utf-8 -o {} {} \;
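For systems without find or iconv, the same batch conversion can be sketched in Python. This is only an illustration under stated assumptions: every matching file really is windows-1252, the `.tmp` suffix and the function names are hypothetical, and Python's decoder raises `UnicodeDecodeError` on the few bytes that windows-1252 leaves undefined.

```python
import os
from pathlib import Path


def convert_file(path: Path, src_encoding: str = "windows-1252") -> None:
    """Re-encode one file to UTF-8, replacing it only after the copy is complete."""
    text = path.read_bytes().decode(src_encoding)
    tmp = path.with_name(path.name + ".tmp")  # hypothetical temp-file name
    tmp.write_bytes(text.encode("utf-8"))
    os.replace(tmp, path)                     # swap in the converted copy


def convert_tree(root: str, pattern: str = "*.txt") -> int:
    """Convert every matching file under root; return how many were converted."""
    count = 0
    # list() takes a snapshot so the temp files never show up in the walk
    for path in list(Path(root).rglob(pattern)):
        if path.is_file():
            convert_file(path)
            count += 1
    return count
```

Like the commands above, this overwrites the originals, so running it twice on the same tree would double-encode the files.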

This question has been asked many times on this site, so here's some additional information about "ANSI". In an answer to a related question, CesarB mentions:

There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.

The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding it corresponds to usually depends on your Windows system language. The most common is CP 1252 (also known as Windows-1252). So, when your editor says ANSI, it is meaning "whatever the API functions use as the default ANSI encoding", which is the default non-Unicode encoding used in your system (and thus usually the one which is used for text files).

The page he links to gives this historical tidbit (quoted from a Microsoft PDF) on the origins of CP 1252 and ISO-8859-1, another oft-used encoding:

[...] this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1. However, in adding code points to the range reserved for control codes in the ISO standard, the Windows code page 1252 and subsequent Windows code pages originally based on the ISO 8859-x series deviated from ISO. To this day, it is not uncommon to have the development community, both within and outside of Microsoft, confuse the 8859-1 code page with Windows 1252, as well as see "ANSI" or "A" used to signify Windows code page support.
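The difference between the two encodings is easy to see by decoding the same bytes both ways. A small sketch, with three byte values picked purely for illustration:

```python
raw = bytes([0x80, 0x93, 0xE9])   # three bytes from a hypothetical "ANSI" file

as_cp1252 = raw.decode("cp1252")    # Windows-1252: printable characters at 0x80-0x9F
as_latin1 = raw.decode("latin-1")   # ISO-8859-1: control codes at 0x80-0x9F

print([hex(ord(ch)) for ch in as_cp1252])  # ['0x20ac', '0x201c', '0xe9']
print([hex(ord(ch)) for ch in as_latin1])  # ['0x80', '0x93', '0xe9']
```

Only the bytes in the 0x80-0x9F range differ (here the euro sign and a curly quote); 0xE9 is "é" in both, which is why the two encodings are so often confused.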

Community

With PowerShell you can do something like this:

%  get-content IN.txt | out-file -encoding ENC -filepath OUT.txt

where ENC is something like unicode, ascii, utf8 or utf32. Check out 'help out-file'.

To convert all the *.txt files in a directory to utf8, do something like this:

% foreach($i in ls -name DIR/*.txt) {
       get-content DIR/$i |
       out-file -encoding utf8 -filepath DIR2/$i
  }

which creates a converted version of each .txt file in DIR2.

EDIT: To replace the files in all subdirectories use:

% foreach($i in ls -recurse -filter "*.java") {
    $temp = get-content $i.fullname
    out-file -filepath $i.fullname -inputobject $temp -encoding utf8 -force
}

nagul

The Wikipedia page on newlines has a section on conversion utilities.

This seems to be your best bet for a conversion using only tools Windows ships with:

TYPE unix_file | FIND "" /V > dos_file
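The same kind of line-ending conversion, including the lone ^M (CR) endings mentioned in the question, can also be sketched in Python. This assumes a byte-oriented encoding such as UTF-8 or windows-1252 (it would corrupt UTF-16 files), and the function names are made up for illustration.

```python
def to_lf(data: bytes) -> bytes:
    """Normalize CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert any line endings to CRLF, going through LF first to avoid doubling."""
    return to_lf(data).replace(b"\n", b"\r\n")


def convert_newlines(src: str, dst: str, windows: bool = False) -> None:
    """Copy src to dst with LF endings, or CRLF endings if windows is True."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(to_crlf(data) if windows else to_lf(data))
```

Working on bytes rather than text keeps the file's encoding untouched, so this step can be combined with any of the encoding conversions above.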

8088

UTFCast is a Unicode converter for Windows which supports batch mode. I'm using the paid version and am quite comfortable with it.

UTFCast is a Unicode converter that lets you batch convert all text files to UTF encodings with just a click of your mouse. You can use it to convert a directory full of text files to UTF encodings including UTF-8, UTF-16 and UTF-32 to an output directory, while maintaining the directory structure of the original files. It doesn't even matter if your text file has a different extension, UTFCast can automatically detect text files and convert them.

nik

There is dos2unix on Unix.
There was another similar tool for Windows (another ref here).

The question "How do I convert between Unix and Windows text files?" has some more tricks.

user1055927

You can use EncodingMaster. It's free, it has Windows, Linux and Mac OS X versions, and it works really well.

Matthew Williams

iconv -f original_charset -t utf-8 originalfile > newfile

Run the above command in a for loop to convert several files.
