ubuntu - iconv supports too few encodings

2014-07
  • schemacs

    iconv -l outputs too few encodings on CentOS 6.5:

    $ iconv -l
      10646-1:1993, 10646-1:1993/UCS4, ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ASCII, CP367, CSASCII, CSUCS4, IBM367,
      ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4, ISO-10646/UTF-8,
      ISO-10646/UTF8, ISO-IR-6, ISO-IR-193, ISO646-US, ISO_646.IRV:1991,
      OSF00010020, OSF00010100, OSF00010101, OSF00010102, OSF00010104,
      OSF00010105, OSF00010106, OSF05010001, UCS-2, UCS-2BE, UCS-2LE,
      UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UNICODEBIG, UNICODELITTLE,
      US-ASCII, US, UTF-8, UTF8, WCHAR_T
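    Instead of reading through the list, you can ask iconv directly whether it can load a converter for a given encoding. A minimal sketch, assuming bash: the function name can_convert_to is hypothetical, and GBK, WINDOWS-1252 and ISO-8859-1 are only example targets. Converting a single ASCII character is enough, because iconv exits with an error as soon as the requested conversion is unavailable.

```bash
#!/usr/bin/env bash
# Sketch: report whether the local iconv can convert UTF-8 text into each encoding.
# can_convert_to is a hypothetical helper name; the encoding list is just an example.
can_convert_to() {
    # A one-character conversion is enough to tell whether the converter loads at all.
    printf 'a' | iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

for enc in GBK WINDOWS-1252 ISO-8859-1; do
    if can_convert_to "$enc"; then
        echo "$enc: supported"
    else
        echo "$enc: NOT supported"
    fi
done
```

    If even common encodings come back as unsupported, the converter modules that ship with glibc are likely missing or broken, which points at the glibc packages rather than at PHP.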
    

    On my Ubuntu machine, however, the list is much longer. Here is the difference:

    CentOS 6.5:

    $ php -a 
    php > echo iconv('utf8', 'gbk', 'abc');
    PHP Notice:  iconv(): Wrong charset, conversion from `utf8' to `gbk' is not allowed in php shell code on line 1 
    php > quit 
    $ php -i|grep iconv
    
    iconv
    iconv support => enabled
    iconv implementation => glibc
    iconv library version => 2.12
    iconv.input_encoding => ISO-8859-1 => ISO-8859-1
    iconv.internal_encoding => ISO-8859-1 => ISO-8859-1
    iconv.output_encoding => ISO-8859-1 => ISO-8859-1
    

    Ubuntu 14.04:

    $ php -a
    Interactive mode enabled
    
    php > echo iconv('utf8', 'gbk', "abc\n");
    abc 
    php > quit 
    $ php -i|grep iconv
    
    iconv
    iconv support => enabled
    iconv implementation => glibc
    iconv library version => 2.19
    iconv.input_encoding => ISO-8859-1 => ISO-8859-1
    iconv.internal_encoding => ISO-8859-1 => ISO-8859-1
    iconv.output_encoding => ISO-8859-1 => ISO-8859-1
    

    I don't want to recompile glibc (that would be a huge amount of work). Any idea how to add support for new encodings?

  • Answers
  • schemacs

    After running yum upgrade, iconv works correctly with glibc-2.12-1.132.el6_5.2.x86_64 installed; the previously installed package was glibc-2.12-1.132.el6.x86_64.


  • Related Question

    encoding - convert file type to utf-8 on unix - iconv is failing
  • pedalpete

    Possible Duplicates:
    Batch-convert files for encoding or line ending under Windows
    How can I convert multiple files to UTF-8 encoding using *nix command line tools?

    I have a PHP file on my Windows machine that no longer displays its characters correctly after I move it over to *nix with WinSCP.

    I dragged the file from the Linux machine back to Windows and checked its encoding with Notepad++, which says it's ANSI.

    So I tried iconv -f ANSI -t utf-8 filename.php>filename.php, but got an error saying that conversion from ANSI is not supported. I also tried MS_ANSI; that gives no error, but the file still doesn't show the proper encoding.

    When I open the file in WinSCP to see how it looks, many special characters show up as '?'. Since the purpose of the script is to remove these special characters from my data, this is a real problem.

    Is there another tool for changing the encoding? I tried installing iconv with yum, but got a "no package available" response.

    How would you convert this file to the proper encoding?


  • Related Answers
  • quack quixote

    I have had similar trouble with MD5 hashes created on Windows XP (under Cygwin), saved to a file, and then copied to a Linux system where the hashes are computed again for copy verification. If the name of a hashed file contains non-ASCII characters, md5sum reports the file as missing because it doesn't decode the filename correctly. However, if I open the text file containing the hashes in Notepad and change its encoding from ANSI to UTF-8, the Linux md5sum gets the encoding right.

    ANSI isn't really a proper encoding name (to anyone but Microsoft), which is why iconv doesn't recognize it. You might get away with windows-1252 instead, but there's no guarantee it will always work:

    iconv -f windows-1252 -t utf-8 filename.from > filename.to
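    Note that the question's command wrote iconv's output back into filename.php itself; the shell truncates that file before iconv even starts, so the original contents are lost whether or not the conversion works. Using separate input and output names, as above, avoids this. A minimal sketch for converting "in place" (bash; the function name to_utf8_in_place is hypothetical, and windows-1252 is still only a guess at the source encoding) writes to a temporary file first and overwrites the original only after a successful conversion:

```bash
#!/usr/bin/env bash
# Sketch: convert FILE from a guessed source encoding to UTF-8 "in place".
# Redirecting iconv's output straight back to FILE would truncate FILE before
# iconv reads it, so the result goes to a temporary file first.
to_utf8_in_place() {
    local from=$1 file=$2 tmp
    tmp=$(mktemp) || return 1
    if ! iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        rm -f "$tmp"           # conversion failed: leave FILE untouched
        return 1
    fi
    # Copy the contents back rather than mv, so FILE keeps its own permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (windows-1252 is only a guess at the "ANSI" code page):
# to_utf8_in_place windows-1252 filename.php
```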
    

    For the record, the file command reports this for one of those MD5 text files:

    $ file tequila.ansi.txt
    tequila.ansi.txt: ISO-8859 text
    
  • Matthew Talbert

    You could just convert it to UTF-8 with Notepad++.

  • CesarB

    There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.

    The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding that is usually depends on your Windows system language; the most common is CP 1252 (also known as Windows-1252). So when your editor says ANSI, it means "whatever the API functions use as the default ANSI encoding", that is, the default non-Unicode encoding of your system (and thus usually the one used for text files).

    So, to convert the file correctly, you should first find out which encoding "ANSI" refers to on your Windows system (or simply ask your text editor there to save the file in a specific encoding).
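    To see why the exact code page matters, here is a small sketch (bash; the function name decode_e9_as is hypothetical) that decodes the single byte 0xE9 under two Windows code pages. CP1252 is the Western European default mentioned above; CP1251 (Cyrillic) stands in for the "ANSI" encoding of a system with a different language. Both names are assumed to be available in the local iconv.

```bash
#!/usr/bin/env bash
# Sketch: the same byte 0xE9 (octal \351) means different characters depending on
# which Windows "ANSI" code page is assumed when decoding it.
decode_e9_as() {
    printf '\351' | iconv -f "$1" -t UTF-8
}

for cp in CP1252 CP1251; do
    printf '%s: %s\n' "$cp" "$(decode_e9_as "$cp")"
done
# CP1252: é   (Western European)
# CP1251: й   (Cyrillic)
```

    Both conversions succeed without any error, so iconv cannot tell you which one is right; only knowledge of the source system can.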

  • hlovdal

    Are you sure "ANSI" is a valid input encoding name for iconv? You could run "file filename.php"; file will often tell you what it thinks the encoding is. You could also try leaving out the source encoding when doing the conversion, or simply try all of them:

    for i in `iconv -l`; do iconv -f $i -t utf-8 filename.php > filename.php.$i; done
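    The one-liner above can trip over the format of glibc's list: when iconv -l is not writing to a terminal, glibc typically prints one name per line with trailing slashes (for example UTF-8//), and some names, such as ISO-10646/UTF8 in the list from the first question, contain a slash that cannot appear in a file name. A failed conversion also leaves an empty or partial output file behind. The sketch below assumes bash and glibc's iconv, and try_all_encodings is a hypothetical name: it strips trailing slashes from each name, replaces any remaining "/" with "_" in the output file name, and deletes outputs whose conversion failed.

```bash
#!/usr/bin/env bash
# Sketch (assumes glibc iconv): decode FILE with every encoding iconv lists and keep
# only the outputs that converted without error, saved as FILE.<encoding>.
try_all_encodings() {
    local input=$1
    # Accept both list formats (comma-separated on a terminal, one name per line
    # in a pipe), then drop trailing slashes and empty lines.
    iconv -l | tr ', ' '\n\n' | sed -e 's,/*$,,' -e '/^$/d' |
    while IFS= read -r enc; do
        out="$input.${enc//\//_}"          # "/" cannot appear in a file name
        if ! iconv -f "$enc" -t UTF-8 "$input" > "$out" 2>/dev/null; then
            rm -f "$out"                   # discard failed or partial output
        fi
    done
}

# Example: try_all_encodings filename.php && ls filename.php.*
```

    This loop only filters out encodings that fail outright; many single-byte code pages will "succeed" on any input, so the surviving candidates still need to be inspected by eye.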