conversion - Convert .txt file from windows-1251 to utf-8

07
2014-07
  • Haradzieniec

    I'm trying to read Cyrillic text stored in a .txt file. When I open it with OpenOffice, it displays correctly. When I open it with Notepad++, it shows unreadable symbols. Selecting Windows-1251 before opening the file doesn't help much: the encoding setting switches to the default, "Encode to UTF-8".

    Is there a way to convert my text into UTF-8?

  • Answers
  • Haradzieniec

    I tried again today on another computer (with Cyrillic support), and this approach worked: open the file in OpenOffice, copy the text to the clipboard, paste it into Notepad++ (set to UTF-8 without BOM), and save. The file is then saved in UTF-8.
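For anyone who would rather skip the copy-and-paste step, here is a minimal command-line sketch using iconv. It assumes iconv is installed and that the file really is windows-1251; the file names sample-cp1251.txt and sample-utf8.txt are made up for the illustration.

```shell
# Create a small sample file in windows-1251.
# The octal escapes are the six windows-1251 bytes for "Привет".
printf '\317\360\350\342\345\362\n' > sample-cp1251.txt

# Convert from windows-1251 to UTF-8, writing the result to a new file.
iconv -f WINDOWS-1251 -t UTF-8 sample-cp1251.txt > sample-utf8.txt

# Show the converted text.
cat sample-utf8.txt
```

Writing to a separate output file keeps the original untouched, so a wrong guess about the source encoding costs nothing.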


  • Related Question

    encoding - convert file type to utf-8 on unix - iconv is failing
  • pedalpete

    Possible Duplicates:
    Batch-convert files for encoding or line ending under Windows
    How can I convert multiple files to UTF-8 encoding using *nix command line tools?

    I've got a PHP file on my Windows machine that, after being moved over to *nix with WinSCP, no longer shows its characters correctly.

    I've dragged the file back from the Linux machine to Windows and checked the encoding with Notepad++, and it says it is ANSI.

    So I tried iconv -f ANSI -t utf-8 filename.php>filename.php, but got an error that ANSI conversion is not supported. I've also tried MS_ANSI, and I get no error, but the file still doesn't show the proper encoding.

    I open the file with winSCP to see how it looks, and many special characters show up as '?'. Seeing as the purpose of the script is to remove these special characters from my data, it is really causing a bit of an issue.

    Is there another tool for changing the encoding? I tried yum iconv, but get a no package available response.

    How would you convert this file to the proper encoding?


  • Related Answers
  • quack quixote

    I have similar troubles with MD5 hashes created on Windows XP (under Cygwin), saved to a file, then copied to a Linux system where the hashes are computed for copy verification. If the name of a file being hashed contains non-ASCII characters, md5sum reports the file missing, because it's not decoding the filename correctly. However, if I open the text file containing the hashes in Notepad and change the encoding from ANSI to UTF-8, the Linux md5sum will get the encoding correct.

    ANSI isn't really a proper encoding (to anyone but Microsoft), so that's why iconv isn't picking up on it. You might get away with windows-1252 instead, but there's no guarantee it will always work:

    iconv -f windows-1252 -t utf-8 filename.from > filename.to
    

    For the record, file gives me this on one of those MD5 textfiles:

    $ file tequila.ansi.txt
    tequila.ansi.txt: ISO-8859 text
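One thing to watch for with the redirect used in the question: sending iconv's output straight back into the input file truncates that file before iconv reads it. A minimal sketch of a safer in-place conversion follows. It assumes the source really is windows-1252 and that mktemp is available; the sample content is made up for the illustration.

```shell
# Sample input: "café" encoded in windows-1252 (octal 351 is the byte for é).
printf 'caf\351\n' > filename.php

# Convert into a temporary file first. Redirecting straight back into
# filename.php would empty it before iconv had read anything.
tmp=$(mktemp)
if iconv -f WINDOWS-1252 -t UTF-8 filename.php > "$tmp"; then
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > filename.php
else
    echo "conversion failed; filename.php left unchanged" >&2
fi
rm -f "$tmp"
```

The original is only overwritten once iconv has exited successfully, so a failed conversion leaves it as it was.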
    
  • Matthew Talbert

    You could just convert it to UTF-8 with Notepad++.

  • CesarB

    There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.

    The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding it corresponds to usually depends on your Windows system language. The most common is CP 1252 (also known as Windows-1252). So, when your editor says ANSI, it means "whatever the API functions use as the default ANSI encoding", which is the default non-Unicode encoding used in your system (and thus usually the one which is used for text files).

    So, to convert the file correctly, you first should find out which is the "ANSI" encoding for your Windows system (or simply ask your text editor there to save using a specific encoding).
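To illustrate why the choice of "ANSI" code page matters, here is a small sketch that decodes the same bytes under two different Windows code pages. It assumes an iconv that knows both WINDOWS-1251 and WINDOWS-1252; the file name ansi-sample.bin is made up for the example.

```shell
# Six bytes that spell "Привет" in windows-1251, written as octal escapes.
printf '\317\360\350\342\345\362' > ansi-sample.bin

# Decode the same bytes under two different "ANSI" code pages.
as_1251=$(iconv -f WINDOWS-1251 -t UTF-8 ansi-sample.bin)
as_1252=$(iconv -f WINDOWS-1252 -t UTF-8 ansi-sample.bin)

echo "windows-1251: $as_1251"
echo "windows-1252: $as_1252"
```

Both conversions succeed without an error, but only the windows-1251 line reads Привет; the windows-1252 line comes out as Ïðèâåò. That is why iconv cannot be left to guess, and why a conversion that reports no error can still produce the wrong text.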

  • hlovdal

    Are you sure "ANSI" is the correct character encoding/input name for iconv? You could try running "file filename.php"; file will often tell you what it thinks the encoding is. You could also try omitting the source encoding when doing the conversion, or you could just try all of them:

    for i in $(iconv -l | tr -d ,); do iconv -f "$i" -t utf-8 filename.php > "filename.php.$(printf %s "$i" | tr -d /)"; done
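If you know a word that should appear in the file, a narrower variant of the same idea is to try only a few likely encodings and keep the first one whose output contains that word. This is just a sketch: the candidate list, the sample file mystery.txt, and the expected word are all assumptions made for the illustration.

```shell
# Sample input: "Привет" stored as windows-1251 bytes (octal escapes).
printf '\317\360\350\342\345\362\n' > mystery.txt

# Hypothetical shortlist of candidate source encodings.
detected=
for enc in WINDOWS-1252 WINDOWS-1251 KOI8-R; do
    # Keep the first candidate whose UTF-8 output contains the expected word.
    if iconv -f "$enc" -t UTF-8 mystery.txt 2>/dev/null | grep -q 'Привет'; then
        detected=$enc
        break
    fi
done

echo "detected: ${detected:-none}"
```

Checking for a known word matters because single-byte encodings such as windows-1252 accept almost any byte sequence, so a conversion can succeed and still give the wrong text.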