linux - How can I convert multiple files to UTF-8 encoding using *nix command line tools?

25
2014-03
  • jason

    Possible Duplicate:
    Batch-convert files for encoding or line ending

    I have a bunch of text files that I'd like to convert from any given charset to UTF-8 encoding.

    Are there any command line tools or Perl (or language of your choice) one liners I can use to do this en masse?

  • Answers
  • grawity

    iconv converts between many character encodings, so with a little bash magic we can write:

    for file in *.txt; do
        iconv -f ascii -t utf-8 "$file" -o "${file%.txt}.utf8.txt"
    done
    

    This will run iconv -f ascii -t utf-8 on every file ending in .txt, writing the recoded output to a file with the same name but ending in .utf8.txt instead of .txt.

    Note that this particular conversion would not actually change your files, because ASCII is a subset of UTF-8, but it answers your question about how to convert between encodings. For a real conversion, replace ascii with the actual source encoding of your files.
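
    As a rough sketch of the same loop with a non-ASCII source encoding: the function below (the name to_utf8 and its arguments are made up for illustration) assumes every .txt file in a directory shares one known source encoding. It writes through shell redirection instead of -o, since -o is not part of POSIX iconv, and it removes the output file when iconv reports an error.

```shell
# Hypothetical helper: convert every .txt file in a directory to UTF-8.
# $1 = source encoding (assumed to be known and the same for all files)
# $2 = directory to process (defaults to the current directory)
to_utf8() {
    src_enc=$1
    dir=${2:-.}
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue                    # glob matched nothing
        case $file in *.utf8.txt) continue ;; esac    # skip output of an earlier run
        out="${file%.txt}.utf8.txt"
        if ! iconv -f "$src_enc" -t UTF-8 "$file" > "$out"; then
            echo "conversion failed: $file" >&2
            rm -f "$out"                              # do not leave a partial file behind
        fi
    done
}
```

    Calling to_utf8 ISO-8859-1 . would then write a .utf8.txt copy next to each .txt file in the current directory and leave the originals untouched.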


  • Related Question

    conversion - App to convert from ANSI to UTF8 on windows
  • AntonioCS

    Possible Duplicate:
    Batch-convert files for encoding or line ending under Windows

    Hey!

    I have many files that are encoded in ANSI (iso-8859-1) and I want to convert them to UTF-8.

    I am converting them one by one using Notepad++, but I was wondering if there is any application that will convert them all (I have many files) in a quick and easy way.

    Does anyone know of an app that will do this? (A free app would be great.)

    Thanks


  • Related Answers
  • hanleyp

    Converting Windows-1252 to UTF-8 is a perfect fit for a short script in a scripting language.
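
    One possible shape for such a script, as a sketch only: it assumes a POSIX shell with iconv and mktemp on the PATH (on Windows, for example, under Cygwin or WSL), and the function name cp1252_to_utf8_inplace is hypothetical. It converts a single file in place and first checks whether the file already decodes as UTF-8, so that running it twice does not double-encode anything.

```shell
# Hypothetical helper: convert one file from Windows-1252 to UTF-8 in place.
# Assumes iconv and mktemp are available in the shell environment.
cp1252_to_utf8_inplace() {
    file=$1
    # If the file already decodes as UTF-8, leave it alone so that a second
    # run does not double-encode it. Pure ASCII files are skipped here too.
    if iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
        return 0
    fi
    tmp_out=$(mktemp) || return 1
    if iconv -f WINDOWS-1252 -t UTF-8 "$file" > "$tmp_out"; then
        cat "$tmp_out" > "$file"    # overwrite the contents, keep the file's permissions
        rm -f "$tmp_out"
    else
        rm -f "$tmp_out"            # conversion failed: original file is untouched
        return 1
    fi
}
```

    It could be applied to a whole folder with a loop such as for f in *.txt; do cp1252_to_utf8_inplace "$f"; done. Keeping a backup copy of the folder first is a sensible precaution, since the originals are overwritten.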

  • alex

    You could try this SourceForge app. From the website:

    Codepage Converter - Convert HTML/Text files to different encoding formats e.g. ANSI to UTF-8 or Unicode. Convert multiple files with 1 click. Works with all encodings

  • Area 51

    A bit late, but: if you saved your scripts as 'UTF-8 without BOM' and Notepad++ now opens them as ANSI, you can 'fix' this behaviour by including a string of multibyte characters somewhere in your comments, which forces Notepad++ to recognise the file's UTF-8 encoding. It's a complete hack job, but it works ;-)