mysql - How can I get a mysqldump file encoded in utf-8 for psql?

2014-03
  • dsclementsen

    I am migrating some data from a MySQL database (v5.1.44/MyISAM/collation=latin1_swedish_ci) to PostgreSQL (v9.0.4, the version included with OS X Lion).

    I'm using

    $ mysqldump --compatible=postgresql > tmp.sql # output create/insert statements
    $ psql --command='\i tmp.sql' # import to postgresql
    

    However, the import fails with the error ERROR: invalid byte sequence for encoding "UTF8": 0xe97261 (the bytes belong to accented letters; 0xe9 is é in Latin-1).

    The issue, I think, is that the exported file is not encoded in UTF-8.

    Running file on the exported file reports the following:

    $ file tmp.sql
    tmp.sql: Non-ISO extended-ASCII text, with very long lines
    

    What's the easiest, scriptable way to get this file prepared in utf-8 for psql?

    This does not work:

    $ iconv -f ASCII -t UTF-8 tmp.sql > out.sql
    iconv: tmp.sql:18:59270: cannot convert
    

    I've found that opening the file in vim, issuing :set fenc=utf-8 and saving makes the import run smoothly, but this has to be automated, so I need to cut out this manual step.
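    A minimal sketch of a scriptable replacement for the vim step, assuming the dump really is MySQL "latin1" (the latin1_swedish_ci collation and the 0xe9 byte for é both point that way). MySQL's latin1 is in fact Windows-1252, so the sketch tells iconv the source is CP1252 rather than ASCII, which cannot represent bytes above 0x7f. The helper name to_utf8 and the SRC_ENC variable are hypothetical, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: re-encode a mysqldump file to UTF-8 for psql.
# Assumption: the dump's bytes are MySQL "latin1", i.e. Windows-1252.
# Set SRC_ENC (e.g. SRC_ENC=ISO-8859-1) if your data uses another charset.
SRC_ENC=${SRC_ENC:-CP1252}

# to_utf8 IN OUT: convert file IN from $SRC_ENC to UTF-8, writing OUT.
to_utf8() {
    # Unlike -f ASCII, CP1252 defines bytes such as 0xe9 (é), so iconv
    # can map them; it still exits non-zero on bytes it cannot convert.
    iconv -f "$SRC_ENC" -t UTF-8 "$1" > "$2"
}
```

    In a script this would be used as to_utf8 tmp.sql out.sql && psql --command='\i out.sql', so the import only runs if the conversion succeeded.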

  • Answers
  • slhck

    Try telling mysqldump which character set to use:

    mysqldump --default-character-set=charset_name

    where charset_name is the target character set, for example utf8.
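    A hedged sketch of how this option could slot into the question's two commands, assuming utf8 as the charset and that the server converts the latin1 columns to it on output. It also checks that the dump is well-formed UTF-8 before calling psql, since iconv fails on the same kind of byte sequence psql rejects. The database-name argument and the helpers is_valid_utf8 and dump_and_import are hypothetical.

```shell
#!/bin/sh
# is_valid_utf8 FILE (hypothetical helper): succeeds only if FILE is
# well-formed UTF-8; iconv stops with an error on the first bad sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# dump_and_import DB_NAME (hypothetical helper): dump with an explicit
# character set, verify the encoding, then import into PostgreSQL.
dump_and_import() {
    mysqldump --compatible=postgresql --default-character-set=utf8 "$1" > tmp.sql || return 1
    if ! is_valid_utf8 tmp.sql; then
        echo "tmp.sql is not valid UTF-8; not importing" >&2
        return 1
    fi
    psql --command='\i tmp.sql'
}
```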
    

  • Related Question

    linux - How can I convert multiple files to UTF-8 encoding using *nix command line tools?
  • jason

    Possible Duplicate:
    Batch-convert files for encoding or line ending

    I have a bunch of text files that I'd like to convert from any given charset to UTF-8 encoding.

    Are there any command-line tools or Perl (or language of your choice) one-liners I can use to do this en masse?


  • Related Answers
  • grawity

    iconv converts between many character encodings, so with a little bash magic we can write:

    for file in *.txt; do
        iconv -f ascii -t utf-8 "$file" -o "${file%.txt}.utf8.txt"
    done
    

    This runs iconv -f ascii -t utf-8 on every file ending in .txt, writing the recoded output to a file with the same name but ending in .utf8.txt instead of .txt.

    This won't actually change your files (ASCII is a subset of UTF-8, so pure-ASCII text is already valid UTF-8), but it shows how to convert between encodings: replace ascii with the files' real source encoding.
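    Because iconv does not detect the source encoding, the loop only does real work when -f names the files' actual charset. A minimal sketch that takes that encoding as a parameter, assuming all the files share one encoding; convert_all_to_utf8 is a hypothetical name, and redirection is used instead of -o, which not every iconv implementation accepts.

```shell
#!/bin/bash
# convert_all_to_utf8 SRC_ENC [DIR] (hypothetical helper): write a UTF-8
# copy NAME.utf8.txt next to every NAME.txt in DIR (default: current dir).
convert_all_to_utf8() {
    local src_enc=$1 dir=${2:-.}
    local file
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue                   # glob matched nothing
        case $file in *.utf8.txt) continue ;; esac   # skip earlier output
        iconv -f "$src_enc" -t UTF-8 "$file" > "${file%.txt}.utf8.txt" || return 1
    done
}
```

    For example, convert_all_to_utf8 ISO-8859-1 ./docs would write a .utf8.txt copy next to each .txt file in ./docs, stopping at the first file iconv cannot convert.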