Set filename character encoding in PuTTY's PSFTP

29
2014-01
  • lacton

    I am using PuTTY's command-line utility psftp.exe to transfer files between a UTF-8-configured Linux server and an MS Windows PC.

    File names containing non-ASCII characters (e.g., Japanese kana) are corrupted when using the 'ls' or 'get' commands of the psftp utility.

    I tried creating a saved session in putty.exe with the translation set to UTF-8 and using that saved session from psftp.exe (i.e., open saved_session_with_UTF8_translation), but the filename characters were still corrupted.

    How can I configure psftp.exe so that it uses the right charset for the file names?

  • Answers
  • amphetamachine

    You could try using tar(1) to make an archive of the files before you send them. Most Windows programs that support GNU tarballs can convert the filenames to the Windows character set on extraction.
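
    As a rough illustration of the archive idea, the sketch below uses Python's standard tarfile module rather than tar(1) itself. It packs a file with a kana name into an in-memory pax-format archive, which stores names as UTF-8, and reads the name back. The helper names make_archive and list_names and the sample file name are hypothetical, and the sketch says nothing about how a particular Windows extraction tool will treat the names.

```python
import io
import tarfile


def make_archive(files):
    """Pack a {name: bytes} mapping into an in-memory pax tar archive."""
    buf = io.BytesIO()
    # PAX_FORMAT records non-ASCII member names as UTF-8 in extended headers.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_names(archive_bytes):
    """Return the member names stored in an in-memory tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        return tar.getnames()


if __name__ == "__main__":
    archive = make_archive({"かな.txt": b"hello"})  # hypothetical file name
    print(list_names(archive))
```

    The archive itself has an ASCII name, so it can be fetched with psftp's 'get' without the file-name problem; the names inside are only interpreted at extraction time.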


  • Related Question

    email - Mail character encoding formatting
  • notnoop

    Every now and then, I get an email that is not formatted properly, as in it contains many '=92' and '=' characters:

    We are looking for candidates to join our team.    Great qualifications inc=
    lude:
    
    *     PhD or Masters specializing in Machine Learning, Statistics, or related fi=
    elds.
    
    =B7     Experience dealing with large, real-life data sets. (not just pre-c=
    anned problems).
    

    Why would this occur? A buggy sender email client? Wrong MIME encoding?


  • Related Answers
  • harrymc

    The problem is probably shared between the sending and the receiving email programs.
    The sender almost certainly did not see such a mess when writing the email. The problem lies in how the encoding actually used by the sender is declared in the email's headers.

    The basic problem is that there are too many characters out there for them all to be expressed using only the simple ASCII Latin character set. The final solution is supposed to be Unicode, whose declared purpose is to contain all the world's character sets (which is already impossible). There are also intermediate solutions, such as quoted-printable, which writes each non-ASCII byte as '=' followed by two hexadecimal digits and is probably what we see in your question.
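
    To make the '=92' and '=B7' sequences concrete, here is a minimal sketch that decodes quoted-printable text with Python's standard quopri module. The function name decode_quoted_printable is hypothetical, the sample bytes are pieced together from fragments of the quoted email, and the charset cp1252 (Windows-1252) is an assumption: the real charset would be whatever the email's headers declare.

```python
import quopri


def decode_quoted_printable(raw, charset="cp1252"):
    """Undo quoted-printable encoding, then decode the bytes as text.

    charset is an assumption here; a mail client would take it from
    the Content-Type header of the message.
    """
    # "=XX" becomes the byte 0xXX; "=" at the end of a line is a soft
    # line break and is removed together with the newline.
    return quopri.decodestring(raw).decode(charset)


if __name__ == "__main__":
    sample = b"Great qualifications inc=\nlude:\n=B7 Experience"
    print(decode_quoted_printable(sample))
```

    A client that skips this step, or that applies it with the wrong charset, shows the raw '=' sequences or the wrong characters instead.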

    Now, for each character set (except possibly Unicode) there are several independent implementations, one per email client, on top of which comes each client's handling of the email headers.

    The result is that an identical rendering of the email text is only guaranteed when sender and receiver use the same email client software. Outlook is especially to blame: it does not follow the international standards very closely and is therefore liable to generate emails that other clients have difficulty displaying identically.

    On top of this mess, different operating systems may assign different numerical values to the same characters. For example, the Mac and the PC do not agree on the numerical value of even a single accented character.
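
    A small sketch of that disagreement, assuming the legacy single-byte code pages Mac OS Roman and Windows-1252 are the ones meant (Python calls them mac_roman and cp1252). The helper name byte_value is hypothetical.

```python
def byte_value(char, codec):
    """Return the single byte that `codec` assigns to `char`."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {codec}")
    return encoded[0]


if __name__ == "__main__":
    # Compare the byte assigned to the same accented letter on each platform.
    for codec in ("cp1252", "mac_roman"):
        print(codec, hex(byte_value("é", codec)))
```

    The same letter maps to a different byte in each code page, so text sent without a correct charset declaration is displayed with the wrong characters on the other platform.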

    This article may also be interesting to you: Character encoding in e-mail: having to deal with GroupWise crap in 2004. It may show you similar problems that other people are experiencing.