Unable to directly input unicode characters with diacritical marks into PuTTY on Windows 8
2014-07
After moving to Windows 8, I can no longer type Unicode characters directly into a PuTTY session window, for example ą, ę, ć, ń using Alt+<letter> with the Polish (programmers) keyboard layout.
I have Window -> Translation -> Remote character set set to UTF-8.
Typing directly using the physical keyboard connected to the server works.
And, what is strange, pasting a text with these letters into PuTTY works, too.
The server is using UTF-8. Here, ąęółśćżźń is being pasted:
    m@debian:~$ echo ąęółśćżźń > x ; file x
    x: UTF-8 Unicode text
    m@debian:~$
Pressing e.g. Alt+x, which normally renders ź, results in a normal Latin z in the PuTTY window. Here, żźżźżź is being pasted:
    m@debian:~$ echo żźżźżź | md5sum
    1ff31403a1089c590ed55d42cdcd0f3e  -
    m@debian:~$
Here, żźżźżź is being typed:
    m@debian:~$ echo zzzzzz | md5sum
    cd519e63e450d863e5ee02814bae016d  -
    m@debian:~$
And here, a plain zzzzzz is being typed:
    m@debian:~$ echo zzzzzz | md5sum
    cd519e63e450d863e5ee02814bae016d  -
    m@debian:~$
Same sum.
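The comparison above can be reproduced without a terminal. The following is a minimal Python sketch, assuming that `echo` appends a single newline and that the pasted text reaches the server as UTF-8; the helper name `md5_of_echo` is made up for this illustration.

```python
import hashlib


def md5_of_echo(text: str) -> str:
    """Hash what `echo text | md5sum` would hash: the UTF-8 bytes plus a newline."""
    return hashlib.md5((text + "\n").encode("utf-8")).hexdigest()


pasted = "żźżźżź"  # what arrives when the text is pasted
typed = "zzzzzz"   # what arrives when the same keys are typed in PuTTY

# Each Polish letter takes two bytes in UTF-8, each plain z takes one.
print(len(pasted.encode("utf-8")), len(typed.encode("utf-8")))

print(md5_of_echo(pasted))
print(md5_of_echo(typed))
print("same sum:", md5_of_echo(pasted) == md5_of_echo(typed))
```

Identical sums for the typed Polish letters and the plain z's mean the diacritics are already gone before the bytes reach the server, so the substitution happens on the Windows side.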
The only letter with a diacritic that is typable is ó (which is also present in latin1 charset).
This same exact executable does work on Windows 7.
My guess is that Windows 8 is somehow deciding that PuTTY is unable to process typed (?) non-latin1 characters and it changes them on-the-fly to their latin1 counterparts.
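To picture that guess, here is a small Python sketch that imitates such a "closest latin1 letter" substitution. It is only an emulation under assumptions: it does not call any Windows API, the function name `best_fit_latin1` and the `FALLBACK` table are invented for this example, and the real mapping Windows applies may differ.

```python
import unicodedata

# Hypothetical fallback for letters that have no canonical decomposition.
FALLBACK = {"ł": "l", "Ł": "L"}


def best_fit_latin1(text: str) -> str:
    """Keep latin1 characters, replace the rest with their base letter."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        try:
            ch.encode("latin-1")  # already representable, e.g. ó
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        if ch in FALLBACK:
            out.append(FALLBACK[ch])
            continue
        # NFD splits e.g. ź into z plus a combining acute accent.
        base = unicodedata.normalize("NFD", ch)[0]
        out.append(base if ord(base) < 256 else "?")
    return "".join(out)


print(best_fit_latin1("ąęółśćżźń"))  # prints aeólsczzn
```

The output matches the observed symptoms: ó survives because latin1 contains it, while every other Polish letter collapses to its unaccented base.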
What can be done?
Setting "Language for non-Unicode programs" as suggested in http://superuser.com/a/497880/214569 helped.
Our users are experiencing a very discouraging issue with how MS Word (on Windows) handles non-Unicode characters. The issue is confirmed in both Word 2007 and the Word 2010 Beta on Windows XP SP3; I suspect Word 2003 behaves the same way.
Issue:
1. A user creates a document using a non-Unicode font, entering characters to represent scientific notation. For example, he enters a Mu (µ). Note: I pasted in a Unicode-compliant Mu for reference.
2. The user opens his document and attempts to copy and paste this non-Unicode character representing a Mu into a web browser for entry into our system. It pastes as an unrecognized character. This is expected.
3. The user opens his document, selects the non-Unicode character, changes its font to "Arial Unicode MS" and saves the document. He closes and re-opens the document for good measure. Once it is re-opened, he copies what should be a Unicode Mu and pastes it into the web browser. It is still represented as an unrecognized character.
4. The user creates a new document, sets the font to "Arial Unicode MS" and types a Mu. He copies this Mu into the web browser and it pastes as Unicode, as expected.
Conclusion:
Word does not actually convert non-Unicode characters into Unicode characters when a Unicode font is selected. Instead, it makes a best guess for display purposes but performs no actual conversion.
How do I overcome this problem?
- Can I change some setting in Word to force a conversion? Preferable.
- Is there a "cleaner" app or Word macro that will do this?
- Other solutions?
Additional Notes:
- Re-typing the affected documents using unicode is not an option
- This is not an issue in Mac OS X using the most recent version of Word. A sample case such as in (3) results in a unicode Mu being pasted into the browser.
Please help!
Try using Paste Special; there should be an option for Unicode text.
Note that if the source document was created with a Symbol font, this won't help. Windows doesn't really know that the character is related to a specific Unicode character. The symbol fonts were created before Unicode as a way of meeting a need, and the two aren't interchangeable.
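If the document really does use the Symbol font (the question only says "a non-unicode font", so this is an assumption), the missing link is a lookup table from the font's character codes to Unicode. Below is a rough Python sketch of that idea. The table is deliberately partial, the function name `symbol_run_to_unicode` is made up, and the handling of the U+F000 to U+F0FF private use range is an assumption about how Word may store symbol-font characters.

```python
# Partial table: character code in the Symbol font -> Unicode character.
SYMBOL_TO_UNICODE = {
    0x61: "\u03b1",  # a -> Greek small alpha
    0x62: "\u03b2",  # b -> Greek small beta
    0x64: "\u03b4",  # d -> Greek small delta
    0x6D: "\u03bc",  # m -> Greek small mu
    0x70: "\u03c0",  # p -> Greek small pi
    0x57: "\u03a9",  # W -> Greek capital omega
}

# Assumption: symbol-font characters may be stored as U+F000 + code.
PUA_BASE = 0xF000


def symbol_run_to_unicode(run: str) -> str:
    """Convert text that was formatted in the Symbol font to real Unicode."""
    out = []
    for ch in run:
        code = ord(ch)
        if PUA_BASE <= code <= PUA_BASE + 0xFF:
            code -= PUA_BASE  # fold the private use form back to the font code
        out.append(SYMBOL_TO_UNICODE.get(code, ch))  # unknown codes pass through
    return "".join(out)


print(symbol_run_to_unicode("m"))       # the letter m set in Symbol
print(symbol_run_to_unicode("\uf06d"))  # the same glyph in private use form
```

Such a function must only be applied to runs that are actually formatted in the symbol font, otherwise every ordinary m in the document would turn into a mu. Note also that it produces the Greek letter mu (U+03BC), which is a different code point from the micro sign (U+00B5).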
It is a lengthy process, but I normally convert such files into images and then run those images through OCR software. That helps. I was looking for a better option myself, though.
Thanks Mark. You're right that Paste Special won't work for non-Unicode text. Do you know of a utility that will "clean" non-Unicode documents? For example, one that loads a document containing a non-Unicode Mu symbol and converts it into a Unicode-compliant Mu? Is there a macro or plugin for Word that will do this?
Best regards, Kirby