editing - Mess of right-to-left and left-to-right in Unicode

08
2014-07
  • porton

    How well does the Kate editor support right-to-left text? The character order looks like a mess to me. I am not sure whether this is a "feature" of the Unicode standard or a Kate bug.

    http://imagebin.ca/v/1EUMnLMK27aX

    I don't see where these FUTURE/PAST comments start and where they end.

    More generally (not only about Kate): Is there a way to edit Unicode files without a mess of directions?

    For example, I want XML comments to look like <!-- ... -->, not <-- ... --!>. Is that possible at all when editing mixed-direction files?
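One possible explanation for the swapped delimiters, offered as a sketch and not as a diagnosis of Kate: in the Unicode bidirectional algorithm, characters such as `<`, `!` and `>` have no strong direction of their own, and `<` and `>` are also flagged as mirrored. Inside a right-to-left run they can therefore be drawn in reversed order and flipped. The Python snippet below uses the standard `unicodedata` module to list these properties for an XML comment that contains a Hebrew word. The sample string and the helper name `bidi_properties` are illustrative assumptions, not taken from the question.

```python
import unicodedata


def bidi_properties(text):
    """Return (character, bidi class, mirrored flag) for each character."""
    return [
        (ch, unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch)))
        for ch in text
    ]


# Hypothetical sample: an XML comment around the Hebrew word "shalom",
# written with escapes so the source line itself is not reordered.
sample = "<!-- \u05e9\u05dc\u05d5\u05dd -->"

for ch, bidi_class, mirrored in bidi_properties(sample):
    # Strong classes are L (left-to-right) and R (right-to-left);
    # neutral or weak classes take their direction from context.
    print(f"U+{ord(ch):04X}  class={bidi_class:<3} mirrored={mirrored}")
```

The delimiters come out as neutral or weak classes while the Hebrew letters are strong right-to-left, which is consistent with the comment markers following the surrounding text direction. U+200E LEFT-TO-RIGHT MARK is an invisible strong left-to-right character that is sometimes placed next to such delimiters; whether that helps in a given editor is not something the question settles.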

  • Answers

    Related Question

    How can I edit Unicode text in Notepad++?
  • Jarvis

    Sometimes I edit English text that includes Unicode characters. For some reason, on my PC, Notepad++ converts the Unicode characters to question marks (???), corrupting the text and losing that data. I'm looking for a way to edit such text while preserving the Unicode characters. I'm using Consolas as my font. Even if the font doesn't have all those characters, why should I lose the data when I copy the text out of Notepad++ (via the Windows clipboard)?


  • Related Answers
  • Excellll

    If the file is actually encoded in Unicode, Notepad++ should detect it automatically. The Consolas font works well for me. You can try one of these two menu options:

    • Encoding -> Encode in UTF-8
    • Encoding -> Convert to UTF-8

    I'm pretty sure the first one will do what you want.
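The two menu entries sound similar, so here is a small Python sketch of the distinction their names suggest: reading existing bytes under a different encoding versus transcoding the text into new bytes. That Notepad++ maps "Encode in" to the first and "Convert to" to the second is an assumption here, as is the use of code page 1252 to stand in for "ANSI".

```python
# Bytes of "ä" as stored in a cp1252 ("ANSI" on many Western systems) file.
raw = "\u00e4".encode("cp1252")

# Reinterpreting: the bytes stay the same, only the label changes.
# A lone 0xE4 is not valid UTF-8, so it decodes to U+FFFD.
reinterpreted = raw.decode("utf-8", errors="replace")

# Transcoding: decode with the real encoding, then write new UTF-8 bytes.
converted = raw.decode("cp1252").encode("utf-8")

print(raw.hex(), repr(reinterpreted), converted.hex())
```

Under these assumptions, reinterpreting is the right choice only when the bytes already are UTF-8, and transcoding is the right choice when the text currently displays correctly as "ANSI".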

  • 8088

    There is good news and bad news.

    Good news: Notepad++ supports Unicode (at least from what I can gather).

    Bad news: Apparently, Unicode support is available only on Windows XP.

    I do not have a Windows machine in front of me. From what I remember, there is an Encoding menu somewhere under the Format menu. The most common encoding for Unicode is UTF-8.

    Here is a 'pretty' picture of Unicode support in Notepad++:

    [Image not preserved]

  • Peter Mortensen

    The problem described in the question happens when an empty/new document is set to "ANSI" and Unicode characters are pasted into it.

    There is no auto-detection for an empty/new document, at least not in the version of Notepad++ I tested. "ANSI" is the default in Notepad++ for a new document unless it is changed in menu Settings/Preferences/tab "New Document/Open Save Directory".

    Solution

    The solution is to set the encoding to UTF-8 before pasting, menu Format/Encode in UTF-8:

    Menu command "menu Format/Encode in UTF-8" about to be executed

    Example

    I copied some text, "Russian (русский язык, russkiy yazyk", from Firefox showing the Wikipedia page "Russian language" into a new Notepad++ document.

    If the encoding is not changed from "ANSI" this is the result:

    Result of pasting the Unicode string "Russian (русский язык, russkiy yazyk" into a new Notepad++ document without changing the encoding from the default "ANSI"

    If the encoding is changed this is the result:

    Result of pasting the Unicode string "Russian (русский язык, russkiy yazyk" into a new Notepad++ document after changing the encoding from the default "ANSI" to "UTF-8"

    As can be seen in the figure below (the Cyrillic part is highlighted), Notepad++ actually converts the Unicode characters into ASCII 63 (hex 3F), question marks. That is why the Unicode characters are lost (in "ANSI" mode) when copying the text out through the clipboard (it is not a font issue - information is lost).

    [Figure not preserved: the pasted text with the Cyrillic part highlighted]

    Tested on: Notepad++ v5.4.5 (UNICODE).
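The loss described above can be reproduced outside Notepad++. The following Python sketch assumes that "ANSI" corresponds to Windows code page 1252 (it depends on the system locale) and uses Python's `errors="replace"` handler as a stand-in for whatever Notepad++ does internally; it is an analogy, not a description of the editor's code.

```python
text = "Russian (русский язык, russkiy yazyk"

# cp1252 has no Cyrillic letters, so each one is replaced by "?" (0x3F).
ansi_bytes = text.encode("cp1252", errors="replace")
ansi_round_trip = ansi_bytes.decode("cp1252")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

print(ansi_round_trip)
print(utf8_round_trip)
print(ansi_bytes.hex(" "))  # the replaced positions show up as 3f
```

Once the bytes are `3f`, the original letters cannot be recovered, which matches the observation that this is not a font issue.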

  • Peter Mortensen

    Unicode works perfectly on Windows 7. The only issue that comes up is that you have to retype the characters that have been changed. It has happened to me: I write with Scandinavian letters, so ä -> E4 and ö -> F6. It's a pain in the butt to replace them all, but it's worth it.

    If you encode a page from ANSI -> UTF-8 then there will be some character problems.

    I would suggest that you first create a new page in UTF-8 and then copy/paste your information over. There won't/shouldn't be any trouble then.
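The values E4 and F6 mentioned above match the single-byte codes of ä and ö in code page 1252, which is assumed here to be what "ANSI" means on the machine in question. The Python sketch below prints those codes next to the UTF-8 byte sequences for the same letters; the helper name `byte_forms` is made up for this illustration.

```python
def byte_forms(ch):
    """Return (code point, cp1252 bytes, UTF-8 bytes) as hex strings."""
    code_point = f"U+{ord(ch):04X}"
    ansi_hex = ch.encode("cp1252").hex().upper()
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    return code_point, ansi_hex, utf8_hex


for ch in ("\u00e4", "\u00f6"):  # ä and ö
    print(ch, *byte_forms(ch))
```

The same letter is one byte in the "ANSI" form and two bytes in UTF-8, so a file has to be read with the encoding it was actually written in for the letters to survive.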