windows - Convert between UTF-8 and Windows-1255 online and locally?

2014-07
  • barlop

    I have this HTML file

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML DIR="RTL" LANG="HE">
    <HEAD>
    <META http-equiv="Content-Type" content="text/html; charset=Windows-1255">
    </HEAD>
    <BODY>
      <H1>úåøä</H1>
    <H1>úåøä ðáéàéí åëúåáéí</H1>
    </BODY>
    </HTML>
    

    It is saved as ANSI in Notepad, and when opened in a browser it displays the Hebrew characters fine.

    Note that Chrome handles the UTF-8 version below just as well, and its text can be copied into MS Word 2010 in the same way.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML DIR="RTL" LANG="HE">
    <HEAD>
    <meta http-equiv="content-type" content="text/html;charset=UTF-8">
    </HEAD>
    <BODY>
    <H1>תורה נביאים וכתובים</H1>
    </BODY>
    </HTML>
    

    But the following discussion involves copying/pasting from the Windows-1255 one.

    Copy to Clipboard in Chrome

    Pasting into ms word 2010
    [screenshot: the "Keep Source Formatting" paste option]

    I can use my web browser to convert that Windows-1255 text into Unicode (e.g. as UTF-8).

    For example, with that HTML, Chrome converts

    úåøä ---> תורה
    úåøä ðáéàéí åëúåáéí --> תורה נביאים וכתובים

    But how can I convert the other way?

    For example, suppose I have a file that I wrote in Notepad.

    It contains

    תורה  
    תורה נביאים וכתובים
    

    I might save it as UTF-8 or not at all. I could leave it in an untitled file.

    How do I convert it into

    úåøä   
    úåøä ðáéàéí åëúåáéí  
    

    If I find a webpage with Hebrew written on it and view the source in Chrome, I see it in Hebrew. When I save it, it comes out in Windows-1255, as happens with http://www.mechon-mamre.org/i/t/t0.htm. That is because the file itself is stored in Windows-1255, and if one saves it and opens it in Notepad, one sees that.

    If I copied the Hebrew characters into a file and saved it as UTF-8, it would display in Chrome, but I can't see how to convert it to Windows-1255.

    I can't even see how to get Notepad to save Hebrew characters as Windows-1255.

  • Answers
  • barlop

    Online, I don't know of a way. I only know how to go Latin -> Hebrew, by making an HTML page!

    Locally one can go both ways in a good text editor quite easily.

    For local conversion, drop Notepad for this task. While it supports UTF-8 and Unicode characters, including Hebrew ones, it doesn't offer Windows-1255 (Hebrew) as an encoding. When you try to save Unicode text as "ANSI", it uses the system's ANSI code page rather than Windows-1255; on a Western-locale system that is Windows-1252 (close to ISO 8859-1). The save can't work properly, because the Hebrew characters don't exist in 1252.

    The odd Latin characters you see are Windows-1255 (Hebrew) bytes misread as Windows-1252. You can view text that way, but you can't save Hebrew as 1252, because Notepad doesn't work out which Latin characters to substitute. It just warns that you'll lose some characters; if you save anyway, the Hebrew isn't stored, and when you read the file back you get question marks or squares. So forget Notepad for saving Hebrew characters.
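    The loss described above can be reproduced outside Notepad. The following is only a rough Python sketch of the same effect, not of what Notepad does internally; `cp1252` and `cp1255` are Python's codec names for Windows-1252 and Windows-1255.

```python
# Sketch: why Hebrew text cannot be stored in Windows-1252.
text = "תורה"

# Windows-1252 has no Hebrew letters, so a strict encode fails outright.
try:
    text.encode("cp1252")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lossy encode substitutes "?" for every character it cannot map.
lossy = text.encode("cp1252", errors="replace")

# Windows-1255 contains the Hebrew alphabet, one byte per letter.
hebrew_bytes = text.encode("cp1255")

print(strict_ok)           # False
print(lossy)               # b'????'
print(hebrew_bytes.hex())  # fae5f8e4
```

    The bytes FA E5 F8 E4 are the same ones that a Windows-1252 viewer shows as úåøä.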

    Use a text editor that supports Windows-1255 (Hebrew "ANSI"). It works in EditPad Pro (not free), and Notepad++ or BabelPad probably do it just as easily. None of those programs is very good at the moment for pasting into MS Word, though: from EditPad Pro, copied Hebrew pastes as Latin characters, and from Notepad++ and BabelPad, pasting into Word doesn't offer the "Keep Source Formatting" option. But you can put the Hebrew into an HTML page (like the UTF-8 one in the question) and then copy it from Chrome into MS Word.

    Open EditPad Pro and click Convert > Text Encoding > Windows 1255.

    Copy/paste the Hebrew characters from Notepad into EditPad Pro.

    Save the file.
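    If no suitable editor is at hand, the same conversion can be scripted. This is a minimal Python sketch, assuming the input was saved as UTF-8 (with or without a BOM); the function name `convert_encoding` and the file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def convert_encoding(src, dst, src_enc="utf-8-sig", dst_enc="cp1255"):
    """Re-encode a text file from src_enc to dst_enc.

    Raises UnicodeEncodeError if a character has no equivalent in dst_enc.
    """
    # Work on raw bytes so that existing line endings are left untouched.
    # "utf-8-sig" also drops a leading BOM if the file has one.
    text = Path(src).read_bytes().decode(src_enc)
    Path(dst).write_bytes(text.encode(dst_enc))


if __name__ == "__main__":
    # Demo on throwaway files; the file names are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "hebrew-utf8.txt")
        dst = Path(tmp, "hebrew-1255.txt")
        src.write_bytes("תורה נביאים וכתובים".encode("utf-8-sig"))
        convert_encoding(src, dst)
        print(dst.read_bytes())  # raw Windows-1255 bytes, one per letter
```

    Swapping the two encoding arguments converts a Windows-1255 file back to UTF-8.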

    And you can go both ways.

    Take úåøä: convert it to Windows-1255 (opening the file as 1255 should work too) and you get the Hebrew. Convert it back to Windows-1252, i.e. Western European (opening it as 1252 should work too), and you get the Latin characters again.
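    Going both ways amounts to reinterpreting the same bytes under a different code page. A small Python sketch of that idea follows; the helper names are invented here, and it is only checked against the sample text from the question. Characters outside the Hebrew letters and plain ASCII may not round-trip.

```python
def latin_to_hebrew(text):
    """Undo the misreading: recover the bytes via 1252, read them as 1255."""
    return text.encode("cp1252").decode("cp1255")


def hebrew_to_latin(text):
    """Reproduce the misreading: encode as 1255, read the bytes as 1252."""
    return text.encode("cp1255").decode("cp1252")


hebrew = "תורה נביאים וכתובים"
latin = "úåøä ðáéàéí åëúåáéí"

# Both directions agree with the strings shown in the question.
assert latin_to_hebrew(latin) == hebrew
assert hebrew_to_latin(hebrew) == latin
```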


  • Related Question

    windows - Unable to convert file to UTF-8
  • AntonioCS

    I am on Windows XP SP3 and I am trying to convert a file from ASCII to UTF-8.

    I use Notepad++ to do this. I go to Encoding > Convert to UTF-8 without BOM. I save the file, reopen it, and it is still in ASCII.

    I am using this file in a webpage and I need the file to be UTF-8, because I have strings in UTF-8 and I am seeing little squares with ? in them.


  • Related Answers
  • Bakhtiyor

    Do the following:

    1. File->New
    2. Encoding->Encode in UTF-8 without BOM
    3. Copy and Paste your original text into this new file
    4. Save.

    It works on my computer with Windows XP SP3 and Notepad++ v5.6.8. Hope it works on your computer also.
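    One possible reason a file still shows as ASCII/ANSI after such a conversion: without a BOM, text that contains only ASCII characters is byte-for-byte identical in ASCII, the Windows-125x code pages and UTF-8, so an editor has nothing to detect. The Python sketch below illustrates this with a made-up helper, `describe`; it is not how Notepad++ actually detects encodings.

```python
def describe(data: bytes) -> str:
    """Roughly classify BOM-less bytes by what they could be."""
    if data.isascii():
        return "ascii"      # same bytes in ASCII, Windows-125x and UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"      # non-ASCII bytes that form valid UTF-8
    except UnicodeDecodeError:
        return "not utf-8"  # e.g. text in a single-byte code page


print(describe("hello".encode("utf-8")))   # ascii
print(describe("תורה".encode("utf-8")))    # utf-8
print(describe("תורה".encode("cp1255")))   # not utf-8
```

    If the page needs UTF-8, what matters is that the non-ASCII strings in the file are stored as UTF-8 bytes and that the page declares the matching charset.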