Why do Chrome/Firefox fail to choose the right character encoding?

2014-07
  • OverTheRainbow

    In those two browsers, this web page has all the accented characters displayed as question marks.

    Since the header apparently includes the right information...

    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    

    ... why aren't characters correctly displayed?

    I have to manually tell the browsers to use the Windows-1252 text encoding for the characters to be displayed as expected.

    Thank you.

  • Answers
  • BillThor

    The characters are displayed correctly according to your headers. You will need to change the character-set in the response header, or encode your data in utf-8. These days, I believe the second option is preferred.
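
    To see why the mismatch produces question marks, here is a minimal Python sketch. The sample word "café" is an assumption for illustration and is not taken from the page in question. It decodes the same Windows-1252 bytes twice: once as the HTTP header claims (UTF-8) and once as the author intended.

```python
# Bytes of "café" as a Windows-1252 editor would save them
# (é is the single byte 0xE9).
page_bytes = "café".encode("windows-1252")

# What a browser does when the HTTP header says charset=utf-8:
# 0xE9 does not form a valid UTF-8 sequence here, so it is replaced
# by U+FFFD, the replacement character (often drawn as a question mark).
as_utf8 = page_bytes.decode("utf-8", errors="replace")

# What the author intended:
as_cp1252 = page_bytes.decode("windows-1252")

print(repr(as_utf8))
print(repr(as_cp1252))
```

    Either fix removes the mismatch: declare windows-1252 in the response header so it matches the bytes, or re-save the page as UTF-8 so it matches the header.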

    As you appear to be using Apache as your web server, you can either output a line like Content-Type: text/html; charset=utf-8\n\n before any page content, or use mod_mime to change the character set with the AddCharset directive.

    These are your headers as I retrieved them:

    HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Date: Mon, 14 Oct 2013 21:29:36 GMT
    Server: Apache
    Last-Modified: Sat, 31 Mar 2001 23:36:28 GMT
    ETag: "1474dab-a06b-380d60eb17700"
    Accept-Ranges: bytes
    Content-Length: 41067
    Vary: Accept-Encoding
    Keep-Alive: timeout=3, max=100
    Connection: Keep-Alive
    Content-Type: text/html; charset=utf-8
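
    If you want to check the charset of a header value like the one above programmatically, a small sketch using only Python's standard library could look like this. The helper name charset_from_header is hypothetical; it only parses a header value you already have and does not fetch anything.

```python
from email.message import Message


def charset_from_header(content_type_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


print(charset_from_header("text/html; charset=utf-8"))  # utf-8
print(charset_from_header("text/html"))                 # None
```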
    

  • Related Question

    Firefox character encoding problem
  • Mehper C. Palavuzlar

    I'm using Firefox 3.5.4 (EN) under Windows XP SP3 (TR). When I open the web reports page of my company, Turkish characters are not displayed properly, so I manually have to change the Character Encoding setting from Western (Windows-1252) to Turkish (Windows-1254). I don't have this problem with other Turkish sites as they automatically change the encoding to Turkish.

    How can I make Firefox automatically find the proper character encoding settings for problematic web sites?

    Edit: I've found the following code line in the source code of relevant page:

    <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254">
    

  • Related Answers
  • Arjan

    Generally the page's encoding is followed, unless the server specifies an encoding. As the <meta> tag seems to specify what you're expecting, and as manually switching to that value helps, it sounds like the server you're getting the page from is sending an incorrect encoding (Windows-1252) in the headers to the browser.
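
    The order of precedence described above can be sketched roughly as follows. This is a simplified illustration, not what any browser actually runs: the function name pick_encoding, the regular expression, the 1024-byte search window and the windows-1252 fallback are all assumptions made for the example.

```python
import re
from email.message import Message

# Simplified pattern for charset=... inside a <meta> tag; real browsers
# use a more careful prescan than a single regular expression.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE
)


def pick_encoding(content_type_header, body, fallback="windows-1252"):
    """Pick an encoding: HTTP header first, then <meta>, then a fallback."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    header_charset = msg.get_content_charset()
    if header_charset:
        # A charset in the HTTP header wins over anything in the page.
        return header_charset
    # Only look near the start of the document (assumed window size).
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback


page = b'<HEAD><META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254"></HEAD>'
print(pick_encoding("text/html; charset=windows-1252", page))
print(pick_encoding("text/html", page))
```

    With a charset in the header, the <meta> value is never consulted, which is why a wrong server setting overrides a correct page.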

    The proper way to fix it is to configure the server properly. For a company webserver, this probably means bugging the server admin to do it.

    To see the (wrong) headers, if you're familiar with such tools, you can use things like Firebug's "Net" panel in Firefox, or Web Inspector's "Resources" panel in Chrome or Safari. Or, if you don't know these tools and the web site is publicly accessible, then you can easily see the server's headers online using, for example, Web-Sniffer.

    Assuming the login page specifies the same as the actual pages, then this yields:

    Content-Type: text/html
    

    ...without any value for charset. Not sure if a browser should then still interpret that <meta> tag, but apparently Firefox is ignoring it, and making some best guess.

    Firefox ignoring it might be caused by the HTML source. The <meta> tag should always be specified within <head> before anything else, as it might also apply to the title, scripts, CSS and so on. On this site, it isn't and, even worse, the HTML is a total mess:

    <SCRIPT LANGUAGE=JavaScript SRC="/dergi/_ScriptLibrary/pm.js"></SCRIPT>
    <SCRIPT LANGUAGE=JavaScript>
      thisPage._location = "/dergi/giris/login.asp";
    </SCRIPT>
    <FORM name=thisForm METHOD=post>
    <HTML>
    <style type="text/css">
    <!--
      [..]
    -->
    </style>
    <HEAD>
      [..]
      <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254">
      <META NAME="GENERATOR" CONTENT="Microsoft FrontPage 5.0">
      <META NAME="AUTHOR" CONTENT="[removed to protect the innocent...]">  
      <TITLE>YAYSAT DERGİ RAPORLARI</TITLE>
    </HEAD>
    <BODY>
    <center>
    [..]
    </center>
    </body>
    <INPUT type=hidden name="_method">
    <INPUT type=hidden name="_thisPage_state" value="">
    </FORM>
    </html>
    

    Huge developer fail.

    (Incidentally, Web-Sniffer shows <meta http-equiv=content-type content="text/html; charset=ISO-8859-1">, but that is due to its values for Accept-Charset. Firebug shows <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254"> just like in the question.)

  • harrymc

    The Firefox add-on Charset Switcher may help you if you don't control the contents of your website.

    If you're asking what HTML you should generate, then my first remark is that the text should not be encoded in Windows-1254 at all. HTML pages are better encoded in UTF-8, since this encoding is much more likely to display correctly in all browsers and on all client operating systems.

    The tag should then look like:

    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
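
    Changing the tag alone is not enough; the bytes of the page have to be converted as well. A minimal sketch of that conversion in Python, where the helper name reencode_to_utf8 is hypothetical and the sample text is the page title quoted in the other answer:

```python
def reencode_to_utf8(raw, source_encoding="windows-1254"):
    """Decode bytes from the legacy encoding and return them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# The title as a Windows-1254 editor would have saved it.
legacy = "YAYSAT DERGİ RAPORLARI".encode("windows-1254")
utf8 = reencode_to_utf8(legacy)

# What a browser guessing Windows-1252 shows for the legacy bytes:
print(legacy.decode("windows-1252"))
# The converted bytes read back as UTF-8:
print(utf8.decode("utf-8"))
```

    After converting, the <meta> tag and any charset in the server's Content-Type header should both say utf-8, so that all three agree.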

  • dikkertjedap

    This bug (Firefox 4.0.1) has been reported: https://bugzilla.mozilla.org/show_bug.cgi?id=651142