Character encoding at webserver on windows

2014-07
  • peterwilm

    I have set up a webserver on Windows 8.1. Now I am having problems with the character encoding of links: there are directories and file names containing German umlauts (ä ö ü ß). The autoindex feature of the webserver generates index pages for the directories that display the umlauts correctly, but it replaces the umlauts with percent-encoded bytes within the links: ü -> %fc and ö -> %f6. As a result, the linked file is not found. Nor is it found if I type ü manually instead of %fc.

    Any suggestions?

    webserver: nginx 1.5.10

    charset code in nginx.conf: charset iso-8859-1;

    os: windows 8.1 (same box for server and client)

    browser: chrome and ie
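
    The %fc and %f6 in the generated links match the ISO-8859-1 byte values of ü and ö written as percent escapes. The following is a minimal Python sketch of that arithmetic only; it assumes nothing about how nginx or Windows resolve the name on disk, and the helper name percent_encode is made up for illustration:

```python
from urllib.parse import quote, unquote

def percent_encode(text, charset):
    """Percent-encode text after encoding it with the given charset."""
    return quote(text, encoding=charset)

# The same character produces different escapes depending on the charset.
latin1_link = percent_encode("ü", "iso-8859-1")
utf8_link = percent_encode("ü", "utf-8")

# An ISO-8859-1 escape only round-trips if it is decoded as ISO-8859-1.
decoded_as_latin1 = unquote("%fc", encoding="iso-8859-1")
decoded_as_utf8 = unquote("%fc", encoding="utf-8", errors="replace")

print(latin1_link, utf8_link, decoded_as_latin1, decoded_as_utf8)
```

    If the side that resolves the link expects UTF-8 escapes, a lone %fc is not a valid UTF-8 sequence, which would be one possible reason for the file not being found.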

  • Answers

    Related Question

    Firefox character encoding problem
  • Mehper C. Palavuzlar

    I'm using Firefox 3.5.4 (EN) under Windows XP SP3 (TR). When I open the web reports page of my company, Turkish characters are not displayed properly, so I manually have to change the Character Encoding setting from Western (Windows-1252) to Turkish (Windows-1254). I don't have this problem with other Turkish sites as they automatically change the encoding to Turkish.

    How can I make Firefox automatically find the proper character encoding settings for problematic web sites?

    Edit: I've found the following code line in the source code of relevant page:

    <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254">
    

  • Related Answers
  • Arjan

    Generally the page's encoding is followed, unless the server specifies an encoding. As the <meta> tag seems to specify what you're expecting, and as manually switching to that value helps, it sounds like the server you're getting the page from is sending an incorrect encoding (Windows-1252) in the headers to the browser.
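
    The symptom in the question is what one would expect when bytes written as Windows-1254 are decoded as Windows-1252, since the two code pages differ in only a handful of positions. A small Python sketch of that mismatch, using the page title from the question as sample text (the helper name misread is hypothetical):

```python
def misread(text, actual="cp1254", assumed="cp1252"):
    """Encode text as the page does, then decode it as the browser assumes."""
    return text.encode(actual).decode(assumed)

# Only the Turkish-specific letters change; plain ASCII is unaffected.
garbled_title = misread("YAYSAT DERGİ RAPORLARI")
garbled_letters = misread("ğış")

print(garbled_title)
print(garbled_letters)
```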

    The proper way to fix it is to configure the server properly. For a company webserver, this probably means bugging the server admin to do it.

    To see the (wrong) headers, if you're familiar with such tools, you can use things like Firebug's "Net" panel in Firefox, or Web Inspector's "Resources" panel in Chrome or Safari. Or, if you don't know these tools and the web site is publicly accessible, then you can easily see the server's headers online using, for example, Web-Sniffer.

    Assuming the login page sends the same headers as the actual pages, this yields:

    Content-Type: text/html
    

    ...without any value for charset. Not sure if a browser should then still interpret that <meta> tag, but apparently Firefox is ignoring it, and making some best guess.
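
    Whether a Content-Type header carries a charset can also be checked programmatically. This is a rough Python sketch that reuses the standard library's header parsing; the function name charset_from_content_type is an assumption for illustration, and it only inspects a header string rather than contacting any server:

```python
from email.message import Message

def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the name and returns None if absent.
    return msg.get_content_charset()

without_charset = charset_from_content_type("text/html")
with_charset = charset_from_content_type("text/html; charset=Windows-1254")

print(without_charset, with_charset)
```

    A result of None corresponds to the header shown above, where the browser has to fall back on the <meta> tag or on guessing.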

    Firefox ignoring it might be caused by the HTML source. The <meta> tag should always be specified within <head> before anything else, as it might also apply to the title, scripts, CSS and so on. On this site, it isn't and, even worse, the HTML is a total mess:

    <SCRIPT LANGUAGE=JavaScript SRC="/dergi/_ScriptLibrary/pm.js"></SCRIPT>
    <SCRIPT LANGUAGE=JavaScript>
      thisPage._location = "/dergi/giris/login.asp";
    </SCRIPT>
    <FORM name=thisForm METHOD=post>
    <HTML>
    <style type="text/css">
    <!--
      [..]
    -->
    </style>
    <HEAD>
      [..]
      <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254">
      <META NAME="GENERATOR" CONTENT="Microsoft FrontPage 5.0">
      <META NAME="AUTHOR" CONTENT="[removed to protect the innocent...]">  
      <TITLE>YAYSAT DERGİ RAPORLARI</TITLE>
    </HEAD>
    <BODY>
    <center>
    [..]
    </center>
    </body>
    <INPUT type=hidden name="_method">
    <INPUT type=hidden name="_thisPage_state" value="">
    </FORM>
    </html>
    

    Huge developer fail.
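
    How far into a document the encoding declaration sits can be measured with a few lines of Python. This is a simplified regex-based sketch, not the prescan algorithm any browser actually runs; the name find_meta_charset is hypothetical, and the 1024-byte limit is the one the HTML5 specification later set for encoding declarations, which may not reflect what Firefox 3.5 did:

```python
import re

# Matches a <meta ...> tag that carries a charset=... value (quoted or not).
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)

def find_meta_charset(html_bytes):
    """Return (charset, byte offset of the <meta> tag), or None if absent."""
    match = META_CHARSET.search(html_bytes)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower(), match.start()

page = (b'<HTML><HEAD><META HTTP-EQUIV="CONTENT-TYPE" '
        b'CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254"></HEAD></HTML>')
found = find_meta_charset(page)
within_limit = found is not None and found[1] < 1024

print(found, within_limit)
```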

    (Incidentally, Web-Sniffer shows <meta http-equiv=content-type content="text/html; charset=ISO-8859-1">, but that is due to its values for Accept-Charset. Firebug shows <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="TEXT/HTML; CHARSET=WINDOWS-1254"> just like in the question.)

  • harrymc

    The Firefox add-on Charset Switcher may help you if you don't control the contents of your website.

    If you're asking what HTML you should generate, then my first remark is that the text should not be encoded in Windows-1254 at all. HTML pages are better encoded in UTF-8, since this encoding is far more likely to display correctly on all browsers and on all client operating systems.

    The tag should then look like:

    <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
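
    One way to see the appeal of UTF-8 is to check which encodings can represent a given text at all. A minimal Python sketch, with a made-up sample string mixing German and Turkish letters and a hypothetical helper can_encode:

```python
def can_encode(text, charset):
    """Report whether every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

sample = "ü ö ß İ ş ğ"
results = {name: can_encode(sample, name)
           for name in ("iso-8859-1", "cp1252", "utf-8")}

print(results)
```

    The single-byte Western encodings lack the Turkish letters, while UTF-8 covers all of them.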

  • dikkertjedap

    This bug (Firefox 4.0.1) has been reported: https://bugzilla.mozilla.org/show_bug.cgi?id=651142