1

I'm making a little website with German and French content. Some of the documents display the text correctly even though all umlauts are written directly as äöü and not as character codes (entities). Other documents need the codes, but I can't find the difference between the documents.

When trying to google for an answer, I can only find tons of code references but no explanation why some docs don't need them.

Balz Guenat

3 Answers

3

Any HTML document (or any text document, for that matter) is stored in a certain encoding: a mapping between characters and the byte values that represent them. Different encodings map the same bytes to different characters, so a file read with the wrong encoding shows the wrong characters.
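To make the mapping concrete, here is a minimal Python sketch (the language and variable names are chosen for illustration only). It encodes the same character under two encodings and then reads the UTF-8 bytes back with the wrong one:

```python
# The same character maps to different byte values in different encodings.
text = "ä"

utf8_bytes = text.encode("utf-8")         # two bytes
latin1_bytes = text.encode("iso-8859-1")  # one byte

print(utf8_bytes)    # b'\xc3\xa4'
print(latin1_bytes)  # b'\xe4'

# Reading UTF-8 bytes as if they were ISO 8859-1 yields two wrong characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)       # Ã¤
```

That "Ã¤" pattern is the typical symptom of a UTF-8 file being displayed as ISO 8859-1.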

Many pages use UTF-8, a Unicode encoding, and they state so either in the HTTP header or in a meta tag (Content-Type) on the page itself. Such pages can use most characters directly.
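As a rough illustration of how a client could read the declared encoding from a Content-Type header value, here is a small Python sketch. The helper name `charset_from_content_type` and the fallback value are assumptions made for this example; real browsers use more elaborate detection rules.

```python
from email.message import Message


def charset_from_content_type(value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value.

    `default` is a hypothetical fallback for this sketch, not what any
    particular browser does.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset in lowercase, or None.
    return msg.get_content_charset() or default


declared = charset_from_content_type("text/html; charset=UTF-8")
undeclared = charset_from_content_type("text/html")
print(declared)    # utf-8
print(undeclared)  # iso-8859-1
```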

You should read Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".

Oded
  • Here are two pages, [one](http://rebbergleist.ch/subpages/advent.html) being able to display the special e directly and [one](http://rebbergleist.ch/subpages/nordicwalking.html) that is not. There are no differences in the settings as to which charset they use. – Balz Guenat May 10 '12 at 08:59
  • @Coloneljesus - Neither of those are _pages_. They are HTML fragments. And from looking at the page _source_, they are displaying exactly right. – Oded May 10 '12 at 09:01
  • Looking at the source gave me the idea that the problem isn't in my code but in the encoding of the file. Turns out that the encoding setting in Notepad++ was different. Converted to UTF-8 and it works now. Thanks! – Balz Guenat May 10 '12 at 09:05
0

Two things have to match: 1) the charset declaration in the HTML code (meta) and 2) the actual encoding of your documents. For example, if you're working with UTF-8 and there is ONE document (say, a JS file) saved as ISO 8859-1, then some browsers will show you the site in ISO 8859-1, which destroys your äöüß and similar characters.
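A minimal Python sketch of such a mismatch and its fix, assuming a file that was saved as ISO 8859-1 but is read as UTF-8 (the helper `reencode` is a hypothetical name for this example):

```python
def reencode(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes using their real encoding, then encode them as `target`."""
    return data.decode(source).encode(target)


# Bytes as an editor would save them with ISO 8859-1 selected.
saved = "äöüß".encode("iso-8859-1")

# Interpreting those bytes as UTF-8 fails; with errors="replace" the
# invalid bytes become U+FFFD replacement characters.
shown = saved.decode("utf-8", errors="replace")

# Converting the bytes to UTF-8 makes them match a UTF-8 declaration.
fixed = reencode(saved, "iso-8859-1")
print(fixed.decode("utf-8"))  # äöüß
```

This is essentially what an editor's "convert to UTF-8" command does: the characters stay the same, only the bytes on disk change.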

Sindhara
0

Because, per the HTML specification:

Authoring tools (e.g., text editors) may encode HTML documents in the character encoding of their choice

Some documents use an encoding (such as iso‑8859‑1, Windows‑1252, or utf‑8) that can represent the character ä directly; others use an encoding (such as us‑ascii) that cannot, and therefore need to use the character entity reference `&auml;`.
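To illustrate, here is a small Python sketch that rewrites non-ASCII characters as entity references so the result fits into us-ascii. The function name `to_ascii_html` is made up for this example, and it deliberately leaves `&`, `<` and `>` alone, so it is not a full HTML escaper.

```python
import html
from html.entities import codepoint2name


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with an HTML character reference."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)                          # plain ASCII, keep as-is
        elif code in codepoint2name:
            parts.append(f"&{codepoint2name[code]};")  # named, e.g. &auml;
        else:
            parts.append(f"&#{code};")                # numeric fallback
    return "".join(parts)


original = "Käse für Gäste"
encoded = to_ascii_html(original)
print(encoded)                 # K&auml;se f&uuml;r G&auml;ste
print(html.unescape(encoded))  # Käse für Gäste
```

The encoded string contains only ASCII bytes, so it displays correctly no matter which of the encodings above the browser assumes.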

Brian Nixon