
My "em dash" character is shown differently on two servers.

When I visit Server 1: –

When I visit Server 2: â€"Â

I'm not using any database connection, just pure HTML.

These are the first four lines of my HTML file:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta charset="utf-8" />

Please help me here, I can't see what's wrong with it.

-solution-

As suggested below, I replaced my dash with

&#8211;

To make the server display my ►-character correctly, I had to place a .htaccess file in the folder with the following line:

AddDefaultCharset UTF-8

Thanks everyone!

G McLuhan

2 Answers


This can easily happen if the servers send different Content-Type headers. Exactly the same document can be interpreted differently when it is served with different encoding information.

It is also possible that something gets changed when a file is uploaded (an incorrect conversion). But in this case, as in most cases, the header issue probably explains the difference.

If the document is UTF-8 encoded and contains “–” (which is EN DASH, U+2013, not EM DASH), then it is displayed correctly when the headers specify Content-Type: text/html;charset=utf-8. But if the header names e.g. windows-1252 instead of utf-8, then the three bytes that make up the UTF-8 representation of “–”, namely 0xE2 0x80 0x93, are interpreted as windows-1252, which yields the three characters â, € and “ (left double quotation mark). How that turns into exactly â€"Â is somewhat obscure, but it is more important to fix the encoding issue, which probably solves the problem.
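The byte-level mismatch can be reproduced in a few lines of Python. This is only an illustrative sketch: `cp1252` is Python's codec name for windows-1252, and the variable names are made up for the example. Code points are printed instead of the characters themselves, so the output does not depend on the terminal's own encoding.

```python
# EN DASH (U+2013), as typed into the UTF-8 encoded HTML file.
dash = "\u2013"

# The bytes that are actually stored in the file for that character.
utf8_bytes = dash.encode("utf-8")
print(utf8_bytes.hex())  # e28093

# What a browser ends up with when it is told the bytes are windows-1252.
misread = utf8_bytes.decode("cp1252")
print([hex(ord(ch)) for ch in misread])  # ['0xe2', '0x20ac', '0x201c']

# With the correct label the original character comes back.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == dash)  # True
```

The three code points are â (U+00E2), € (U+20AC) and “ (U+201C), which is why a single dash turns into several unrelated characters.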

Check out the W3C tutorial on encodings.

Jukka K. Korpela
  • Exactly. To clarify, if you have an HTTP header that specifies an encoding AND a meta tag that specifies an encoding, the HTTP header will win! – Mr Lister Mar 19 '12 at 17:46
  • I now have learned that server 2 uses ISO-8859-1. I assume I'll have to switch to graphics, icon fonts or whatever to get my "►-character" in. – G McLuhan Mar 21 '12 at 20:34
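To make the precedence concrete, here is a rough Python sketch of "the HTTP header wins over the meta tag". The function names and the `windows-1252` fallback are assumptions chosen for illustration. Real browsers also look at byte order marks and may sniff the content, which this sketch ignores.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    charset = msg.get_param("charset")
    return charset.lower() if isinstance(charset, str) else None


def effective_charset(http_content_type: Optional[str],
                      meta_charset: Optional[str],
                      fallback: str = "windows-1252") -> str:
    """Simplified precedence: HTTP header first, then meta tag, then a default."""
    from_header = charset_from_content_type(http_content_type)
    from_meta = meta_charset.lower() if meta_charset else None
    return from_header or from_meta or fallback


# Server 2 in the question: the header says ISO-8859-1, the page says utf-8.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))  # iso-8859-1

# A server that sends no charset leaves the decision to the meta tag.
print(effective_charset("text/html", "utf-8"))  # utf-8
```

This is why the same file, with the same meta tags, can render correctly on one server and as garbage on another.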

It's possible they're being served with different encodings. In UTF-8, you can just include the em dash directly (—), but if the page is being served as ASCII, it needs to be written as the character reference &mdash;. Take a look at the source and see which one it uses.

I think this is what's happening, because "—" is multiple bytes long in UTF-8, so with a single-byte encoding it would be interpreted as multiple separate characters.
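If serving UTF-8 is not an option, character references can be generated rather than looked up by hand. The following Python sketch uses the standard `xmlcharrefreplace` error handler. `to_ascii_html` is a hypothetical helper name, and it only rewrites non-ASCII characters (it does not escape `&`, `<` or `>`).

```python
from html.entities import codepoint2name


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("\u2013"))  # &#8211;  (EN DASH)
print(to_ascii_html("\u2014"))  # &#8212;  (EM DASH)
print(to_ascii_html("\u25ba"))  # &#9658;  (the play symbol)

# Named references exist only for some characters (HTML 4.01 set shown here).
print(codepoint2name.get(0x2014))  # mdash
print(codepoint2name.get(0x25BA))  # None
```

Numeric references work for any character regardless of the encoding the page is served with, because the reference itself is plain ASCII.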

Brendan Long
  • This works – but only on my dash. But I also use the ►-character as a play symbol. I researched a bit but there seems to be nothing like `&play;`. Where can I look it up? – G McLuhan Mar 19 '12 at 17:00
  • @GMcLuhan - The easy/best solution is to serve content as UTF-8 (using [the content-type meta tag](http://www.htmldog.com/guides/htmlintermediate/metatags/) for example). Then, you can include any symbol you want without looking up the HTML encoding. If you can't do that for some reason, you can use [numeric character reference](http://en.wikipedia.org/wiki/Numeric_character_reference). – Brendan Long Mar 19 '12 at 17:15
  • I already inserted the meta tag and it does work on server 1. But server 2 still won't display it correctly. I'll try the numeric character reference – G McLuhan Mar 19 '12 at 17:25
  • @GMcLuhan - Can you include the entire HTML file in your question? – Brendan Long Mar 19 '12 at 17:33
  • @Brendan Long, the HTML file apparently isn’t the problem; the HTTP headers are. The URLs would give access to them, but they don’t directly tell what needs to be done, or what can be done. It depends on the server software and its settings. – Jukka K. Korpela Mar 19 '12 at 19:19
  • @Brendan Long, no, they most definitely should not, by the specifications, and they do not, in implementations. – Jukka K. Korpela Mar 19 '12 at 21:02
  • I guess the priority is HTTP header, meta tag, charset attribute. It seems like that's the exact opposite order that any normal person would want, but [that's standards](http://www.w3.org/TR/html4/charset.html#spec-char-encoding). – Brendan Long Mar 19 '12 at 22:14
  • @BrendanLong I believe the original idea was that any fool could copy and paste HTML code from other pages (without knowing what they were copying), but that it took someone in the know to be able to change the HTTP headers on the server - hence, the HTTP header was more reliable. Anyway, most servers don't specify an encoding in the header, so you can usually specify one in the HTML files. Or, if possible, send an HTTP header using the server-side language. – Mr Lister Mar 20 '12 at 07:37
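Sending the header from the server-side language could look roughly like this. The sketch uses Python's WSGI interface purely as an example. `app` and the page content are made up, and the question itself does not say which server-side language (if any) is available. The app is called directly here instead of being started as a server, so the header it produces can be inspected.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP header itself."""
    body = "<!DOCTYPE html><title>demo</title><p>\u2013 \u25ba</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app without a network socket and capture what it would send.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
chunks = app(environ, fake_start_response)

print(captured["status"])                   # 200 OK
print(captured["headers"]["Content-Type"])  # text/html; charset=utf-8
```

To try it in a browser, the same `app` can be passed to `wsgiref.simple_server.make_server`. On a static Apache host, the `AddDefaultCharset UTF-8` line from the question achieves the same effect without any code.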