
This question might be a bit basic, considering that I'm not what most people would call a newcomer to front-end web development.

I am teaching an 8-year-old HTML, CSS and JavaScript. I'm taking the opportunity to also teach him about UTF-8 encoding, in particular how HTML uses it so that non-English characters can be encoded and displayed.

I want to show him how accented characters do not appear properly without including <meta charset="UTF-8"/>.

Surprisingly I was able to display "Á" in the test webpage when in theory this shouldn't have been possible as the utf-8 charset meta tag was missing.

After some research I came to the conclusion that in modern IDEs the encoding system comes "built in", hence there's no real need to write down <meta charset />. If this is wrong, please correct me: I'm currently confused about what exactly happened, and I don't want to teach an 8-year-old wrong information.

Andreas Bonini

2 Answers


After some research I came to the conclusion that in modern IDEs the encoding system comes "built in", hence there's no real need to write down <meta charset />. If this is wrong please correct me

Yes, that is wrong!

Surprisingly I was able to display "Á" in the test webpage when in theory this shouldn't have been possible as the utf-8 charset meta tag was missing.

This is also wrong, let me explain!

UTF-8 is an encoding system: it describes how to map bytes to text characters (and back). It's certainly possible to display "Á" without using UTF-8.

The letter A (normal, no accents) is encoded as the number 65 in both ASCII and UTF-8. In fact, all English letters and punctuation are encoded the same way in virtually all common encodings (they are ASCII-compatible), so encoding problems rarely show up in English-only text.
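A small Python sketch can make the byte values visible. It is only an illustration: "cp1252" (Windows-1252, a common legacy Western encoding) is picked here as an example of a non-UTF-8 encoding, and the helper name byte_values is made up for this example.

```python
# Show which byte values each encoding uses to store a piece of text.
# "cp1252" (Windows-1252) is just an example of a non-UTF-8 encoding.

def byte_values(text: str, encoding: str) -> list[int]:
    """Return the byte values used to store `text` in `encoding`."""
    return list(text.encode(encoding))

for encoding in ("ascii", "utf-8", "cp1252"):
    print(encoding, "A ->", byte_values("A", encoding))   # [65] every time

print("utf-8  Á ->", byte_values("Á", "utf-8"))    # [195, 129]: two bytes
print("cp1252 Á ->", byte_values("Á", "cp1252"))   # [193]: one byte
```

"A" is the single byte 65 in all three, while "Á" is stored differently: two bytes in UTF-8, one byte in Windows-1252. So "Á" can be shown correctly without UTF-8, as long as the browser knows which encoding the file actually uses.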

However, accented letters, non-English characters and emojis are encoded differently in different encoding systems. What causes "corrupt" text to be displayed is an encoding mismatch: your web browser assumes one encoding while the file was actually written with another, so the byte values no longer map to the correct characters. For example, suppose system X uses the number 250 to encode an emoji, while system Y uses the number 190 for that emoji and maps 250 to "Ë". A file saved with system X but read as system Y will then show my emoji as "Ë".

<meta charset="utf-8"/> specifies the encoding used for the HTML file. It is absolutely needed. Your webpage worked without it because browsers can determine the encoding in other ways: a byte order mark at the start of the file, a charset sent by the server in the HTTP Content-Type header, or educated guesses based on the content. Even so, it should always be written explicitly in the HTML to avoid problems down the line.
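To make the "other ways" concrete, here is a heavily simplified Python sketch of the order in which a browser might check for an encoding, loosely following the HTML encoding sniffing algorithm. It is an assumption-laden illustration, not what any browser actually runs: the function name pick_encoding and the windows-1252 fallback are hypothetical, and real browsers also handle UTF-16 byte order marks, the older <meta http-equiv="Content-Type" ...> form, user overrides, and content-based guessing.

```python
import re
from typing import Optional

# Simplified stand-in for the browser's "prescan" for <meta charset="...">.
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)

def pick_encoding(raw: bytes, header_charset: Optional[str] = None,
                  fallback: str = "windows-1252") -> str:
    """Very simplified sketch of how a browser chooses a page's encoding."""
    # 1. A UTF-8 byte order mark at the very start of the file wins.
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. A charset sent by the server in the Content-Type header.
    if header_charset:
        return header_charset.lower()
    # 3. A <meta charset> declaration near the top of the document
    #    (browsers only look at roughly the first 1024 bytes).
    match = META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: the browser has to guess. The fallback used here
    #    is an assumption; real browsers often use a locale-dependent default.
    return fallback

page = "<p>Á</p>".encode("utf-8")  # no BOM, no header, no <meta charset>
print(pick_encoding(page))         # windows-1252, so "Á" would be misread
```

Only step 3 is under the page author's control, which is why writing <meta charset="utf-8"/> is the reliable choice.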

Andreas Bonini

You should specify the encoding for several reasons:

  • Even if the encoding system were built in, you cannot know which default encoding the IDE chose.
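  • The HTML5 specification says that, when the document does not declare its encoding, the encoding should be taken from the transport layer (the HTTP Content-Type header). The original HTTP/1.1 specification (RFC 2616) gave text types a default charset of ISO-8859-1 when none was sent.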
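As a rough Python illustration of the transport-layer case: the sketch below reads the charset parameter from a Content-Type header value and falls back to ISO-8859-1 when there is none, mirroring the RFC 2616 default. It borrows the standard library's email.message.Message as a convenient parser, since HTTP and MIME headers share this parameter syntax; the helper name charset_from_header is hypothetical. Note that newer HTTP specifications dropped this default, and browsers treat the "ISO-8859-1" label as windows-1252.

```python
from email.message import Message

def charset_from_header(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the (lowercased) charset of a Content-Type header value,
    or `default` when the header does not declare one."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)

print(charset_from_header("text/html; charset=UTF-8"))  # utf-8
print(charset_from_header("text/html"))                 # iso-8859-1
```

If the server sends a bare "text/html" and the page has no <meta charset>, a UTF-8 file can end up decoded with this legacy default.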

See the full explanation here: Why it's necessary to specify the character encoding in an HTML5 document if the default character encoding for HTML5 is UTF-8?

T.Trassoudaine