2

I have been developing a parser that takes JavaScript as input and creates a compressed version of that JavaScript as output.

Initially, the parser failed when attempting to read the input JavaScript. I believe this is because Visual Studio 2008 saves its files as UTF-8 by default, and when doing so it includes a couple of hidden characters at the start of the file.

As a workaround, I used Visual Studio to save the file as code page 1252. After doing so, my parser was able to read the input JavaScript.

Note that I need to use special European characters that include accents.

So, here are my questions:

  1. Should I use code page 1252 or UTF-8?
  2. Why does Visual Studio save files as UTF-8 by default?
  3. If I choose to save files as 1252 will that lead to problems?
  4. It appears to me that Eclipse saves files as code page 1252 by default. Does that sound right?
DavidRR
mark smith

5 Answers

9

UTF-8 is the better option because it supports all known characters, while with 1252 you might find that characters you need are missing (even in European languages).

Apparently, VS2008 saves UTF-8 with a byte order mark - it should be possible to either switch that off, or have the parser recognize it, or strip the BOM somewhere in between.
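
For the "have the parser recognize it" route, here is a minimal Python sketch. The question doesn't say what language the parser is written in, so treat this purely as an illustration. It relies on Python's standard `utf-8-sig` codec, which drops a leading BOM if one is present and otherwise behaves like plain UTF-8. The function name `decode_source` and the sample input are made up for the example.

```python
import codecs


def decode_source(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if there is one."""
    # "utf-8-sig" skips the EF BB BF signature when it is present and
    # decodes exactly like "utf-8" when it is absent.
    return raw.decode("utf-8-sig")


# Hypothetical input: the same script saved with and without a signature.
with_bom = codecs.BOM_UTF8 + "var café = 1;".encode("utf-8")
without_bom = "var café = 1;".encode("utf-8")

# Both variants decode to the same text, so the parser never sees the BOM.
print(decode_source(with_bom) == decode_source(without_bom))
```

With this approach the file can stay in UTF-8 (accented characters included) regardless of whether the editor wrote a signature.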

Michael Borgwardt
  • 3
    You can use the "Save with encoding" option in the save dialog and then explicitly select "UTF-8 without signature". – Joey Jun 14 '09 at 09:47
  • 1
    Yep, thanks guys. I saved it with no signature and it appears to have worked. Is there any way to save/create files as UTF-8 without signature by default in VS 2008? – mark smith Jun 14 '09 at 10:10
3

A UTF-8 file can start with a byte order mark (BOM) signature, which some editors and, evidently, some libraries don't understand: http://en.wikipedia.org/wiki/Byte-order_mark

If you can work around it, UTF-8 is by all means the preferred choice today. Try stripping the leading BOM bytes before passing the JS code to the parser, or look for an option in your IDE to not write the BOM.
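
As a rough sketch of stripping those leading bytes before the parser sees them (Python is used only for illustration; `strip_utf8_bom` and the sample input are hypothetical names, not part of any parser mentioned here): the UTF-8 BOM is the three bytes EF BB BF, which Python exposes as `codecs.BOM_UTF8`.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents: BOM followed by UTF-8 encoded JavaScript.
sample = b"\xef\xbb\xbfalert('ol\xc3\xa9');"
cleaned = strip_utf8_bom(sample)

# The remaining bytes are ordinary UTF-8 and can be handed to the parser.
print(cleaned.decode("utf-8"))
```

Input that has no BOM passes through unchanged, so the function is safe to apply to every file.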

1252 doesn't cause this issue and you won't have problems with it, but you'll be serving your site in an outdated format. I wouldn't do it today; there was a lot of encoding mess on the web in the past with ISO vs. Windows code pages for different languages...

zappan
1

Use UTF-8. 1252 does not cover the whole of Europe, so in some countries (Central Europe) you would have to use 1250 or, more correctly, ISO 8859-2. So the only real option is UTF-8.
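
A quick way to check this, sketched in Python (the helper `can_encode` and the sample word are just for illustration): try to encode a Central European word in each code page and see which ones can represent it.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "Łódź"  # Polish city name, used here only as an example

# cp1252 lacks Ł and ź; cp1250, ISO 8859-2 and UTF-8 can represent them.
for enc in ("cp1252", "cp1250", "iso8859_2", "utf-8"):
    print(enc, can_encode(sample, enc))
```

Only `cp1252` reports `False` here, which is the gap described above.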

smok1
1

Will using 1252 cause issues?

That depends on the countries your app needs to work in.

Off the top of my head, 1252 (or the closely related ISO 8859-1) will work in

  • UK
  • Germany
  • Switzerland
  • Austria
  • Italy
  • France
  • Netherlands
  • Iceland
  • Spain

Oh, Wikipedia has a more comprehensive list: http://en.wikipedia.org/wiki/ISO/IEC_8859-1

So you can use CP 1252 if your app is only used in the mentioned countries/languages.

jms
  • 1
    ISO 8859-1 has a couple of issues for rare French words, hence ISO 8859-15 was created. – Richard Jun 14 '09 at 17:21
  • **On Wikipedia:** [ISO/IEC 8859-1](http://en.wikipedia.org/wiki/ISO/IEC_8859-1), [ISO/IEC 8859-15](http://en.wikipedia.org/wiki/ISO/IEC_8859-15) and [Windows-1252](http://en.wikipedia.org/wiki/Windows-1252). – DavidRR Sep 06 '14 at 02:43
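
To illustrate the comments above, here is a small Python sketch (the helper `can_encode` is a made-up name) that checks the French ligature "œ" and the euro sign against the three encodings linked. It also shows that Windows-1252 and ISO 8859-1 are not quite the same thing.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# "œ" appears in French words such as "cœur"; "€" is the euro sign.
for ch in ("œ", "€"):
    for enc in ("latin_1", "iso8859_15", "cp1252"):
        print(repr(ch), enc, can_encode(ch, enc))
```

Both characters are missing from ISO 8859-1 (`latin_1`) but present in ISO 8859-15 and Windows-1252.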
0

The BOM was at the start of the file. IMHO you should use UTF-8; it's very widely used nowadays.

erenon