
In Eclipse, I have a file that contains this somewhere:

onclick='obj1.help_open_new_window(fn1(), "/redir/url_name")'

In Eclipse, under Edit > Set Encoding, I see this:

[Screenshot of the Set Encoding dialog]

Now I change the encoding to UTF-8 using the same dialog box and the text changes to:

onclick='obj1.help_open_new_window(fn1(),�"/redir/url_name")'

All I know is if this was not happening, then my website would be working fine. Why is this happening and what do I do to prevent this?

I do have some knowledge about encodings (I have read "Â and nbsp mystery explained" and "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"), but I still do not understand why this is happening. Feel free to go down to the byte level (how the file is stored) to explain it.

UPDATE: Here's what I understand: if the file is encoded in latin-1, then every character is one byte, and so is the space. It should be hex(32), i.e. 0x20. When I convert the file to UTF-8, it still remains hex(32), and that is definitely a space. This leads me to believe that in latin-1 the space is not hex(32) but a combination of two bytes. How is that possible?

prongs
    If you can, open the file with a hex editor to find out what bytes actually are stored in the file at that point. – AKX Jun 13 '12 at 10:14

1 Answer


The character you have between the comma and the quote appears not to be a normal space but some other whitespace character, probably the famous U+00A0 NO-BREAK SPACE. Since the file is encoded in latin1, the character is stored on disk as the single byte \xA0, which on its own is not a valid UTF-8 sequence. This means that if you reload the file in your editor as UTF-8, you will see the universal replacement character (U+FFFD) in its stead. (The proper UTF-8 encoding of no-break space would be \xC2\xA0.)

To get rid of the problem, replace the no-break space with a normal space (U+0020). There is no reason to use a no-break space in this context, i.e. in program text.
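If the stray character occurs in many places, the replacement can be scripted. Here is a hedged Python sketch: `replace_nbsp` is a hypothetical helper, not part of Eclipse or any library, and it assumes the data really is latin-1 encoded.

```python
def replace_nbsp(data: bytes, encoding: str = "latin-1") -> bytes:
    """Return `data` with every U+00A0 replaced by a normal space (U+0020).

    Hypothetical helper; assumes `data` is valid in `encoding`.
    """
    text = data.decode(encoding)
    cleaned = text.replace("\u00a0", " ")
    return cleaned.encode(encoding)


# Example input: the fragment from the question as latin-1 bytes.
original = b'fn1(),\xa0"/redir/url_name"'
fixed = replace_nbsp(original)

print(fixed)
# The cleaned bytes are pure ASCII, so they decode identically as UTF-8.
print(fixed.decode("utf-8") == fixed.decode("latin-1"))
```

Once the no-break space is gone, this fragment contains only ASCII bytes, so it reads the same whether the editor treats the file as latin-1 or UTF-8.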

Joni
  • Okay, but why would Eclipse insert a `U+00A0` instead of a `U+0020` when the spacebar is pressed? – prongs Jun 14 '12 at 03:57
  • Maybe someone copied and pasted this code from a web page that used a no-break space. Or maybe someone accidentally typed in a no-break space: some people have a configuration that inserts a no-break space when they type Shift+Space for example. This often leads to needless no-break spaces in files they edit. – Joni Jun 14 '12 at 06:15