
I have a text string (a Java String) that should contain 'gerald.o'leary'.

In reality it also contains a control character, 'c2'. Please see image1.png for a hex dump.


When I save it to the database, read it back by running SQL in a client, and copy-paste the result into a hex editor, I see c2 replaced by 3f; please see image2.png.


I could live with that, except that when the two strings are compared in Java using String.equals(), it returns false.

Can somebody please explain what is going on here?!

kmansoor
  • Actually, `c2 92` is being replaced by `3f`. The reduction of two bytes to one makes it pretty clear that there's an encoding issue somewhere. – mellamokb Oct 02 '12 at 19:49
  • http://www.fileformat.info/info/unicode/char/92/index.htm – Jonathon Faust Oct 02 '12 at 19:51
  • You've got a character encoding issue. I'm guessing that the original text included a smart quote. How did that text get into your Java string? Was it entered in a form in a web page? – Martin Wilson Oct 02 '12 at 19:52
  • I am reading an XML file containing HR data via JAXB; the field in question is the email address. – kmansoor Oct 02 '12 at 20:02
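Following up on the encoding and smart-quote comments, here is a minimal Java sketch of how a single apostrophe byte can end up as `c2 92`. It assumes (the question does not confirm this) that the source XML stored the apostrophe as the windows-1252 byte 0x92 and that the file was decoded as ISO-8859-1 somewhere along the way; `SmartQuoteDemo` and `hex` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SmartQuoteDemo {

    // Formats bytes as lowercase hex pairs separated by spaces, e.g. "c2 92".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the file held the apostrophe as the single
        // windows-1252 byte 0x92 (RIGHT SINGLE QUOTATION MARK).
        byte[] raw = {(byte) 0x92};

        // Decoded with the wrong charset (ISO-8859-1), 0x92 becomes U+0092,
        // a C1 control character...
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);
        // ...whose UTF-8 encoding is the two bytes c2 92.
        System.out.println(hex(wrong.getBytes(StandardCharsets.UTF_8))); // c2 92

        // Decoded with windows-1252, the same byte is the smart quote U+2019.
        String right = new String(raw, Charset.forName("windows-1252"));
        System.out.println(Integer.toHexString(right.charAt(0))); // 2019
    }
}
```

If this assumption holds, the fix belongs where the XML is read (using the file's real encoding), not in the database layer.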

2 Answers


I don't know how you got the hex dump, but Java strings are Unicode strings, so there is no 1:1 correspondence between characters and bytes. I suspect your string contains Unicode characters that can't be represented by single bytes, and your character handling (which assumes one byte per character) is buggy.
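To see what the String actually holds, one option is to print its code points instead of a byte dump. A minimal sketch; `CodePointDump` and `codePoints` are hypothetical names, and the U+0092 character is an assumption based on the `c2 92` bytes mentioned in the comments.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDump {

    // Hypothetical helper: lists each Unicode code point as U+XXXX,
    // showing the characters the String holds rather than its encoded bytes.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the apostrophe is really U+0092, as the c2 92 dump suggests.
        String s = "o\u0092l";
        System.out.println(codePoints(s));                             // U+006F U+0092 U+006C
        System.out.println(s.length());                                // 3 chars
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4 bytes
    }
}
```

Three characters become four UTF-8 bytes, which is exactly the mismatch that breaks code assuming one byte per character.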

ddyer

Have you checked whether your SQL database can store UTF-8/Unicode characters (i.e. its character set is not ISO-8859-1, ASCII, or similar)?

  • First, print the String to standard output to see whether it really contains the right character (a single ? is fine in the output, but ?? or 0xC2 0x92 = ´ is not).
  • Then check your database's character set; see its manual for how.
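The effect of a database character set that cannot hold the character can be approximated in plain Java. A minimal sketch, assuming US-ASCII as a stand-in for the database's character set (the real server and column settings are unknown); `RoundTripDemo` and `roundTrip` are hypothetical names. It shows the unmappable character coming back as `?` (0x3F), which is why `String.equals()` returns false.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Simulates storing a String with character set `dbCharset` and reading it
    // back. getBytes(Charset) replaces unmappable characters with the charset's
    // replacement bytes, which is '?' (0x3F) for US-ASCII.
    static String roundTrip(String s, Charset dbCharset) {
        return new String(s.getBytes(dbCharset), dbCharset);
    }

    public static void main(String[] args) {
        String original = "gerald.o\u0092leary";

        // Assumption: US-ASCII stands in for a database character set
        // that cannot represent U+0092.
        Charset db = StandardCharsets.US_ASCII;

        // Checking before writing would catch the problem early.
        System.out.println(db.newEncoder().canEncode(original)); // false

        String stored = roundTrip(original, db);
        System.out.println(stored);                  // gerald.o?leary
        System.out.println(original.equals(stored)); // false

        // A Unicode-capable character set keeps the string intact.
        System.out.println(original.equals(roundTrip(original, StandardCharsets.UTF_8))); // true
    }
}
```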
TWiStErRob