I have an XML file that gives me the following error while parsing: "An invalid XML character (Unicode: 0x0) was found in the element content of the document."

I can read the whole thing fine, and there are no control characters either.

But when I typed the whole thing out myself instead of using the file that was provided to me, it parsed fine.

What could be the issue? I read through some similar questions on SO and they all said that this might be an encoding issue. Can anyone elaborate on this, given that I can read the whole thing? And if it is an encoding issue, how would I know by looking at the file? It looks fine and it's readable. If I delete a line and type it myself, that line gets parsed properly.

Thanks in advance

Nick Div

1 Answer

There are two possible explanations. The first is that the file really does contain an instance of the Unicode code point U+0000 (NUL), correctly encoded. XML does not allow this character.

Alternatively, the parser thought it saw an instance of 0x0 because it was decoding the physical bytes of the file incorrectly: that is, the encoding assumed by the XML parser is not the actual encoding of the file.
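To illustrate the second explanation, here is a minimal Python sketch of one way such a mismatch can produce NUL characters. It assumes, purely for illustration, that the file was saved as UTF-16 while the reader decodes it as UTF-8; the sample document is made up, and your file may involve a different pair of encodings.

```python
# A made-up XML snippet, saved as UTF-16 (little-endian, no BOM).
text = "<a>hi</a>"
raw = text.encode("utf-16-le")  # each ASCII character becomes two bytes: the character, then 0x00

# A reader that wrongly assumes UTF-8 decodes every byte as its own character,
# so each 0x00 byte turns into the code point U+0000.
misread = raw.decode("utf-8")
nul_count = misread.count("\x00")

print(repr(misread))
print("NUL characters seen by the decoder:", nul_count)
```

Many viewing tools skip or hide NUL bytes when displaying text, which is one reason a file like this can still look fine on screen.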

When you're dealing with this kind of question you need to be very careful about phrases like "I can read the whole file" and "it looks fine". You can't actually see the bits on the disk, you can only use some kind of viewing tool to interpret them for you, and you need to be clear about what tool you were using and how it was configured.

Michael Kay
  • Forgive my language in the question; I was just trying to make it easier to understand, that's all. I did suspect a mismatch between the encoding the parser assumed and the actual encoding of the file. But I was hoping there was a way to detect that more easily, rather than having the parser tell me the file is not parseable, then guessing that it was an encoding issue and rewriting the whole thing. – Nick Div Nov 11 '15 at 06:28
  • Well, the first step is to look at the file at the binary level, in a hex editor. A quick glance at that will greatly narrow the set of possible causes that need to be investigated. – Michael Kay Nov 11 '15 at 10:10
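If no hex editor is at hand, a rough equivalent of that first look can be sketched in Python. The helper name `inspect_bytes` and the sample bytes are hypothetical; the sketch only reports a byte-order mark, the offsets of any 0x00 bytes, and a hex dump of the first few bytes, and it does not try to guess the encoding beyond that.

```python
import codecs

# Byte-order marks to look for. UTF-32 comes first because its little-endian
# BOM (FF FE 00 00) starts with the UTF-16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def inspect_bytes(data: bytes, width: int = 16) -> dict:
    """Hypothetical helper: summarize the raw bytes of a file."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)
    nul_offsets = [i for i, b in enumerate(data) if b == 0]
    return {
        "bom": bom,                          # None if no known BOM is present
        "nul_offsets": nul_offsets,          # positions of every 0x00 byte
        "hex_head": data[:width].hex(" "),   # first `width` bytes as hex
    }


# In-memory sample standing in for a file's contents (an assumption for the demo).
sample = codecs.BOM_UTF16_LE + "<a/>".encode("utf-16-le")
info = inspect_bytes(sample)
print(info)
```

For a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is your file) and pass the bytes to the helper. A 0x00 after every ASCII byte suggests UTF-16, while isolated 0x00 bytes suggest genuine NUL characters in the data.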