
We are generating an XML file in C# using XmlSerializer with UTF-8 encoding. We have checked the output: the XML is well-formed and passes XSD validation.

We send this XML to a customer who loads it in a UNIX environment. They keep telling us that the XML is not valid and contains invalid characters. We don't have a UNIX environment to test in.

The question is: is there any difference when loading XML files on UNIX? What can we ask the customer to provide so that we can better understand the situation?

Sharkz

1 Answer


You might have a UTF-8 BOM as the first three bytes of your file:

<?xml version="1.0" encoding="utf-8"?>

It is not part of the XML document, so a file reader should not pass it on to be interpreted by the XML parser. If you have it, you could try removing it and see whether your users have the same complaint. Most editors will not show it to you, so you might have to use a hex editor (hex: EF BB BF).
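If no hex editor is at hand, a few lines of script can do the same check. The following is only a minimal Python sketch, not the C# fix itself; the helper names `has_utf8_bom` and `strip_utf8_bom` and the sample bytes are made up for illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical sample: an XML declaration preceded by a BOM.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><root/>'
print(sample[:3].hex())               # first three bytes as hex
print(has_utf8_bom(sample))           # whether a BOM was found
print(strip_utf8_bom(sample)[:5])     # data now starts at "<?xml"
```

For a real file, read it in binary mode (`open(path, "rb").read()`) so that no decoding step hides or removes the BOM before you look at it.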

If the problem remains, you'd need to know at what byte offset the purported invalid characters are and which section of the XML specification they violate. Knowing which program and version they are using, and what feedback it gives, might be helpful too.
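To get byte offsets on your own side before asking the customer, you could scan the file yourself. This is a rough Python sketch that assumes the file is meant to be UTF-8 and checks only against the `Char` production of XML 1.0; the function name `find_invalid_xml_chars` is hypothetical, and the customer's parser may be complaining about something else entirely.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_invalid_xml_chars(data: bytes):
    """Return (byte_offset, description) pairs for problems in UTF-8 XML data."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Not valid UTF-8 at all: report the first offending byte and stop.
        return [(err.start, "invalid UTF-8 byte 0x%02X" % data[err.start])]

    problems = []
    for match in _INVALID_XML_CHAR.finditer(text):
        # Convert the character index back to a byte offset in the file.
        byte_offset = len(text[:match.start()].encode("utf-8"))
        description = "U+%04X not allowed in XML 1.0" % ord(match.group())
        problems.append((byte_offset, description))
    return problems
```

Note that this looks only at raw characters. It does not expand character references such as `&#x1F;`, which a parser would also reject.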

You might also consider that the file is getting damaged in delivery. A round-trip transmission might help detect that.
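One way to compare the file before and after the trip is a checksum of the raw bytes. Below is a small Python sketch, assuming both sides can run the same script (or an equivalent tool) on their copy; `sha256_of_file` is an illustrative name, not part of any existing tool.

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest of the file you sent differs from the digest of the file the customer received, the bytes changed somewhere in transit, for example through a text-mode transfer that rewrites line endings or re-encodes the content.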

Tom Blodget
  • The following link explains how to remove the BOM: http://stackoverflow.com/questions/2437666/write-text-files-without-byte-order-mark-bom – Sharkz Mar 21 '14 at 04:49