0

We are parsing an XML file that contains user information such as name, age, etc. The users are from all over the world, so the XML has to cope with different character sets. For example, there is a user named "Sikl¢si". If I set the XML encoding to UTF-8, the C# XmlDocument object throws an exception when loading the XML. After I changed the encoding to iso-8859-9, it works. But if we get other unusual characters that iso-8859-9 does not cover, the problem will come back. What is the ultimate solution for this problem?

londondev

2 Answers

3

The ultimate solution is to know what encoding was used to encode the file in the first place. An XML file should state what encoding it is using in the XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>). If it doesn't, then the document should be UTF-8 or UTF-16 (and the difference between them can be detected automatically from the byte order mark).
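As a rough illustration of that automatic detection, here is a minimal Python sketch (not C#, and not how XmlDocument is implemented) that checks the byte order mark of a document with no encoding declaration. The function name `sniff_undeclared_encoding` is made up for this example, and the sketch deliberately ignores UTF-32 and BOM-less UTF-16.

```python
import codecs

def sniff_undeclared_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document that has no encoding declaration.

    Hypothetical helper: it only looks at the byte order mark (BOM).
    """
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with a BOM; "utf-8-sig" strips the BOM when decoding.
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    # No BOM and no declaration: treat the document as UTF-8.
    return "utf-8"

doc = "<user><name>Sikl\u00f3si</name></user>"  # name is "Siklósi"
as_utf16 = doc.encode("utf-16")  # Python's "utf-16" codec writes a BOM first
as_utf8 = doc.encode("utf-8")    # no BOM

decoded_utf16 = as_utf16.decode(sniff_undeclared_encoding(as_utf16))
decoded_utf8 = as_utf8.decode(sniff_undeclared_encoding(as_utf8))
```

Both decoded strings come back equal to the original text, because the BOM (or its absence) is enough to tell the two encodings apart.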

Your XML parser should handle the encoding transparently based on the information in the XML file.
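To make this concrete, here is a small Python sketch using the standard `xml.etree.ElementTree` parser (an illustration of the principle, not of C#'s XmlDocument). The helper names `read_name` and `try_read_name` are invented for the example. It parses the same document in two encodings with matching declarations, then a third copy whose declaration says UTF-8 while the bytes are actually ISO-8859-9, which is probably the situation in the question.

```python
import xml.etree.ElementTree as ET

name = "Sikl\u00f3si"  # "Siklósi"
body = f"<user><name>{name}</name></user>"

# The bytes and the declaration agree in both of these documents.
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
latin5_doc = ('<?xml version="1.0" encoding="ISO-8859-9"?>' + body).encode("iso-8859-9")

# Here the declaration claims UTF-8 but the bytes are ISO-8859-9.
mislabeled_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("iso-8859-9")

def read_name(xml_bytes: bytes) -> str:
    """Parse raw bytes; the parser picks the encoding from the declaration."""
    return ET.fromstring(xml_bytes).find("name").text

def try_read_name(xml_bytes: bytes):
    """Return the name, or None if the document is not well-formed."""
    try:
        return read_name(xml_bytes)
    except ET.ParseError:
        return None

results = [try_read_name(d) for d in (utf8_doc, latin5_doc, mislabeled_doc)]
```

The first two documents yield the same name without any encoding argument being passed to the parser. The mislabeled one fails to parse, because the byte used for "ó" in ISO-8859-9 is not valid UTF-8 in that position.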

If you are receiving documents that won't parse, then odds are that the problem is in how they are being generated in the first place. You should reject them and tell the submitter to fix the encoding.

(Note that any Unicode encoding can handle just about any character you are likely to need, as well as a vast number you aren't. The problem is that the document isn't UTF-8, not that UTF-8 can't handle the characters being used.)
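A quick way to check this claim is to round-trip a few names through several encodings. The Python sketch below uses sample names chosen purely for illustration, and `round_trips` is a hypothetical helper.

```python
# Sample names for illustration: Hungarian, Chinese, Arabic.
names = [
    "Sikl\u00f3si",              # Siklósi
    "\u674e\u5c0f\u9f99",        # 李小龙
    "\u0645\u062d\u0645\u062f",  # محمد
]

def round_trips(text: str, encoding: str) -> bool:
    """True if text survives encoding and decoding with the given codec."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # Raised when the codec has no representation for some character.
        return False

# Every Unicode encoding handles every name.
unicode_ok = all(
    round_trips(n, enc) for n in names for enc in ("utf-8", "utf-16", "utf-32")
)

# A single-byte encoding such as ISO-8859-9 only covers the first name.
latin5_ok = [round_trips(n, "iso-8859-9") for n in names]
```

Here `unicode_ok` is `True`, while `latin5_ok` is `[True, False, False]`: switching to another legacy code page only moves the problem, whereas any Unicode encoding removes it.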

Quentin
  • Actually, we are generating the XML files from a database, so there is no information about which charset each user requires. I do not know what to do. – londondev Feb 08 '12 at 12:09
  • 1
    Convert all the data in the database to UTF-8, and make sure that everything that modifies it sticks to UTF-8 too. You'll probably have to do a lot of manual checking as you try to recover from the current broken state. – Quentin Feb 08 '12 at 13:32
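A conversion pass along those lines might look like the following Python sketch. It assumes the legacy data is in one known encoding (ISO-8859-9 here, taken from the question, which is an assumption about the database), and `to_utf8` is a hypothetical helper rather than part of any library.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "iso-8859-9") -> bytes:
    """Return raw re-encoded as UTF-8.

    Assumption: anything that is not already valid UTF-8 was written in
    legacy_encoding. Real data may mix several encodings.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave it alone
        return raw
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding).encode("utf-8")

name = "Sikl\u00f3si"  # "Siklósi"
legacy_bytes = name.encode("iso-8859-9")
utf8_bytes = name.encode("utf-8")

converted = [to_utf8(legacy_bytes), to_utf8(utf8_bytes)]
```

Both entries of `converted` end up as the same UTF-8 bytes. The check is only a heuristic: some legacy byte sequences happen to be valid UTF-8, and text that was already decoded with the wrong code page cannot be fixed this way, which is why manual checking is still needed.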
-1

Make it UTF-32 which will cover most of them. For more info on UTF visit this.

Mujtaba Hassan
  • Nope, it is not working. I think XmlDocument doesn't know UTF-32. It gives this error message: Data at the root level is invalid. Line 1, position 40 – londondev Feb 08 '12 at 12:24
  • 1
    UTF-32 will cover all possible characters you might want, but so will UTF-16 and UTF-8. There is no reason why UTF-32 would work and UTF-8 wouldn't. – svick Feb 08 '12 at 14:04