I'm writing a set of tools in Python to extract data from some XML files that are generated by a traffic simulation package. As the resulting files can be quite big, I use xml.parsers.expat to parse them.
The issue is that when I run my scripts at work, on a Windows XP machine, they work perfectly, but at home, on Ubuntu 10.10, I get the following error on the very same file:
ExpatError: not well-formed (invalid token): line 1, column 0
The file was originally encoded in UTF-8 while the encoding declared in the XML declaration was ascii, so I tried changing the declaration to utf-8 (or UTF8, or utf8), without success. As the BOM was absent I tried writing one, still without success. I also tried replacing the Windows line breaks (CR/LF) with Unix ones (LF), also without success.
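Since the error points at line 1, column 0, it seemed worth looking at what the parser actually receives as its very first bytes. Below is a minimal sketch of the kind of check I mean; the function names `describe_leading_bytes` and `try_parse` are made up for illustration, and it only tells you what the data starts with and where expat gives up, not why.

```python
import codecs
import xml.parsers.expat


def describe_leading_bytes(data):
    """Return a short label describing how a raw XML byte string starts."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 BOM"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16 BOM"
    if data.startswith(b"<"):
        return "no BOM"
    return "unexpected leading bytes"


def try_parse(data):
    """Parse raw bytes with expat.

    Returns None on success, or a tuple (message, line, column) on failure.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of data
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset)
    return None
```

On a real file I would read the data with `open(path, "rb")` (where `path` is a placeholder for the actual file name), so that nothing decodes or rewrites the bytes before expat sees them.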
Also, the Python version at work is 2.7.1, while on my Ubuntu box it's 2.6.6, but I don't think my issue is related to that: I upgraded my work computer's Python from 2.6 to 2.7 a few weeks ago without any trouble.
As I'm not an expert here, I'm running out of ideas. Any hints?
Edit: After further investigation (I've got a headache now, I hate Unicode-related trouble) it looks like the issue was solved by properly setting the system environment variables LANG, LC_ALL and LANGUAGE to (in my case) "fr_FR.utf-8". I don't understand why they weren't set properly in the first place, nor why it works now...
Thanks for the help, guys!
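For anyone wanting to compare two machines the same way, here is a rough sketch that collects the three variables mentioned above along with the encodings Python derives from the environment. The helper name `locale_report` is hypothetical, and the sketch only reports the settings; it does not explain how they end up affecting the parsing.

```python
import locale
import os
import sys


def locale_report(environ=os.environ):
    """Collect locale-related settings so two machines can be compared."""
    report = {}
    for name in ("LANG", "LC_ALL", "LANGUAGE"):
        # None means the variable is not set at all
        report[name] = environ.get(name)
    # Encodings Python picks up from the current environment
    report["preferred_encoding"] = locale.getpreferredencoding()
    report["filesystem_encoding"] = sys.getfilesystemencoding()
    return report


if __name__ == "__main__":
    for key, value in sorted(locale_report().items()):
        print("%s = %r" % (key, value))
```

Running it on both boxes and comparing the output line by line should show at a glance whether the locale setup differs.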