What I have so far is wrapping the text in CDATA sections, and handling any CDATA terminator (]]>) that appears in the text by splitting it across multiple adjacent CDATA sections.
I'm not sure about this, but XML parsers are not guaranteed to preserve newlines exactly as written inside CDATA sections, correct? If so, I would need to escape them somehow as well.
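A quick way to check the newline behaviour, again as a Python sketch rather than one of the target languages. As far as I understand the XML 1.0 spec (section 2.11, end-of-line handling), a parser translates CR LF and lone CR to LF before it does anything else, and that applies inside CDATA too. A carriage return written as the character reference &#13; outside CDATA should survive. The element name field is just a placeholder.

```python
import xml.etree.ElementTree as ET

# Literal CR LF and lone CR inside a CDATA section.
in_cdata = "<field><![CDATA[line1\r\nline2\rline3\nline4]]></field>"
text_cdata = ET.fromstring(in_cdata).text
print(repr(text_cdata))

# The same carriage return written as a character reference,
# which is only possible outside a CDATA section.
as_charref = "<field>line1&#13;\nline2</field>"
text_charref = ET.fromstring(as_charref).text
print(repr(text_charref))
```

If that holds for expat, Java, and C# parsers as well, then plain LF newlines are safe in CDATA, but any CR in the data would have to be moved out of the CDATA section and written as a character reference.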
I want to generate these XML files using Perl, and parse them with C++ (using expat), Java, and C#.
Most importantly, I want the resulting files to be somewhat human-readable/modifiable. Does anyone know of any encoding scheme that fits these needs? I am using this to store data for a database, so it needs to accept arbitrary text, and upon parsing return the exact same text.