I'm trying to parse a folder containing many XML files. The files hold information about vehicles. They are autogenerated, and some of them contain invalid characters. There are too many files for me to correct manually, so how can I bypass the invalid-character exception? This is the invalid line in some of the XML files:

<ECU EcuName="ABS" EcuFamily="BSS" CplNo="&#01;" Address="0x0B" ConfigChecksum="0x00000000" Updated="false">

I have tried using a StreamReader, without success. This is my code:

XDocument docs = XDocument.Load(new System.IO.StreamReader(path, Encoding.GetEncoding("utf-8")));
var nameValues =
    from fpc in docs.Descendants("FPC")
    select new
    {
        Name = (string)fpc.Attribute("Name"),
        Value = (string)fpc.Attribute("Value")
    };

Adnan Hossain
  • What created the files to start with? It would be best to fix that. – Jon Skeet Aug 08 '16 at 11:28
  • The files are created by different software updaters when they update the vehicle. They contain autogenerated information about the vehicle, produced by another program that I have no control over. @JonSkeet – Adnan Hossain Aug 08 '16 at 11:36
  • That's awkward. You could read each file in and replace `&#01;` with an empty string, for example... is that the only invalid character, and the only format in which it occurs? – Jon Skeet Aug 08 '16 at 11:42
  • I know. The thing is that the XML files are read-only, so I can't change anything in them. @JonSkeet – Adnan Hossain Aug 08 '16 at 11:49
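
Reading a file into memory does not need write access, so the replacement suggested in the comments can be applied without touching the read-only files. The following is a minimal sketch of that idea, not a tested solution for these particular files. The class and method names (`XmlSanitizer`, `Clean`, `LoadCleaned`) are hypothetical, and it assumes the only problem is the exact reference `&#01;` shown in the question.

```csharp
using System.IO;
using System.Xml.Linq;

static class XmlSanitizer
{
    // Hypothetical helper: removes the exact character reference "&#01;"
    // from the XML text. Other spellings (e.g. "&#x1;") are NOT handled.
    public static string Clean(string xmlText)
    {
        return xmlText.Replace("&#01;", string.Empty);
    }

    // Reads the file into memory, cleans the text and parses the result.
    // The file on disk is only read, never written.
    public static XDocument LoadCleaned(string path)
    {
        string text = File.ReadAllText(path);
        return XDocument.Parse(Clean(text));
    }
}
```

With this, `XDocument docs = XmlSanitizer.LoadCleaned(path);` could stand in for the `XDocument.Load` call in the question, and the LINQ query would stay the same.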

1 Answer

If you need to, you can load the file with, for example:

XDocument doc;
using (XmlReader xr = XmlReader.Create(path, new XmlReaderSettings() { CheckCharacters = false }))
{
  doc = XDocument.Load(xr);
}
// now query document here

That will get past character references like the one you have shown, but not past disallowed literal characters.
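
To see the effect of `CheckCharacters = false` without a file on disk, the same settings can be applied to an in-memory string. This is only an illustrative sketch: the `LenientLoader` class and its `Parse` method are hypothetical names, and the sample element in the usage note is a shortened, self-closed version of the line from the question.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class LenientLoader
{
    // Hypothetical helper: parses XML text with character checking
    // turned off for character references such as "&#01;".
    public static XDocument Parse(string xml)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (var sr = new StringReader(xml))
        using (XmlReader xr = XmlReader.Create(sr, settings))
        {
            return XDocument.Load(xr);
        }
    }
}
```

For example, `LenientLoader.Parse("<ECU EcuName=\"ABS\" CplNo=\"&#01;\" />")` loads without throwing, and the `CplNo` attribute then holds the control character U+0001 as its value, so code that consumes the attribute may still want to filter it.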

Martin Honnen
  • How can I allow all characters? – Adnan Hossain Aug 08 '16 at 11:50
  • You can't, not with XML parsers, as the XML spec defines the allowed characters and a control character other than tab, newline or carriage return is not allowed. Basically your input is not well-formed XML. – Martin Honnen Aug 08 '16 at 11:53
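
Since the parser itself cannot be told to accept disallowed literal characters, one possible workaround is to strip them from the text before parsing. The sketch below is an assumption about how that could be done, not something proposed in the answer. `XmlCharFilter` and `StripInvalidXmlChars` are hypothetical names, and it relies on `XmlConvert.IsXmlChar` to decide which characters are allowed. Note that it removes literal characters only. A reference such as `&#01;` is made of valid characters and passes through unchanged.

```csharp
using System.Text;
using System.Xml;

static class XmlCharFilter
{
    // Hypothetical helper: drops literal characters that XML 1.0 does not
    // allow (e.g. most control characters) and keeps everything else.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Surrogate halves are kept explicitly so that characters
            // outside the Basic Multilingual Plane are not broken apart.
            if (XmlConvert.IsXmlChar(c) || char.IsSurrogate(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

The filtered string can then be passed to `XDocument.Parse`. Dropping characters silently changes the data, so it is worth checking that the removed characters carry no meaning in these files.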