4

I have the following Python code:

import xml.dom.minidom
import xml.parsers.expat

try:
    domTree = xml.dom.minidom.parse(myXMLFileName)
except xml.parsers.expat.ExpatError as e:
    return e.args[0]

which I am using to parse an XML file. Although it quite happily spots simple XML errors like mismatched tags, it completely ignores the DTD specified at the top of the XML file:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE ServerConfig SYSTEM "ServerConfig.dtd">

so it doesn't notice when mandatory elements are missing, for example. How can I switch on DTD checking?

Charles Anderson

5 Answers

5

See this question - the accepted answer is to use lxml validation.

gimel
  • Thanks. I'd hoped to avoid having to work outside the standard library, but lxml certainly does the trick. A lot easier to read, too. – Charles Anderson Nov 18 '08 at 15:50
3

Just by way of explanation: Python's xml.dom.minidom and xml.sax use the expat parser by default, which is a non-validating parser. It may read the DTD in order to do entity replacement, but it won't validate the document against the DTD.
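To make the point concrete, here is a minimal sketch using only the standard library. The document and its inline DTD are made up for illustration (the `Port` element is not from the original question), and `parses_without_error` is a hypothetical helper. The DTD says `ServerConfig` must contain a `Port`, the body omits it, and minidom should still parse it without complaint, while a mismatched tag is rejected.

```python
import xml.dom.minidom
import xml.parsers.expat

# Hypothetical document: the internal DTD requires exactly one <Port>
# inside <ServerConfig>, but the body leaves it out.
INVALID_XML = """<?xml version="1.0"?>
<!DOCTYPE ServerConfig [
  <!ELEMENT ServerConfig (Port)>
  <!ELEMENT Port (#PCDATA)>
]>
<ServerConfig/>
"""

# Not well-formed: <b> is never closed.
MALFORMED_XML = "<a><b></a>"


def parses_without_error(text):
    """Return True if minidom (expat) accepts the text, False otherwise."""
    try:
        xml.dom.minidom.parseString(text)
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

Calling `parses_without_error(INVALID_XML)` is expected to return True because expat only checks well-formedness, whereas `parses_without_error(MALFORMED_XML)` returns False.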

gimel and Tim recommend lxml, which is a nicely pythonic binding for the libxml2 and libxslt libraries. It supports validation against a DTD. I've been using lxml, and I like it a lot.

ChuckB
2

Just for the record, this is what my code looks like now:

from lxml import etree

try:
    parser = etree.XMLParser(dtd_validation=True)
    domTree = etree.parse(myXMLFileName, parser=parser)
except etree.XMLSyntaxError as e:
    return e.args[0]
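
If you would rather collect the validation messages than catch an exception, lxml also lets you validate an already-parsed tree against a separate DTD object. This is a different route from the `dtd_validation=True` parser option above, shown only as a rough sketch: the DTD text, the `Port` element, and the `validation_errors` helper are assumptions for illustration, not part of the original ServerConfig.dtd.

```python
from io import StringIO

from lxml import etree

# Hypothetical DTD: ServerConfig must contain exactly one Port element.
DTD_TEXT = """
<!ELEMENT ServerConfig (Port)>
<!ELEMENT Port (#PCDATA)>
"""


def validation_errors(xml_text, dtd_text=DTD_TEXT):
    """Return a list of DTD error messages; the list is empty if valid."""
    dtd = etree.DTD(StringIO(dtd_text))
    root = etree.fromstring(xml_text)
    if dtd.validate(root):
        return []
    # The DTD object keeps a log of what failed during validate().
    return [entry.message for entry in dtd.error_log.filter_from_errors()]
```

With this, `validation_errors("<ServerConfig/>")` should come back non-empty because the mandatory element is missing, while a document that includes `<Port>` should give an empty list.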
Charles Anderson
1

I recommend lxml over xmlproc because the PyXML package (containing xmlproc) is not being developed any more; the latest Python version that PyXML can be used with is 2.4.

Tim Pietzcker
0

I believe you need to switch from expat to xmlproc.
See: http://code.activestate.com/recipes/220472/

acrosman