
I'm trying to validate a MathML XML string with lxml in this way:

import lxml.etree
mathml = """
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 3.0//EN" "http://www.w3.org/Math/DTD/mathml3/mathml3.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mi>a</mi>
  <mo>+</mo>
  <mi>b</mi>
</math>
"""
lxml_parser = lxml.etree.XMLParser(
    dtd_validation=True,
    no_network=False,
    load_dtd=True,
    ns_clean=True,
    remove_blank_text=True,
)
validated = lxml.etree.fromstring(mathml, lxml_parser)

This checks the string against the DTD named in its DOCTYPE declaration, fetching that DTD over the network.

Question

How can I validate the MathML string against a local DTD when there is no network available?

What I've tried

I downloaded the MathML 3 DTD from https://www.w3.org/Math/DTD/mathml3/mathml3.dtd and the MathML 1 DTD from https://www.w3.org/Math/DTD/mathml1/mathml.dtd and saved them in the current working directory. I then changed the DOCTYPE declaration to point to the local DTD, e.g. <!DOCTYPE math SYSTEM "path/to/mathml_dtd.dtd">, and created the lxml.etree.XMLParser with no_network=True. But when I run the following code

import lxml.etree
mathml = """
<!DOCTYPE math SYSTEM "path/to/dtd/mathml3.dtd">
<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mrow>
    <mo>&lfloor;</mo>
    <mrow>
      <mi>a</mi>
    </mrow>
    <mo>&rfloor;</mo>
  </mrow>
</math>
"""
lxml_parser = lxml.etree.XMLParser(
    dtd_validation=True,
    no_network=True,
    load_dtd=True,
    ns_clean=True,
    remove_blank_text=True,
)
validated = lxml.etree.fromstring(mathml, lxml_parser)

I get this error:

File "/projects/py_asciimath/py_asciimath/parser/parser.py", line 117, in __dtd_validation
    return lxml.etree.fromstring(xml, lxml_parser)
  File "src/lxml/etree.pyx", line 3235, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1757, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "/projects/py_asciimath/py_asciimath/translation/dtd/mathml3.dtd", line 36
lxml.etree.XMLSyntaxError: conditional section INCLUDE or IGNORE keyword expected, line 36, column 17

With a local copy of the MathML 1 DTD, I get this instead:

File "/projects/py_asciimath/py_asciimath/parser/parser.py", line 117, in __dtd_validation
    return lxml.etree.fromstring(xml, lxml_parser)
  File "src/lxml/etree.pyx", line 3235, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1876, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1757, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Entity 'lfloor' not defined, line 1, column 135
belerico
  • Haven't had a chance to test, but the mathml3.dtd references "mathml3-qname.mod"; did you also save that locally in the same directory as the DTD? – Daniel Haley Apr 10 '20 at 13:49
  • Also consider [using a catalog](https://lxml.de/resolvers.html#xml-catalogs). The [example here](https://stackoverflow.com/a/55616129/317052) uses a schema, but the catalog would be the same for a DTD. – Daniel Haley Apr 10 '20 at 13:52
  • Nope, I don't have it locally. I'll try to get it and I'll test again... – belerico Apr 10 '20 at 15:21
  • I can't use a catalog since I need to develop a simple application and I want it to be self-contained – belerico Apr 10 '20 at 15:22
  • Why can't you package the catalog file with the application? – mzjn Apr 11 '20 at 09:40
  • Because I thought that it had to be installed in the operating system. I solved it by downloading all the files needed for validation from https://www.w3.org/Math/DTD/Overview.html. For the sake of learning I will also try to create a catalog. Thank you all! – belerico Apr 11 '20 at 10:51
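Following up on the comments: the mathml3.dtd traceback is consistent with the DTD pulling in modules such as mathml3-qname.mod that were not saved next to it. One possible way to keep everything inside the application, without a system-wide catalog, is a custom lxml resolver that serves every requested DTD file from a bundled directory. The sketch below has not been run against the real MathML DTDs. `LocalDTDResolver`, `make_parser`, the `mini.dtd` stand-in and the `example.invalid` URL are hypothetical names for illustration, and it assumes all DTD files and modules sit flat in one directory.

```python
import os
import tempfile

import lxml.etree


class LocalDTDResolver(lxml.etree.Resolver):
    """Map any requested DTD or module URL to a file in a local directory."""

    def __init__(self, dtd_dir):
        super().__init__()
        self.dtd_dir = dtd_dir

    def resolve(self, system_url, public_id, context):
        if not system_url:
            return None
        # Keep only the file name, e.g. ".../mathml3/mathml3.dtd" -> "mathml3.dtd".
        local_path = os.path.join(self.dtd_dir, os.path.basename(system_url))
        if os.path.isfile(local_path):
            return self.resolve_filename(local_path, context)
        return None  # not bundled: let lxml fall back to its default lookup


def make_parser(dtd_dir):
    """Build a validating parser that never touches the network."""
    parser = lxml.etree.XMLParser(
        dtd_validation=True,
        load_dtd=True,
        no_network=True,
    )
    parser.resolvers.add(LocalDTDResolver(dtd_dir))
    return parser


# Demo with a hypothetical stand-in DTD (NOT the real MathML DTD).
dtd_dir = tempfile.mkdtemp()
with open(os.path.join(dtd_dir, "mini.dtd"), "w") as fh:
    fh.write(
        "<!ELEMENT math (mi | mo)*>\n"
        "<!ELEMENT mi (#PCDATA)>\n"
        "<!ELEMENT mo (#PCDATA)>\n"
        '<!ENTITY lfloor "&#x230A;">\n'
    )

doc = (
    '<!DOCTYPE math SYSTEM "http://example.invalid/dtd/mini.dtd">\n'
    "<math><mo>&lfloor;</mo><mi>a</mi></math>"
)
root = lxml.etree.fromstring(doc, make_parser(dtd_dir))
```

With this approach the DOCTYPE in the document can keep its original remote URL. Only the file name is used for the lookup, so the DTD directory can be shipped inside the package.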

0 Answers
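For reference, lxml can also validate an already parsed tree against a DTD object loaded from a local file or file-like object, which avoids the DOCTYPE lookup entirely. This is only a minimal sketch under stated assumptions: the inline DTD is a hypothetical stand-in for a bundled mathml3.dtd, and `is_valid` is an illustrative helper name. Because validation happens after parsing, named entities such as &lfloor; would still need the DTD to be loaded at parse time.

```python
import io

import lxml.etree

# Hypothetical stand-in DTD; a real application would instead open its
# bundled MathML DTD file (together with the modules it references).
MINI_DTD = io.BytesIO(
    b"<!ELEMENT math (mi | mo)*>\n"
    b"<!ELEMENT mi (#PCDATA)>\n"
    b"<!ELEMENT mo (#PCDATA)>\n"
)

dtd = lxml.etree.DTD(MINI_DTD)


def is_valid(xml_text):
    """Parse without a DTD, then validate the tree against the local DTD."""
    root = lxml.etree.fromstring(xml_text)
    return dtd.validate(root)


valid_doc = "<math><mi>a</mi><mo>+</mo><mi>b</mi></math>"
invalid_doc = "<math><bogus/></math>"
print(is_valid(valid_doc), is_valid(invalid_doc))
```

When validation fails, `dtd.error_log` holds the individual validity errors.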