
I am seeing some very weird behavior: I can't step into, or set breakpoints in, some of the ElementTree classes. I started with the code below:

from xml.etree import ElementTree as ET

print(f"{ET.__file__}")
et = ET.parse("/tmp/pom.xml")
print(et)

I got the following output:

/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py
<xml.etree.ElementTree.ElementTree object at 0x10fef9a30>

So I opened ElementTree.py and put a breakpoint in XMLParser.__init__, but the breakpoint was never hit:

class XMLParser:
    ...
    def __init__(self, *, target=None, encoding=None):
        import pdb; pdb.set_trace()
        try:
            from xml.parsers import expat
        except ImportError:

Then I added a breakpoint in ElementTree.parse:

    def parse(self, source, parser=None):
        ...
        close_source = False
        if not hasattr(source, "read"):
            source = open(source, "rb")
            close_source = True
        try:
            if parser is None:
                # If no parser was specified, create a default XMLParser
                import pdb; pdb.set_trace()
                parser = XMLParser()
                if hasattr(parser, '_parse_whole'):

I did get the pdb prompt, but when I tried to step into XMLParser(), it went straight to the next line. I even verified that the name refers to the class defined in that same module (not some native implementation):

(Pdb) import inspect
(Pdb) inspect.getmodule(XMLParser)
<module 'xml.etree.ElementTree' from '/usr/local/Cellar/python@3.9/3.9.1_3/Frameworks/Python.framework/Versions/3.9/lib/python3.9/xml/etree/ElementTree.py'>

I am doing all this to figure out why the overridden _start and _end methods are never invoked on my custom subclass of XMLParser. I instantiate the parser roughly like this (derived from an existing example, with _start_list changed to _start):

class LineNumberingParser(ET.XMLParser):
    def _start(self, *args, **kwargs):
        element = super()._start(*args, **kwargs)
        element._start_line_number = self.parser.CurrentLineNumber
        print(f"----- {element.tag} {element._start_line_number}")
        return element

    def _end(self, *args, **kwargs):
        element = super()._end(*args, **kwargs)
        element._end_line_number = self.parser.CurrentLineNumber
        print(f"----- {element.tag} {element._end_line_number}")
        return element


parser = LineNumberingParser(target=ET.TreeBuilder(insert_comments=True))
et = ET.parse("/tmp/pom.xml", parser)

I even tried adding an __init__ to LineNumberingParser and stepping into the super constructor, but got the same behavior as before, even though I can see that self gets initialized properly (e.g., self.target is None before the super().__init__() call and set afterwards).

What am I missing here?

Update 1: I put print statements in obvious places in XMLParser (such as __init__ and _start) and got no output, so it seems a different implementation is being used, even though inspect.getmodule says otherwise.
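For what it's worth, my understanding is that inspect.getmodule goes by the class's `__module__`, and the C types register themselves under names like `xml.etree.ElementTree.XMLParser`, so they report the Python module even though no Python source backs them. A small check, sketched with a hypothetical helper `is_c_accelerated` that assumes the accelerator module is named `_elementtree`:

```python
import inspect
import sys
from xml.etree import ElementTree as ET


def is_c_accelerated(cls):
    """True if `cls` is the _elementtree (C) class rather than the Python one."""
    # Hypothetical helper: compares against the class exported by the C module.
    c_module = sys.modules.get("_elementtree")
    return c_module is not None and getattr(c_module, cls.__name__, None) is cls


# The C class still claims to live in the Python module, which is why
# inspect.getmodule() points at ElementTree.py.
print(ET.XMLParser.__module__)
print(is_c_accelerated(ET.XMLParser))
# A Python-level __init__ is a plain function; a C type's is a slot wrapper.
print(inspect.isfunction(ET.XMLParser.__init__))
```

If the last line prints False, there is no Python frame for pdb to step into, which matches what I saw.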

Update 2: I just noticed the following at the end of the module:

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser, _set_factories
    from _elementtree import *
    from _elementtree import _set_factories
except ImportError:
    pass
else:
    _set_factories(Comment, ProcessingInstruction)

So it was indeed the native C implementation, which is why pdb wasn't stepping in (that answers my original question). Now I am back to square one on finding a way to get line numbers.
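One possible workaround, sketched under assumptions: skip XMLParser entirely, drive `xml.parsers.expat` directly, and feed its events into an `ET.TreeBuilder`, reading `CurrentLineNumber` inside each handler. `parse_with_line_numbers`, `LineElement`, `start_line` and `end_line` are hypothetical names. The `Element` subclass is there because, as far as I know, instances of the C `Element` don't accept arbitrary attributes, while a Python subclass gets a `__dict__`. This sketch ignores comments, processing instructions and doctype handling that XMLParser normally does.

```python
from xml.etree import ElementTree as ET
from xml.parsers import expat


class LineElement(ET.Element):
    """Hypothetical Element subclass; its instances accept extra attributes."""


def parse_with_line_numbers(data):
    """Parse XML bytes/str with expat, recording each element's start/end line."""
    builder = ET.TreeBuilder(element_factory=LineElement)
    parser = expat.ParserCreate(namespace_separator="}")

    def fixname(name):
        # expat reports "uri}local"; ElementTree spells it "{uri}local".
        return "{" + name if "}" in name else name

    def on_start(tag, attrs):
        attrib = {fixname(k): v for k, v in attrs.items()}
        elem = builder.start(fixname(tag), attrib)
        elem.start_line = parser.CurrentLineNumber

    def on_end(tag):
        elem = builder.end(fixname(tag))
        elem.end_line = parser.CurrentLineNumber

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = builder.data
    parser.Parse(data, True)  # True = this is the final chunk
    return ET.ElementTree(builder.close())


tree = parse_with_line_numbers(b"<root>\n  <child>text</child>\n</root>\n")
for elem in tree.getroot().iter():
    print(elem.tag, elem.start_line, elem.end_line)
```

Because this path never relies on XMLParser's private _start/_end hooks, it should behave the same whether or not the C accelerator is loaded.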

Update 3: I found the code the test module uses to bypass the native module, and with the pure-Python implementation _start and _end do get called. However, there are some significant differences in the write code path.
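For reference, a similar blocking trick can be done without depending on the `test` package (which isn't always installed): temporarily put `None` in `sys.modules["_elementtree"]` so the `from _elementtree import *` at the end of ElementTree.py raises ImportError and the Python classes are kept. This is a sketch; `import_pure_python_elementtree` is a hypothetical helper, and since it pokes at import-system state it is best treated as a debugging aid rather than production code.

```python
import importlib
import sys


def import_pure_python_elementtree():
    """Return a separate copy of xml.etree.ElementTree built without _elementtree."""
    import xml.etree  # parent package; its __init__ does not load ElementTree

    names = ("xml.etree.ElementTree", "_elementtree")
    saved_modules = {name: sys.modules.pop(name, None) for name in names}
    saved_attr = getattr(xml.etree, "ElementTree", None)
    # A None entry makes `from _elementtree import *` raise ImportError,
    # so ElementTree.py falls back to its pure-Python classes.
    sys.modules["_elementtree"] = None
    try:
        return importlib.import_module("xml.etree.ElementTree")
    finally:
        # Put everything back so other code keeps the normal module.
        for name, module in saved_modules.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
        if saved_attr is None:
            if hasattr(xml.etree, "ElementTree"):
                delattr(xml.etree, "ElementTree")
        else:
            xml.etree.ElementTree = saved_attr


PyET = import_pure_python_elementtree()


class LineNumberingParser(PyET.XMLParser):
    # These private hooks are only called by the pure-Python XMLParser.
    def _start(self, *args, **kwargs):
        element = super()._start(*args, **kwargs)
        element._start_line_number = self.parser.CurrentLineNumber
        return element

    def _end(self, *args, **kwargs):
        element = super()._end(*args, **kwargs)
        element._end_line_number = self.parser.CurrentLineNumber
        return element


root = PyET.fromstring(b"<root>\n  <child/>\n</root>", parser=LineNumberingParser())
for elem in root.iter():
    print(elem.tag, elem._start_line_number, elem._end_line_number)
```

Elements produced this way are instances of the pure-Python `PyET.Element`, not the normal `ET.Element`, which may explain some of the differences observed on the write path.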
