
I am parsing a list of XML files. I am using a class that subclasses ElementTree's XMLParser. This is the code:

import sys
sys.modules['_elementtree'] = None
try:
    sys.modules.pop('xml.etree.ElementTree')
except KeyError:
    pass
import xml.etree.ElementTree as ET

class Parse():
    def __init__(self):
        self.xmlFiles  = [list_of_xmlFile_paths]

    def parse_xml_files(self):
        for filepath in self.xmlFiles:
            root = ET.parse(filepath, LineNumberingParser()).getroot()
            for elem in root:
                print(elem.start_line_number, elem.end_line_number)


class LineNumberingParser(ET.XMLParser):
    def _start(self, *args, **kwargs):
        # Here we assume the default XML parser which is expat
        # and copy its element position attributes into output Elements
        # super() rather than super(self.__class__, self), which recurses forever if subclassed
        self.element = super()._start(*args, **kwargs)
        self.element.start_line_number = self.parser.CurrentLineNumber
        self.element.start_column_number = self.parser.CurrentColumnNumber
        return self.element

    def _end(self, *args, **kwargs):
        self.element = super()._end(*args, **kwargs)
        self.element.end_line_number = self.parser.CurrentLineNumber
        self.element.end_column_number = self.parser.CurrentColumnNumber
        return self.element

The class LineNumberingParser gives me the start line and end line of each XML node. My issue is that the class is instantiated again for every XML file, and this repeated initialisation is not efficient. How can I do this while initialising the class only once? Any suggestions would be appreciated.

mzjn
shweta
  • Which class? `LineNumberingParser` or `Parse`? – sophros Aug 27 '18 at 15:18
  • @sophros I want to initialise LineNumberingParser only once – shweta Aug 27 '18 at 16:09
  • I suspect that it is not possible to do what you want. From the `xml.parsers.expat` documentation: "*Due to limitations in the Expat library used by pyexpat, the xmlparser instance returned can only be used to parse a single XML document*" (https://docs.python.org/3/library/pyexpat.html#xml.parsers.expat.ParserCreate) – mzjn Aug 28 '18 at 09:40

1 Answer


I am still unsure how you want to do that. It seems that the ET.XMLParser class needs to be initialized on a per-file basis.
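
The per-file requirement can be checked against expat directly. The following is a minimal sketch, not taken from the question; the helper name `can_parse_twice` is made up for illustration. It feeds one complete document to an `xml.parsers.expat` parser and then tries to feed a second document to the same instance.

```python
import xml.parsers.expat


def can_parse_twice():
    """Return True if one expat parser instance accepts two documents."""
    parser = xml.parsers.expat.ParserCreate()
    # The second argument (True) marks this chunk as the end of the document.
    parser.Parse(b"<first/>", True)
    try:
        # Attempt to reuse the same instance for another document.
        parser.Parse(b"<second/>", True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    print("reusable:", can_parse_twice())
```

If the second call is rejected, a parser built on top of that expat instance, such as ET.XMLParser, presumably has to be recreated for each file as well.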

However, should you find a way to work around that (e.g. by manually "re-initializing" the ET.XMLParser object's variables), you could keep an instance of the parser in LineNumberingParser as a class variable and initialize it only once.
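
If no such workaround turns up, one option is to keep building a parser per file but hide that behind a small helper. This is only a sketch under assumptions: `parse_with_lines` is a hypothetical helper name, the column attributes are left out for brevity, and `_start`/`_end` are private hooks of the pure-Python ElementTree, so this may behave differently on other Python versions.

```python
import io
import sys

# Force the pure-Python ElementTree so the _start/_end hooks can be overridden.
sys.modules['_elementtree'] = None
sys.modules.pop('xml.etree.ElementTree', None)
import xml.etree.ElementTree as ET


class LineNumberingParser(ET.XMLParser):
    def _start(self, *args, **kwargs):
        # Record the line on which the start tag begins.
        element = super()._start(*args, **kwargs)
        element.start_line_number = self.parser.CurrentLineNumber
        return element

    def _end(self, *args, **kwargs):
        # Record the line on which the end tag begins.
        element = super()._end(*args, **kwargs)
        element.end_line_number = self.parser.CurrentLineNumber
        return element


def parse_with_lines(source):
    """Parse one document (path or file object) with a fresh parser."""
    return ET.parse(source, LineNumberingParser()).getroot()


if __name__ == "__main__":
    doc = io.BytesIO(b"<root>\n  <a>\n    text\n  </a>\n  <b/>\n</root>")
    for elem in parse_with_lines(doc):
        print(elem.tag, elem.start_line_number, elem.end_line_number)
```

Whether the per-file construction is actually a bottleneck is worth measuring before optimising; it may well be small next to the cost of parsing the file itself.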

sophros
  • Is it true that the ET.XMLParser class needs to be initialised on a per-file basis? I suspect so too. Could you please explain the latter part of your answer? It would really help. – shweta Aug 27 '18 at 17:27