I have two large XML files (c. 100 MB) containing a number of items. I want to output the differences between them.
Each item has an ID, and I need to check whether it is in both files. If it is, I then need to compare the individual values for that item to make certain it's the same item.
Is a SAX parser the best way to solve this, and how is it used? I used ElementTree and findall, which worked on the smaller files, but I can't use that approach for the large files.
from xml.etree.ElementTree import ElementTree

srcTree = ElementTree()
srcTree.parse(srcFile)
# the destination tree is loaded the same way; dstFile (the path to the
# second file) is an assumed name, since that step was not shown
dstTree = ElementTree()
dstTree.parse(dstFile)

# finds all the items in both files
srcComponents = srcTree.find('source').find('items')
srcItems = srcComponents.findall('item')
dstComponents = dstTree.find('source').find('items')
dstItems = dstComponents.findall('item')

# parses the source file to find the values of various fields of each
# item and adds the information to the source set
srcSet = set()
for item in srcItems:
    srcId = item.get('id')
    srcList = [srcId]
    details = item.find('values')
    srcVariables = details.findall('value')
    for var in srcVariables:
        srcList.append((var.get('name'), var.text))
    srcList = tuple(srcList)
    srcSet.add(srcList)
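
For comparison, the same set could probably be built without loading the whole tree by streaming with `xml.etree.ElementTree.iterparse`, which is usually simpler to work with than a SAX handler. The following is only a minimal sketch under assumptions: it assumes the layout implied by the code above (`item` elements with an `id` attribute and `values/value` children carrying a `name` attribute) and no XML namespaces. The names `item_set` and `diff_items` and the inline sample documents are hypothetical, introduced purely for illustration.

```python
import io
import xml.etree.ElementTree as ET


def item_set(source):
    """Stream `source` (a path or binary file object) and return a set of
    (id, (name, text), ...) tuples, one per <item> element."""
    items = set()
    # "end" events fire once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        record = [elem.get("id")]
        for var in elem.findall("values/value"):
            record.append((var.get("name"), var.text))
        items.add(tuple(record))
        # Free the item's children. The emptied <item> shells still
        # accumulate in the parent element, but they are small.
        elem.clear()
    return items


def diff_items(src, dst):
    """Return (only_in_src, only_in_dst) as two sets of item tuples."""
    src_set, dst_set = item_set(src), item_set(dst)
    return src_set - dst_set, dst_set - src_set


# Tiny in-memory documents standing in for the two large files.
SRC = (b"<root><source><items>"
       b"<item id='1'><values><value name='a'>x</value></values></item>"
       b"<item id='2'><values><value name='a'>y</value></values></item>"
       b"</items></source></root>")
DST = (b"<root><source><items>"
       b"<item id='1'><values><value name='a'>x</value></values></item>"
       b"<item id='2'><values><value name='a'>z</value></values></item>"
       b"<item id='3'><values><value name='a'>w</value></values></item>"
       b"</items></source></root>")

only_src, only_dst = diff_items(io.BytesIO(SRC), io.BytesIO(DST))
```

An item whose ID is in both files but whose values differ shows up in both result sets, so matching on the first tuple element separates changed items from added or removed ones. The two sets themselves are still held in memory, which this sketch assumes is acceptable for files of this size.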