
I'm trying to build an expression from an XML document. Reading from the top node, I want to push the nodes one by one onto a stack, and once I hit a closing tag I want to pop all the elements off the stack. How do I detect the end of a tag?

TIA,

John

Answer:

OK, I think I have the solution: a recursive function like this:

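```python
def findTextNodes(nodeList):
    """Recursively walk a minidom NodeList, printing elements and text."""
    for subnode in nodeList:
        if subnode.nodeType == subnode.ELEMENT_NODE:
            # Entering an element: this corresponds to its start tag
            print("element node: ", subnode.tagName)
            # Call the function again to process the element's children
            findTextNodes(subnode.childNodes)
            # All children are done: this corresponds to the end tag
            print("subnode return: ", subnode.tagName)
        elif subnode.nodeType == subnode.TEXT_NODE:
            print("text node: ", subnode.data)
```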

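When 'subnode return' is printed, the walk has reached that element's closing tag: the recursive call has returned, so all of the element's children have already been processed.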
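A minimal sketch of the same recursive walk that also keeps the explicit stack the question describes: an element is pushed when the walk enters it and popped once its children have been processed, which is the point where its closing tag sat in the source. The `walk` function, the event tuples, and the sample XML string are made up for illustration; only `xml.dom.minidom` itself comes from the thread.

```python
from xml.dom import minidom


def walk(node, stack, events):
    """Depth-first walk of a minidom node, recording start/text/end events.

    `stack` holds the tag names of the elements currently open (hypothetical
    helper, not part of minidom); `events` collects the tuples in order.
    """
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            stack.append(child.tagName)          # "start tag": push
            events.append(("start", child.tagName))
            walk(child, stack, events)           # process all children first
            events.append(("end", stack.pop()))  # "end tag": pop
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            events.append(("text", child.data.strip()))


doc = minidom.parseString("<expr><op>+</op><num>1</num><num>2</num></expr>")
events = []
walk(doc, [], events)
for event in events:
    print(event)
```

Because the pop happens right after the recursive call returns, every ("end", tag) event appears after all of that element's children, just like a real closing tag.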

Thanks, everybody!

JohnX

2 Answers


minidom builds the whole DOM in memory, so it will not notify you when an end tag is encountered.

1) You could consider switching to pyexpat (http://docs.python.org/library/pyexpat.html) and setting xmlparser.EndElementHandler to watch for the end tag. You will also need StartElementHandler to build your stack.

2) Take advantage of the DOM tree that minidom produces: just select the nodes you need from it, without using a stack at all.
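Option 1 could look roughly like this minimal sketch, assuming Python 3's `xml.parsers.expat` module (the one the pyexpat link documents). The `parse_with_stack` function, its return format, and the sample XML are hypothetical; the point is that expat calls the start handler at each start tag and the end handler at each end tag, so the stack push and pop map directly onto the tags.

```python
import xml.parsers.expat


def parse_with_stack(xml_text):
    """Stream-parse xml_text with expat, tracking open elements on a stack.

    Returns (tag, text) pairs, one per element, recorded when the element's
    end tag is seen, so inner elements come before the outer ones.
    """
    stack = []   # one [tag, collected_text] entry per currently open element
    closed = []  # (tag, text) recorded at each end tag

    def start_element(name, attrs):
        stack.append([name, ""])          # start tag: push

    def char_data(data):
        if stack:
            stack[-1][1] += data          # text belongs to innermost element

    def end_element(name):
        tag, text = stack.pop()           # end tag: pop
        closed.append((tag, text.strip()))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)          # True: this is the final chunk
    return closed


result = parse_with_stack("<expr><op>+</op><num>1</num><num>2</num></expr>")
print(result)
```

Any "pop everything" logic from the question would go inside `end_element`, since that is the only place expat reports a closing tag.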

Anthony Kong
  • Hey Anthony, 1) unfortunately this is an inherited project, so I can't switch to another module for now. 2) The XML doesn't have a fixed format; it can be any repeated structure. – JohnX Apr 03 '12 at 20:12
  • @JohnX If it is the case, you might wanna check this one out: http://stackoverflow.com/questions/1596829/xml-parsing-with-python-and-minidom – Anthony Kong Apr 03 '12 at 20:38

minidom builds a DOM. There are no tags in a DOM, because the XML has already been fully parsed into nodes: a single node represents the entire XML element, from its start tag to its end tag.

What it sounds like you want are simply the node's children (or children of type ELEMENT_NODE perhaps).

Since you're talking about pushing them onto and popping them off of a stack, it sounds like you want them in the reverse of the order in which they appear in the document. In which case you probably want something like reversed([child for child in node.childNodes if child.nodeType == child.ELEMENT_NODE]).
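A small sketch of that point, using a made-up sample document: pushing the element children onto a stack in document order and then popping them all off produces exactly the order that `reversed()` gives directly, so the stack itself is not needed.

```python
from xml.dom import minidom

doc = minidom.parseString("<expr><op>+</op><num>1</num><num>2</num></expr>")
node = doc.documentElement

# Element children only, in document order (text/comment nodes skipped)
children = [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE]

# Push them onto a stack, then pop them all off...
stack = []
for child in children:
    stack.append(child)
popped = [stack.pop() for _ in range(len(stack))]

# ...which yields the same order as reversed(), with no stack at all
in_reverse = list(reversed(children))
print([c.tagName for c in in_reverse])
```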

If you want all children (including the node's children's children and so on) then a recursive solution is simplest.
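A minimal recursive version, assuming only element nodes are wanted; `element_descendants` is a hypothetical helper name and the nested sample XML is invented for the example. It returns children, grandchildren, and so on, in document order.

```python
from xml.dom import minidom


def element_descendants(node):
    """Return every element below `node`, in document order, via recursion."""
    result = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            result.append(child)                        # the child itself
            result.extend(element_descendants(child))   # then its subtree
    return result


doc = minidom.parseString(
    "<expr><add><num>1</num><num>2</num></add><num>3</num></expr>")
tags = [e.tagName for e in element_descendants(doc.documentElement)]
print(tags)
```

Wrapping the result in `reversed()` gives the stack-style order discussed above.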

kindall
  • Yeah, I was thinking it had something like libXml's XML_ELEMENT_DECL, but it doesn't. Anyway, I think I've found a solution. Thanks, kindall! – JohnX Apr 03 '12 at 21:09