XML parsing with iterparse does not print all the text

Question

I have an XML file with the following structure:

<article>
<body>
text1
<collectionlink>
text2
</collectionlink>
text3
</body>
</article>

I used iterparse to parse it, but it is not printing the data correctly. Here is my code:

from xml.etree.ElementTree import iterparse,dump

def main():
    fp=open("sam.xml",'r')
    tree_dict = create_dict_tree_elements(fp)

def create_dict_tree_elements(fp):
    depth=0
    for event,node in iterparse(fp, ['start', 'end', 'start-ns', 'end-ns']):
        if event=='start-ns' or event=='end-ns':
            continue
        if (event == 'start' and depth == 0):
            print node.text
            depth += 1
            continue        

        if (event == 'start' and depth >0 ):
            print node.text
            depth+=1

        if(event =='end' ):
            depth-=1



if __name__ == '__main__':
    main()

My expected output:

text1
text2
text3

The output I am getting:

text1
text2

With debug prints added, I get: node in depth 0: article; node in depth greater than 0: body, text1; node in depth greater than 0: collectionlink, text2. – user3715935 Jun 30 '14 at 05:27

score 0 · Accepted Answer · answered Jun 30 '14 at 07:08

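In ElementTree, node.text is the text between an element's opening tag and the next tag (its first child, or its own closing tag if it has no children). Text that follows an element's closing tag, up to the next tag, is stored in that element's node.tail. In your file, text3 comes after </collectionlink>, so it is the tail of the collectionlink element rather than part of the text of body.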
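
To illustrate the difference, here is a minimal sketch in Python 3 that parses the sample from the question and lists text and tail for each element. It uses fromstring instead of iterparse to keep the example short; the XML constant and the text_and_tail helper are names made up for this illustration. Note that with iterparse the documentation only guarantees that node.text is populated on 'end' events, so reading it on 'start' may not always work.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined for the example.
XML = """<article>
<body>
text1
<collectionlink>
text2
</collectionlink>
text3
</body>
</article>"""


def text_and_tail(root):
    """Return (tag, text, tail) for every element, with whitespace stripped."""
    rows = []
    for node in root.iter():
        text = (node.text or "").strip()  # text before the first child tag
        tail = (node.tail or "").strip()  # text after this element's closing tag
        rows.append((node.tag, text, tail))
    return rows


root = ET.fromstring(XML)
rows = text_and_tail(root)
for tag, text, tail in rows:
    print(tag, repr(text), repr(tail))

# itertext() walks text and tail fragments in document order.
fragments = [t.strip() for t in root.itertext() if t.strip()]
print(fragments)
```

If only the text in document order is needed, the itertext() call at the end is probably the simplest option.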

newtover

How can I find the parent of the element that node.tail belongs to in ElementTree? – user3715935 Jul 08 '14 at 07:36
@user3715935, as far as I know, an Element in ElementTree does not keep a reference to its parent. You have to keep track of that reference yourself. – newtover Jul 08 '14 at 10:43
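
One possible way to keep that reference explicitly while using iterparse is to maintain a stack of open elements. The following is only a sketch in Python 3 under that assumption; build_parent_map, parent_of and tail_owners are hypothetical names, and the sample XML is inlined as bytes.

```python
import io
from xml.etree.ElementTree import iterparse

# The sample document from the question, inlined as bytes for the example.
XML = b"""<article>
<body>
text1
<collectionlink>
text2
</collectionlink>
text3
</body>
</article>"""


def build_parent_map(source):
    """Map each element to its parent, using a stack of currently open elements."""
    parent_of = {}
    stack = []
    for event, node in iterparse(source, events=("start", "end")):
        if event == "start":
            if stack:
                # The innermost open element is the parent of the new one.
                parent_of[node] = stack[-1]
            stack.append(node)
        else:
            # 'end': this element is closed, so drop it from the stack.
            stack.pop()
    return parent_of


parent_of = build_parent_map(io.BytesIO(XML))

# Once parsing has finished, every tail is populated; pair each one with
# the parent element in which that tail text appears.
tail_owners = [
    (child.tag, parent.tag, (child.tail or "").strip())
    for child, parent in parent_of.items()
]
print(tail_owners)
```

For a document that is already fully parsed, a dictionary built by looping over every element and its direct children gives the same mapping without the stack.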
