
I have a large XML file, and I want to extract some elements from it and write them to another XML file. I wrote this code:

import xml.etree.cElementTree as CE

tree = CE.ElementTree()
root = CE.Element("root")
i = 0
for event, elem in CE.iterparse('data.xml'):
    if elem.tag == "ActivityRef":
        print(elem.tag)
        a = CE.Element(elem.tag)
        root.append(elem)
        elem.clear()
        i += 1
    if i == 200:
        break

But I don't get the desired result. I get this:

<root>
  <ActivityRef />
  <ActivityRef />
  <ActivityRef />
  <ActivityRef />
  ...
</root>

instead of this:

<root>
  <ActivityRef>
    <Id>2008-12-11T20:43:07Z</Id>
  </ActivityRef>
  <ActivityRef>
    <Id>2008-10-11T20:43:07Z</Id>
  </ActivityRef>
  ...
</root>

Edit

Input file:

<?xml version="1.0" encoding="UTF-8"?>
  <Folders>
    <History>
      <Running>
        <ActivityRef>
          <Id>2009-03-14T17:05:55Z</Id>
        </ActivityRef>
        <ActivityRef>
          <Id>2009-03-13T06:12:42Z</Id>
        </ActivityRef>
        <ActivityRef>
          <Id>2009-03-08T09:00:29Z</Id>
        </ActivityRef>
        <ActivityRef>
          <Id>2009-03-04T19:39:39Z</Id>
        </ActivityRef>
        ...
      </Running>
    </History>
</Folders>

I also need to remove the extracted elements from the source file.

halfer
Oussama He
  • Note that since Python 3.3, there is no need to import `xml.etree.cElementTree`. Just use `xml.etree.ElementTree`: https://docs.python.org/3/whatsnew/3.3.html#xml-etree-elementtree. – mzjn Dec 23 '20 at 15:46

1 Answer


Use an XPath expression with `findall()`:

import xml.etree.ElementTree as ET

data = '''<?xml version="1.0" encoding="UTF-8"?>
  <Folders>
    <History>
      <Running>
        <ActivityRef>
          <Id>2009-03-14T17:05:55Z</Id>
        </ActivityRef>
        <ActivityRef>
          <Id>2009-03-13T06:12:42Z</Id>
        </ActivityRef>
        <ActivityRef>
          <Id>2009-03-08T09:00:29Z</Id>
        </ActivityRef>
        <ActivityRef>
          <Id>2009-03-04T19:39:39Z</Id>
        </ActivityRef>
      </Running>
    </History>
</Folders>'''
root = ET.fromstring(data)
# 'activities' contains the elements you are looking for
activities = root.findall('.//ActivityRef')
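
For context on the empty `<ActivityRef />` output in the question: `root.append(elem)` does not copy the element, so the following `elem.clear()` wipes the children of the very element that was just appended. The sketch below shows two possible ways around that. The helper names `move_elements` and `copy_elements_streaming` are hypothetical, not part of `xml.etree.ElementTree`, and the code assumes `ActivityRef` elements are never nested inside one another.

```python
import copy
import io
import xml.etree.ElementTree as ET

# Shortened sample input, in the same shape as the file in the question.
DATA = b"""<?xml version="1.0" encoding="UTF-8"?>
<Folders>
  <History>
    <Running>
      <ActivityRef>
        <Id>2009-03-14T17:05:55Z</Id>
      </ActivityRef>
      <ActivityRef>
        <Id>2009-03-13T06:12:42Z</Id>
      </ActivityRef>
    </Running>
  </History>
</Folders>"""


def move_elements(src_root, tag, limit=None):
    """Hypothetical helper: move up to `limit` <tag> elements from
    src_root into a new <root> element, removing them from the source."""
    out_root = ET.Element("root")
    # ElementTree keeps no parent pointers, so visit every possible parent.
    for parent in list(src_root.iter()):
        for child in list(parent.findall(tag)):  # direct children only
            if limit is not None and len(out_root) >= limit:
                return out_root
            parent.remove(child)    # remove from the source tree
            out_root.append(child)  # sub-elements such as <Id> stay attached
    return out_root


def copy_elements_streaming(source, tag, limit=None):
    """Hypothetical helper: copy matching elements while streaming with
    iterparse. The source itself is left unchanged."""
    out_root = ET.Element("root")
    for _event, elem in ET.iterparse(source):  # default: 'end' events only
        if elem.tag == tag:
            out_root.append(copy.deepcopy(elem))  # copy first ...
            elem.clear()                          # ... then free the original
            if limit is not None and len(out_root) >= limit:
                break
    return out_root


# In-memory variant: extract the elements and drop them from the source tree.
src_root = ET.fromstring(DATA)
out_root = move_elements(src_root, "ActivityRef", limit=200)
print(ET.tostring(out_root, encoding="unicode"))

# Streaming variant: io.BytesIO stands in for a large file on disk.
streamed = copy_elements_streaming(io.BytesIO(DATA), "ActivityRef", limit=1)
print(len(streamed))
```

With real files, the source can be loaded with `ET.parse('data.xml')`, and each tree can be saved with `ET.ElementTree(out_root).write(...)`, passing an output file name of your choice plus `encoding="UTF-8"` and `xml_declaration=True`. The in-memory variant needs the whole document in RAM; the streaming variant uses less memory but only copies, so removing the elements from a very large source would still need a separate rewrite pass.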
balderman