
I have an XML with the following structure:

<database>
    <name>Test</name>
    <description>A test dataset</description>
    <entry_count>10289</entry_count>
    <keywords>some keyword</keywords>
    <url>url.com</url>
    <entries>
    
        <id>muKO</id>
            <name>B6.129S2-Ighm<tm1Cgn>/CgnOrl</name>
            <description>description here</description>
            <date>2022-09-25 21:02:12</date>
            <repository>rep name</repository>
            <full_dataset_link>link_to_dataset.com</full_dataset_link>
        </entry>
.
.
.
.
</database>

In this file I need to change <id>muKO</id> to <entry id="muKO">. This has to be done throughout the whole file, so I was wondering how to read the XML file line by line and modify the lines that have the structure <id>some_id</id>. Thank you for the help.

Andrea
  • Have you looked at https://docs.python.org/3/library/xml.html? – ewokx Jun 23 '23 at 09:39
  • Consider using XSLT for that XML -> XML transformation; you basically need a template ``. XSLT 1 support for Python is available with lxml, XSLT 3 support with saxonche. – Martin Honnen Jun 23 '23 at 12:13
  • Note also that the posted input snippet seems to lack a start tag for the end tag `</entry>`, so it is not quite clear whether you have well-formed XML to start with. – Martin Honnen Jun 23 '23 at 12:15
  • @MartinHonnen The start tag is missing because it is the one that will replace the line ... – Andrea Jun 27 '23 at 11:46

1 Answer

If you have a valid XML file like my edited example, you can set the id attribute on each entry element and then remove the id child element:

import xml.etree.ElementTree as ET
from io import StringIO

xml_str="""<?xml version="1.0" encoding="UTF-8"?>
<database>
    <name>Test</name>
    <description>A test dataset</description>
    <entry_count>10289</entry_count>
    <keywords>some keyword</keywords>
    <url>url.com</url>
    <entries>
        <entry>
        <id>muKO</id>
            <name>B6.129S2-Ighm<tm1Cgn/>CgnOrl</name>
            <description>description here</description>
            <date>2022-09-25 21:02:12</date>
            <repository>rep name</repository>
            <full_dataset_link>link_to_dataset.com</full_dataset_link>
        </entry>
        <entry>
        <id>muK1</id>
            <name>B6.129S2-Ighm<tm1Cgn/>CgnOrl</name>
            <description>description here</description>
            <date>2022-09-25 21:02:12</date>
            <repository>rep name</repository>
            <full_dataset_link>link_to_dataset.com</full_dataset_link>
        </entry>
    </entries>
</database>"""

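# Parse from a string here; for a file on disk, pass its path to ET.parse() instead
f = StringIO(xml_str)

tree = ET.parse(f)
root = tree.getroot()

for entry in root.findall('.//entry'):
    # <id> is a direct child of <entry>, so find() is enough and
    # nothing is removed while an iterator is still walking the tree
    id_elem = entry.find('id')
    if id_elem is not None:
        entry.set('id', id_elem.text)  # copy the text into an id attribute
        entry.remove(id_elem)          # drop the now redundant <id> element

ET.indent(tree, space='  ')  # ET.indent() requires Python 3.9+
tree.write('ID_xml.xml')

ET.dump(root)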

Output:

<database>
  <name>Test</name>
  <description>A test dataset</description>
  <entry_count>10289</entry_count>
  <keywords>some keyword</keywords>
  <url>url.com</url>
  <entries>
    <entry id="muKO">
      <name>B6.129S2-Ighm<tm1Cgn />CgnOrl</name>
      <description>description here</description>
      <date>2022-09-25 21:02:12</date>
      <repository>rep name</repository>
      <full_dataset_link>link_to_dataset.com</full_dataset_link>
    </entry>
    <entry id="muK1">
      <name>B6.129S2-Ighm<tm1Cgn />CgnOrl</name>
      <description>description here</description>
      <date>2022-09-25 21:02:12</date>
      <repository>rep name</repository>
      <full_dataset_link>link_to_dataset.com</full_dataset_link>
    </entry>
  </entries>
</database>
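
If the file really looks like the snippet in the question, a parser will reject it: there is no `<entry>` start tag, and the literal `<tm1Cgn>` inside `<name>` is not escaped. In that case a line-by-line rewrite, as the question asks for, may be the more practical route. Here is a minimal sketch, assuming every `<id>…</id>` sits alone on its own line and the IDs contain no double quotes; `ids_to_entry_tags`, `convert_file` and the file names are hypothetical helpers for illustration, not part of any library:

```python
import re

# A line that holds only <id>some_id</id>, with optional indentation.
ID_LINE = re.compile(r'^([ \t]*)<id>(.*?)</id>\s*$')


def ids_to_entry_tags(lines):
    """Yield the lines unchanged, except <id>X</id> becomes <entry id="X">."""
    for line in lines:
        match = ID_LINE.match(line)
        if match:
            indent, entry_id = match.groups()
            # Assumption: entry_id contains no double quotes.
            line = f'{indent}<entry id="{entry_id}">\n'
        yield line


def convert_file(src_path, dst_path):
    """Stream src_path line by line into dst_path (hypothetical file names)."""
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        dst.writelines(ids_to_entry_tags(src))


if __name__ == '__main__':
    sample = [
        '    <entries>\n',
        '        <id>muKO</id>\n',
        '            <name>B6.129S2-Ighm<tm1Cgn>/CgnOrl</name>\n',
        '        </entry>\n',
    ]
    print(''.join(ids_to_entry_tags(sample)), end='')
```

Calling `convert_file('input.xml', 'output.xml')` would then write the converted copy without loading the whole file into memory. Note that the result only becomes well-formed XML if the `<tm1Cgn>` text is also escaped (as `&lt;tm1Cgn&gt;`).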
Hermann12