
I have some Python experience but very little knowledge of XML. I need to reformat a 50,000-line XML file so that two specific recurring tags and their contents are collapsed from multiple lines onto a single line, while keeping the rest of the file's indentation.

Example

<tag1>
      <day>1</day>
      <month>3</month>
      <year>2022</year>
</tag1>

Converted to

<tag1><day>1</day><month>3</month><year>2022</year></tag1>

I am currently trying BeautifulSoup4. I thought I could collapse the tags onto a single line by using str(soup) to remove the formatting, but the output still has one tag per line and also loses the file's indentation. Can this be done with BeautifulSoup4, or should I be looking into something else?

EDIT

The BeautifulSoup4 docs made it look like str(soup) would remove all formatting. I tried that, and it removes each line's indentation but still keeps every tag on a separate line.

from bs4 import BeautifulSoup

# Parse the file as XML (the "xml" parser requires lxml to be installed).
with open("test.xml") as fp:
    soup = BeautifulSoup(fp, "xml")

var = str(soup)
print(var)

with open("write.xml", "w") as f:
    f.write(var)

I found https://stackoverflow.com/a/19396130/10768134 by looking at posts related to the answer D.L provided. This is very close to what I'm looking for. Taking the following input...

<?xml version="1.0" encoding="UTF-8"?>
<randomTag>
  <vacation>
    <agent>
        <ID></ID>
        <group></group>  
        <year></year>
    </agent>
    <vacation2 type = "word">
        <to>
            <day>31</day>
            <month>12</month>
            <year>2022</year>
        </to>
    </vacation2>
  </vacation>> 
</randomTag>

And returning the following output

<?xml version='1.0' encoding='UTF-8'?>
<randomTag><vacation><agent><ID/><group/><year/></agent><vacation2 type="word"><to><day>31</day><month>12</month><year>2022</year></to> </vacation2></vacation>&gt;</randomTag>

I'm currently looking into whether I can change which tags are affected, which would hopefully let me remove whitespace only where needed.

Ken White
Erozim
  • https://stackoverflow.com/a/3317008/2834978 – LMC Sep 30 '22 at 17:36
  • https://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary – D.L Sep 30 '22 at 18:11
  • what have you tried so far ? code ? what is the error ? https://stackoverflow.com/help/minimal-reproducible-example – D.L Sep 30 '22 at 18:12

1 Answer

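import xml.etree.ElementTree as ET

# Placeholder names: substitute the real tag names from your file.
PARENT_TAGS = {"opening parent tag", "opening parent tag 2"}
CHILD_TAGS = {"child tag", "child tag 2", "child tag 3"}

tree = ET.parse("test.xml")
root = tree.getroot()
for elem in root.iter():
    if elem.tag in PARENT_TAGS and elem.text:
        # Drop the newline and indentation after the opening parent tag.
        # (elem.text is None when the element is empty, hence the check.)
        elem.text = elem.text.strip()
    if elem.tag in CHILD_TAGS:
        # Drop the whitespace that follows the child's closing tag.
        elem.tail = ""

# Write the modified tree to a new file.
tree.write("write.xml", encoding="UTF-8", xml_declaration=True)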

I used this code to bring the opening and closing parent tags onto the same line, with the child tags and their values moved onto that line as well. It works by removing the whitespace text after the opening parent tag (elem.text) and the whitespace after each child tag (elem.tail). Removing the whitespace after the last child tag is what brings the closing parent tag up onto the same line as the opening parent tag. Don't remove the whitespace after the closing parent tag, as that would pull the next opening tag up onto the same line.
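One caveat with matching on child tag names is that a child name such as year can also appear under parents that should stay multi-line. A possible variation, sketched below under that assumption, selects only the parent tags and clears the whitespace of their direct children. The collapse function, the SAMPLE string and the tag names in it are hypothetical, made up for illustration and modelled on the example in the question; nested grandchildren are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the question's example.
SAMPLE = """<root>
  <tag1>
      <day>1</day>
      <month>3</month>
      <year>2022</year>
  </tag1>
  <other>
      <year>2023</year>
  </other>
</root>"""


def collapse(root, parent_tags):
    """Put each element whose tag is in parent_tags on a single line."""
    for parent in root.iter():
        if parent.tag not in parent_tags or len(parent) == 0:
            continue
        # Whitespace between the opening tag and the first child.
        if parent.text is not None and not parent.text.strip():
            parent.text = None
        # Whitespace after each direct child's closing tag.
        for child in parent:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
        # parent.tail is left alone, so the next tag keeps its own line
        # and indentation.


root = ET.fromstring(SAMPLE)
collapse(root, {"tag1"})
result = ET.tostring(root, encoding="unicode")
print(result)
```

Here only tag1 is collapsed, while the year element inside other keeps its original line and indentation. Note that ElementTree discards comments by default when parsing, so it is worth checking the output if the real file contains any.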

Erozim