I have some Python experience but very little knowledge of XML. I need to reformat a 50,000-line XML file in which two specific recurring tags and their contents need to be collapsed from multiple lines into one, while keeping the file's current indentation.
Example
<tag1>
<day>1</day>
<month>3</month>
<year>2022</year>
</tag1>
Converted to
<tag1><day>1</day><month>3</month><year>2022</year></tag1>
I am currently trying to use BeautifulSoup4 and thought I could collapse the tags onto a single line by using str(soup) to remove the formatting, but the output still has one tag per line and also loses the file's indentation. Can this be done with BeautifulSoup4, or should I be looking into something else?
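For reference, here is a minimal sketch of one parser-free alternative using only the standard library's re module. It treats the file as plain text, so everything outside the matched blocks (indentation, the XML declaration, comments) is left untouched. The function name collapse_blocks and the sample string are hypothetical, and the sketch assumes the target tags never nest inside themselves and that no ">" followed by whitespace and "<" occurs inside text content or CDATA that should be preserved.

```python
import re


def collapse_blocks(xml_text, tag_names):
    """Remove whitespace between tags inside each <name>...</name> block.

    Assumption: blocks named in tag_names do not contain themselves.
    """
    names = "|".join(re.escape(name) for name in tag_names)
    # Match an opening tag (with optional attributes) through its closing tag.
    block = re.compile(rf"<({names})(?=[\s>])[^>]*>.*?</\1\s*>", re.DOTALL)
    # Whitespace that sits only between a '>' and the next '<'.
    between_tags = re.compile(r">\s+<")
    return block.sub(lambda m: between_tags.sub("><", m.group(0)), xml_text)


# Hypothetical sample mirroring the example above, with indentation.
regex_sample = (
    "<root>\n"
    "  <tag1>\n"
    "    <day>1</day>\n"
    "    <month>3</month>\n"
    "    <year>2022</year>\n"
    "  </tag1>\n"
    "</root>\n"
)
regex_result = collapse_blocks(regex_sample, ["tag1"])
print(regex_result)
```

Because the leading whitespace before the opening tag is outside the match, the collapsed line keeps its original indentation. A regex is not a real XML parser, so this is only reasonable when the file's layout is as regular as in the example.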
EDIT
The BeautifulSoup4 docs made it look like it was possible to remove all formatting using str(soup). I tried that, and it removes the indentation from every line but still keeps each tag on its own line.
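from bs4 import BeautifulSoup

# Parse the file with the "xml" parser (requires lxml to be installed)
with open("test.xml") as fp:
    soup = BeautifulSoup(fp, "xml")

# str(soup) serializes the tree without pretty-printing
var = str(soup)
print(var)

# Write the same string to a new file
with open("write.xml", "w") as f:
    f.write(var)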
I found https://stackoverflow.com/a/19396130/10768134 by looking through posts related to the answer D.L provided. It is very close to what I'm looking for. Taking the following input...
<?xml version="1.0" encoding="UTF-8"?>
<randomTag>
<vacation>
<agent>
<ID></ID>
<group></group>
<year></year>
</agent>
<vacation2 type = "word">
<to>
<day>31</day>
<month>12</month>
<year>2022</year>
</to>
</vacation2>
</vacation>>
</randomTag>
And returning the following output
<?xml version='1.0' encoding='UTF-8'?>
<randomTag><vacation><agent><ID/><group/><year/></agent><vacation2 type="word"><to><day>31</day><month>12</month><year>2022</year></to> </vacation2></vacation>></randomTag>
I'm currently looking into whether I can change which tags are affected, which would hopefully let me remove whitespace only where needed.
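One way to restrict the whitespace removal to chosen tags might look like the sketch below, which uses the standard library's xml.etree.ElementTree rather than the code from the linked answer. ElementTree keeps the whitespace between tags in each element's text and tail attributes, so clearing those only inside the target elements leaves the rest of the file's indentation alone. The names collapse_element and collapse_tags and the sample data are hypothetical. Note that this route re-serializes the document, so by default comments and the XML declaration are not carried over and empty elements are written as <ID />.

```python
import xml.etree.ElementTree as ET


def collapse_element(elem):
    """Drop whitespace-only text between the tags inside elem."""
    for node in elem.iter():
        # Whitespace between an opening tag and its first child.
        if len(node) and node.text is not None and not node.text.strip():
            node.text = None
        # Whitespace after a descendant's closing tag. The tail of elem
        # itself is kept so the following line stays indented.
        if node is not elem and node.tail is not None and not node.tail.strip():
            node.tail = None


def collapse_tags(xml_text, tag_names):
    """Collapse every element whose tag is in tag_names onto one line."""
    root = ET.fromstring(xml_text)
    for node in root.iter():
        if node.tag in tag_names:
            collapse_element(node)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample: only <to> should be collapsed, <agent> is left alone.
et_sample = (
    "<vacation>\n"
    "  <agent>\n"
    "    <year>2021</year>\n"
    "  </agent>\n"
    "  <to>\n"
    "    <day>31</day>\n"
    "    <month>12</month>\n"
    "    <year>2022</year>\n"
    "  </to>\n"
    "</vacation>"
)
et_result = collapse_tags(et_sample, {"to"})
print(et_result)
```

Passing a set of tag names keeps the choice of affected tags in one place, so adding the second recurring tag is a one-word change.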