I have a .tei file in the following format:
<biblStruct xml:id="b0">
  <analytic>
    <title level="a" type="main">The Semantic Web</title>
    <author>
      <persName xmlns="http://www.tei-c.org/ns/1.0">
        <forename type="first">T</forename>
        <surname>Berners-Lee</surname>
      </persName>
    </author>
    <author>
      <persName xmlns="http://www.tei-c.org/ns/1.0">
        <forename type="first">J</forename>
        <surname>Hendler</surname>
      </persName>
    </author>
    <author>
      <persName xmlns="http://www.tei-c.org/ns/1.0">
        <forename type="first">O</forename>
        <surname>Lassilia</surname>
      </persName>
    </author>
  </analytic>
  <monogr>
    <title level="j">Scientific American</title>
    <imprint>
      <date type="published" when="2001-05" />
    </imprint>
  </monogr>
</biblStruct>
I want to convert it to a .txt file whose content looks like this:
T. Berners-Lee, J. Hendler and O. Lassilia. ‘The Semantic Web’, Scientific American, May 2001
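To pin down the target format, here is a minimal sketch of the formatting step alone, assuming the fields have already been pulled out of the XML. The function name `format_citation`, the rule of using the first letter of the forename as the initial, and the assumption that the date is the `when` attribute in `YYYY-MM` form are all illustrative choices, not part of any library.

```python
# English month names, hard-coded so the output does not depend on the locale
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_citation(authors, title, journal, when):
    """Build 'A. Name, B. Name and C. Name. ‘Title’, Journal, Month Year'.

    authors: list of (forename, surname) tuples (assumed non-empty forenames)
    when:    date string such as "2001-05" (assumed YYYY-MM or just YYYY)
    """
    # Use the first letter of the forename as the initial: "T" or "Tim" -> "T."
    names = [f"{fore[0]}. {sur}" for fore, sur in authors]
    if len(names) > 1:
        # Comma-separate all but the last author, then join the last with "and"
        name_part = ", ".join(names[:-1]) + " and " + names[-1]
    else:
        name_part = names[0]

    parts = when.split("-")
    if len(parts) >= 2:
        # "2001-05" -> "May 2001"
        date_part = f"{MONTHS[int(parts[1]) - 1]} {parts[0]}"
    else:
        # Only a year is available
        date_part = parts[0]

    # \u2018 and \u2019 are the curly single quotes around the title
    return f"{name_part}. \u2018{title}\u2019, {journal}, {date_part}"
```

Called with the three authors, the title, the journal, and `"2001-05"` from the sample, this produces the target line above.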
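I tried the following code:
import xml.etree.ElementTree as ET

tree = ET.parse(path)
root = tree.getroot()
s = ""
# Walks only the children and grandchildren of the root, in document order
for childs in root:
    for child in childs:
        s = s + (child.text or "")  # .text is None for empty elements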
The problem with this code is that it just visits the elements in document order and concatenates their text, so the resulting string is not in the order or format I need.
Second, the data sits at different depths (the author names are inside author/persName), and each extra level would need another inner loop. Extracting values this way without checking every level manually is error-prone. How can I solve this?
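One way to avoid writing a loop per nesting level is to search by element name at any depth with `Element.iter()`, comparing only the local part of each tag so that the TEI namespace declared on `<persName>` does not matter. A minimal sketch using the standard-library ElementTree; the helper names `local`, `find_all`, and `extract_fields` are hypothetical, and ignoring namespaces assumes no other vocabulary in the file reuses these element names.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix: '{http://...}surname' -> 'surname'."""
    return tag.rsplit("}", 1)[-1]


def find_all(elem, name):
    """Return elem and all its descendants whose local tag name is `name`."""
    return [e for e in elem.iter() if local(e.tag) == name]


def extract_fields(bibl):
    """Collect the citation pieces from one <biblStruct>, whatever the depth."""
    authors = []
    for pers in find_all(bibl, "persName"):
        fore = find_all(pers, "forename")
        sur = find_all(pers, "surname")
        authors.append((
            fore[0].text if fore else "",
            sur[0].text if sur else "",
        ))
    # Map each title's level attribute ("a" = article, "j" = journal) to its text
    titles = {t.get("level"): t.text for t in find_all(bibl, "title")}
    dates = find_all(bibl, "date")
    return {
        "authors": authors,
        "title": titles.get("a"),
        "journal": titles.get("j"),
        "when": dates[0].get("when") if dates else None,
    }
```

For a whole file, something like `[extract_fields(b) for b in find_all(ET.parse(path).getroot(), "biblStruct")]` would give one dict per reference, which can then be formatted into a line of text. Because the lookup is by name rather than by position, extra wrapper elements between `<biblStruct>` and `<surname>` do not require any code changes.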