I'm looking for some help cleaning up XML files in Python. Below is a small snippet from a file that is more than 50,000 lines long. I have many XML files containing the same sort of data.
xml = """<?xml version="1.0" encoding="utf-8"?>
<file>
<SORT_INFO>
<sort_type>sort order</sort_type>
</SORT_INFO>
<ALL_INSTANCES>
<instance>
<ID>1</ID>
<start>0</start>
<end>17.96</end>
<code>14. Jordan Brian Henderson</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>52.4</pos_x>
<pos_y>34.0</pos_y>
</instance>
<instance>
<ID>7</ID>
<start>7.96</start>
<end>8.96</end>
<code>Start</code>
</instance>
<instance>
<ID>8</ID>
<start>10.28</start>
<end>30.28</end>
<code>26. Andrew Robertson</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>1st half</text>
</label>
<pos_x>61.7</pos_x>
<pos_y>68.0</pos_y>
</instance>
<instance>
<ID>1321</ID>
<start>3770.67</start>
<end>3790.67</end>
<code>3. Fabinho</code>
<label>
<group>Team</group>
<text>Liverpool FC</text>
</label>
<label>
<group>Action</group>
<text>Passes accurate</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>62.7</pos_x>
<pos_y>3.7</pos_y>
</instance>
<instance>
<ID>1882</ID>
<start>5695.17</start>
<end>5715.17</end>
<code>2. Fabio Cardoso</code>
<label>
<group>Team</group>
<text>Porto</text>
</label>
<label>
<group>Action</group>
<text>Interceptions</text>
</label>
<label>
<group>Half</group>
<text>2nd half</text>
</label>
<pos_x>8.1</pos_x>
<pos_y>46.3</pos_y>
</instance>
</ALL_INSTANCES>
<ROWS>
<row>
<code>20. Vitinha</code>
<sort_order>15</sort_order>
<R>51400</R>
<G>51400</G>
<B>51400</B>
</row>
<row>
<code>11. Pepe</code>
<sort_order>16</sort_order>
<R>51400</R>
<G>51400</G>
<B>51400</B>
</row>
</ROWS>
</file>
"""
I'd like to remove everything before <ALL_INSTANCES> and everything after </ALL_INSTANCES>, so that only the <ALL_INSTANCES> element and its contents remain.
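A minimal sketch of the trimming step using the standard-library `xml.etree.ElementTree` module. It assumes, as in the sample, that `<ALL_INSTANCES>` is a direct child of the root `<file>` element. `SAMPLE` and `keep_all_instances` are illustrative names, not part of the original question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample (illustrative; assumes the real files
# share this layout, with <ALL_INSTANCES> directly under the root <file>).
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<file>
<SORT_INFO>
<sort_type>sort order</sort_type>
</SORT_INFO>
<ALL_INSTANCES>
<instance>
<ID>1</ID>
<code>14. Jordan Brian Henderson</code>
</instance>
<instance>
<ID>7</ID>
<code>Start</code>
</instance>
</ALL_INSTANCES>
<ROWS>
<row>
<code>20. Vitinha</code>
</row>
</ROWS>
</file>"""


def keep_all_instances(xml_text: str) -> ET.Element:
    """Return the <ALL_INSTANCES> element so it can be written out on its own."""
    root = ET.fromstring(xml_text)
    all_instances = root.find("ALL_INSTANCES")
    if all_instances is None:
        raise ValueError("no <ALL_INSTANCES> element found")
    # The whitespace after </ALL_INSTANCES> is stored as the element's tail;
    # drop it so nothing from the surrounding document gets serialized.
    all_instances.tail = None
    return all_instances


trimmed = keep_all_instances(SAMPLE)
# <SORT_INFO> (before) and <ROWS> (after) are not part of the output.
print(ET.tostring(trimmed, encoding="unicode"))
```

To save the result, `ET.ElementTree(trimmed).write("out.xml", encoding="utf-8", xml_declaration=True)` writes `<ALL_INSTANCES>` as the new root with an XML declaration at the top (`out.xml` is a hypothetical filename).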
I'd also like to remove any <instance> element that contains <code>Start</code>.
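One way to drop the `Start` markers, again sketched with `xml.etree.ElementTree`. It assumes a match means the `<code>` text is exactly `Start` (ignoring surrounding whitespace); `FRAGMENT` and `drop_start_instances` are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment based on the question's sample: instance 7 is the
# "Start" marker that should be removed.
FRAGMENT = """<ALL_INSTANCES>
<instance>
<ID>1</ID>
<code>14. Jordan Brian Henderson</code>
</instance>
<instance>
<ID>7</ID>
<start>7.96</start>
<end>8.96</end>
<code>Start</code>
</instance>
<instance>
<ID>8</ID>
<code>26. Andrew Robertson</code>
</instance>
</ALL_INSTANCES>"""


def drop_start_instances(all_instances: ET.Element) -> int:
    """Remove every direct <instance> child whose <code> text is 'Start'.

    Returns how many instances were removed.
    """
    removed = 0
    # findall() returns a new list, so removing children from the parent
    # while looping over that list does not skip any elements.
    for inst in all_instances.findall("instance"):
        # findtext() returns None when <code> is missing; treat that as "".
        if (inst.findtext("code") or "").strip() == "Start":
            all_instances.remove(inst)
            removed += 1
    return removed


all_instances = ET.fromstring(FRAGMENT)
removed = drop_start_instances(all_instances)
remaining_ids = [inst.findtext("ID") for inst in all_instances.findall("instance")]
print(removed, remaining_ids)  # prints: 1 ['1', '8']
```

Matching on the exact text rather than a substring avoids accidentally removing a player whose name happens to contain "Start".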
Is it possible to do this for all the XML files in a folder?
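A sketch that applies both steps to every `*.xml` file in a folder, using `pathlib` and `xml.etree.ElementTree`. It assumes every file has the same layout as the sample; `clean_file`, `clean_folder`, and the folder names in the usage line are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def clean_file(src: Path, dst: Path) -> None:
    """Write to dst only the <ALL_INSTANCES> part of src, minus 'Start' instances."""
    root = ET.parse(src).getroot()
    all_instances = root.find("ALL_INSTANCES")
    if all_instances is None:
        raise ValueError(f"{src}: no <ALL_INSTANCES> element found")
    # findall() returns a list copy, so removing while looping is safe.
    for inst in all_instances.findall("instance"):
        if (inst.findtext("code") or "").strip() == "Start":
            all_instances.remove(inst)
    all_instances.tail = None  # drop whitespace that followed </ALL_INSTANCES>
    # Make <ALL_INSTANCES> the new root and keep an XML declaration at the top.
    ET.ElementTree(all_instances).write(dst, encoding="utf-8", xml_declaration=True)


def clean_folder(in_dir: str, out_dir: str) -> List[str]:
    """Clean every *.xml file in in_dir, saving results under the same names in out_dir.

    Writing to a separate folder leaves the original files untouched.
    Returns the names of the files that were processed.
    """
    in_path, out_path = Path(in_dir), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(in_path.glob("*.xml")):
        clean_file(src, out_path / src.name)
        processed.append(src.name)
    return processed
```

Usage: `clean_folder("xml_in", "xml_cleaned")`. `ET.parse` reads each file fully into memory, which should be fine for files of around 50,000 lines. If you want the output re-indented, Python 3.9+ provides `ET.indent()`, which you could call on `all_instances` before writing.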
Thanks