I have multiple XML files that share the same structure as the sample below. Each file has exactly one Header tag at the top, followed by many App tags.
I want to merge these XML files and remove duplicate Apps by part ID (e.g. 701940). The final output should be one new XML file that contains the merged contents of all the input files with no duplicate parts. In other words, every App in the output is unique by part ID.
I'm not really sure what the best way to approach this is. From my research, I've seen approaches that turn the XML into a dictionary in Python (e.g. with the xmltodict module), but that gets very complicated and the code is not easy for me to read. What is the simplest way to solve this with the least amount of code?
I have tried pandas with its read_xml and to_xml functions, but the data is not retained correctly when I display the DataFrame. I'm not familiar with the arguments of read_xml, so that could be contributing to the problem. Does anyone have a better idea?
<?xml version="1.0" encoding="utf-8"?>
<ACES version="4.2">
<Header>
<Company>x</Company>
<SenderName>y</SenderName>
<SenderPhone>z</SenderPhone>
<TransferDate>a</TransferDate>
<BrandAAIAID>b</BrandAAIAID>
<DocumentTitle>c</DocumentTitle>
<DocFormNumber>2.0</DocFormNumber>
<EffectiveDate>2023-02-22</EffectiveDate>
<SubmissionType>FULL</SubmissionType>
<MapperCompany>d</MapperCompany>
<MapperContact>e</MapperContact>
<MapperPhone>f</MapperPhone>
<MapperEmail>g</MapperEmail>
<VcdbVersionDate>2023-01-26</VcdbVersionDate>
<QdbVersionDate>2023-01-26</QdbVersionDate>
<PcdbVersionDate>2023-01-26</PcdbVersionDate>
</Header>
<App action="A" id="1">
<BaseVehicle id="5911"/>
<BodyType id="5"/>
<EngineBase id="560"/>
<Note>WITHOUT AUTO LEVELING SYSTEM</Note>
<Qty>1</Qty>
<PartType id="7600"/>
<Position id="104"/>
<Part>701940</Part>
</App>
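
To make the goal concrete, one direction that looks plausible to me is the standard library's xml.etree.ElementTree, though I'm not sure it is the best approach. The sketch below is only an illustration under a few assumptions: the function name merge_aces and the file names are made up, the Header of the first file is reused for the output, the first App seen for a given Part value wins, and an App without a Part element is always kept.

```python
import xml.etree.ElementTree as ET


def merge_aces(paths, out_path):
    """Merge files shaped like the sample, keeping one App per Part value.

    Hypothetical helper: returns the number of App elements written.
    """
    if not paths:
        raise ValueError("at least one input file is required")

    merged_root = None
    seen_parts = set()
    written = 0

    for path in paths:
        root = ET.parse(path).getroot()

        if merged_root is None:
            # Assumption: the first file's root attributes and Header
            # are reused as-is for the merged output.
            merged_root = ET.Element(root.tag, root.attrib)
            header = root.find("Header")
            if header is not None:
                merged_root.append(header)

        for app in root.findall("App"):
            part = app.findtext("Part")
            if part is not None:
                part = part.strip()
                if part in seen_parts:
                    continue  # duplicate part ID, skip this App
                seen_parts.add(part)
            merged_root.append(app)
            written += 1

    tree = ET.ElementTree(merged_root)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return written


# Example call with made-up file names:
# merge_aces(["file1.xml", "file2.xml"], "merged.xml")
```

Note that this deduplicates on the Part text alone, so two Apps for different vehicles that share a part number collapse into one, and the id attributes on the kept App tags are not renumbered.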