I am building a data pipeline for an API call whose response has the following format:
<drivers>
<driver id="103721">
<reported_driver_id>5555</reported_driver_id>
<display_ref>5555</display_ref>
<name>Toronto Training 1</name>
</driver>
</drivers>
In total, there are thousands of drivers in the response.
Right now the Swagger file I wrote looks like this:
"definitions" : {
"sg_response##drivers##driver" : {
"properties" : {
"id" : {
"type" : "number"
},
"reported_driver_id" : {
"type" : "number"
},
"display_ref" : {
"type" : "number"
},
"name" : {
"type" : "string"
}
}
}
}
When I checked the resulting data, I found that the first column, 'id', is not captured. Below is a sample of what I got from my ETL:
"id","reported_driver_id","display_ref","name"
,"5555","5555","Toronto Training 1"
,"6666","6666","Toronto Training 2"
,"6161","6161","Billings Demo 4"
,"169168","169168","Dharminder Grewal"
I wish the API returned the result in this format instead:
<drivers>
<driver>
<id>123</id>
<reported_driver_id>5555</reported_driver_id>
<display_ref>5555</display_ref>
<name>Toronto Training 1</name>
</driver>
</drivers>
But it does not: the id is an attribute of the <driver> element rather than a child element. Could you please help me modify my Swagger file to capture the id? If it is not possible to capture the id because of how the API is written, please let me know so I can ask the backend team to change the API structure.
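To make the difference concrete, here is a minimal sketch that reads the sample response outside the ETL. It assumes Python's standard-library xml.etree.ElementTree, which is not the tool my pipeline uses, and the names SAMPLE and driver_rows are made up for illustration. It shows that id lives in an attribute of <driver>, so looking it up as a child element finds nothing, which may be why the column comes out empty:

```python
import xml.etree.ElementTree as ET

# Sample response from the question, with the closing tag written out.
SAMPLE = """
<drivers>
  <driver id="103721">
    <reported_driver_id>5555</reported_driver_id>
    <display_ref>5555</display_ref>
    <name>Toronto Training 1</name>
  </driver>
</drivers>
"""


def driver_rows(xml_text):
    """Return one dict per <driver>, reading id from the attribute."""
    root = ET.fromstring(xml_text)
    rows = []
    for driver in root.findall("driver"):
        rows.append({
            "id": driver.get("id"),  # attribute on <driver>
            "reported_driver_id": driver.findtext("reported_driver_id"),
            "display_ref": driver.findtext("display_ref"),
            "name": driver.findtext("name"),
        })
    return rows


root = ET.fromstring(SAMPLE)
first = root.find("driver")
child_lookup = first.find("id")  # no <id> child element exists
attr_lookup = first.get("id")    # id is stored as an attribute
rows = driver_rows(SAMPLE)
print(child_lookup, attr_lookup, rows)
```

The Swagger 2.0 specification does define an XML Object with an "attribute" flag (for example "xml": {"attribute": true} on the id property), so that might be worth trying, though I have not confirmed that my ETL tool honors it.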
Thank you for helping a newbie in the data engineering industry :-)