Azure Synapse Copy pipeline for ingesting complex XML

Question

I have a copy pipeline set up that connects to an SFTP server (on Azure Synapse). I have used it to copy csv files and this works fine, but now I have the complex task to pull in a multi-layered xml file and convert it to something usable in the Gen2 data lake (csv or parquet). I have created my copy pipeline and used various settings but each time, I only get the first record through. Has anyone experienced this before or have any thoughts on how did to resolve this?

Emily

Please provide enough code so others can better understand or reproduce the problem. — Community, Apr 15 '22 at 13:08
I am not using code 'per se' to create the pipeline. But the built in Copy pipeline function. I have now tested it by importing a simpler xml file using copy pipeline and still shen I interrogate the csv file that results, it only has one row, when the xml file I have imported has a number of records. — Emily Groves, Apr 19 '22 at 13:02
Some sample XML which represents your problem would be useful, eg one that has two simple records so we can repro only one being imported. I recently got [spark-xml working in a Synapse Notebook](https://stackoverflow.com/a/72092387/1527504) so that might be worth a look as an alternative. Also Azure SQL DB is very capable with XML so that might be another option if you have one in your architecture somewhere. — wBob, May 04 '22 at 13:23

score 0 · Answer 1 · answered May 04 '22 at 12:52

0

you need to specify a collection reference in the mapping section of the pipeline.

Example:

<AllEvents>
  <Event>
    <Count>1</Count>
    <Name>John</Name>
  </Event>
  <Event>
    <Count>2</Count>
    <Name>Jane</Name>
  </Event>
</AllEvents>

Then specify '$['AllEvents']['Event']' in the Collection reference field, click on 'Import Schemas' once again and redo the mapping. When you run the pipeline again it should import all the rows.

answered May 04 '22 at 12:52

Jan-Erik Öhman

3
1

Thank you for replying. I will work through it again, but I had set up a test copy pipeline using Microsoft's books.xml. I want to upload a screen shot here but I can't see how. Anyway, I used the Collection Reference $['catalog']['book'] and then mapped all the fields as you say, but still only one row was ingested. – Emily Groves May 05 '22 at 13:35

Azure Synapse Copy pipeline for ingesting complex XML

1 Answers1