0

I have a copy pipeline set up that connects to an SFTP server (on Azure Synapse). I have used it to copy csv files and this works fine, but now I have the complex task to pull in a multi-layered xml file and convert it to something usable in the Gen2 data lake (csv or parquet). I have created my copy pipeline and used various settings but each time, I only get the first record through. Has anyone experienced this before or have any thoughts on how did to resolve this?

Emily

CHEEKATLAPRADEEP
  • 12,191
  • 1
  • 19
  • 42
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 15 '22 at 13:08
  • I am not using code 'per se' to create the pipeline. But the built in Copy pipeline function. I have now tested it by importing a simpler xml file using copy pipeline and still shen I interrogate the csv file that results, it only has one row, when the xml file I have imported has a number of records. – Emily Groves Apr 19 '22 at 13:02
  • I can provide a walk through to recreate the problem: – Emily Groves Apr 20 '22 at 13:58
  • Some sample XML which represents your problem would be useful, eg one that has two simple records so we can repro only one being imported. I recently got [spark-xml working in a Synapse Notebook](https://stackoverflow.com/a/72092387/1527504) so that might be worth a look as an alternative. Also Azure SQL DB is very capable with XML so that might be another option if you have one in your architecture somewhere. – wBob May 04 '22 at 13:23
  • How did you get on with this? – wBob Jun 02 '22 at 12:16

1 Answers1

0

you need to specify a collection reference in the mapping section of the pipeline.

Example:

<AllEvents>
  <Event>
    <Count>1</Count>
    <Name>John</Name>
  </Event>
  <Event>
    <Count>2</Count>
    <Name>Jane</Name>
  </Event>
</AllEvents>

Then specify '$['AllEvents']['Event']' in the Collection reference field, click on 'Import Schemas' once again and redo the mapping. When you run the pipeline again it should import all the rows.

  • Thank you for replying. I will work through it again, but I had set up a test copy pipeline using Microsoft's books.xml. I want to upload a screen shot here but I can't see how. Anyway, I used the Collection Reference $['catalog']['book'] and then mapped all the fields as you say, but still only one row was ingested. – Emily Groves May 05 '22 at 13:35