How to set parquet column name dynamically

Question

I want to merge multiple csv files in Azure Synapse Pipeline. And I'll realize it with copy activity, but I am facing with a problem.

There are two types of source files. The one has the header(file type is 'with header') and other doesn't have the header(file type is 'without header') .

I want to set the scheme using the header of 'with header' file. But I don't know how to do.

In my opinion, it could be achieved in the following way, is it possible?

Get list of column name of file 'with header' using 'lookup activity'.

2.Set the list of column names to a variable of type array.

3.Use the variable to mapping in 'copy activity' and merge multiple files.

Can I use a list of column names for mapping?

Waiting for help from you. Any answers would be appreciated.

Thank you.

It is entered in the details of the issue.

both files with header and without header have same schema? – Pratik Lad Jan 09 '23 at 05:04 — Pratik Lad, Jan 09 '23 at 05:04

score 0 · Answer 1 · answered Jan 09 '23 at 09:57

Can I use a list of column names for mapping?

No, you cannot use list of columns in Dynamic mapping there you need to specify the mapping in the json form like below

{
    "source": {
        "name": "Id",
        "type": "String",
        "physicalType": "String"
        },
    "sink": {
        "name": "Id",
        "type": "String",
        "physicalType": "String"
        }
}

I want to set the scheme using the header of 'with header' file. But I don't know how to do.

In your scenario first, you need to segregate files like with header and without header

After that first you need to get the list of files without header using Get Metadata activity.

enter image description here

Then add the header to it and schema to every file using foreach activity and dataflow now pass the output of get metadata to for-each activity.

enter image description here

After this take dataflow activity and add header to file before it create on file with header columns e.g.

enter image description here

Set without header file to the source 1 and don't select First row as header.
Set header file to the source 2 and don't select First row as header.
At SurrogateKey1 activity , enter row as Key column and 2 as Start value.
At SurrogateKey2 activity , enter row as Key column and 1 as Start value.
Then we can union SurrogateKey1 stream and SurrogateKey2 stream at Union1 activity.
Then we can sort these rows by row at Sort1 activity.
Then In sink mapping remove the row column and save it to files with header folder:

successful execution: enter image description here

Now with the files with header you can merge them.

How to set parquet column name dynamically

1 Answers1