
I use Azure Synapse notebooks (PySpark) to process data from blob storage and write it to a SQL Server database. I'm using the code below to write to the SQL table:

df.write \
        .format("jdbc") \
        .option("url", <...>) \
        .option("dbtable", <...>) \
        .option("user", <...>) \
        .option("password", <...>) \
        .mode("append") \
        .save()

This works fine, but the dataframe's columns have to match the SQL table's columns exactly (names, ordering, etc.) before it will work. I'm trying (and failing) to find a way to pass in a mapping dictionary instead of renaming and reordering all the dataframe columns to match the SQL table.

For example, let's say we have a dataframe and sql table as below:

dataframe: name | job | birthday
SQL table: Full Name | Occupation | Date of Birth
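
For illustration only, a dataframe like that could be created as below (the sample row is made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# purely illustrative sample data with the columns described above
df = spark.createDataFrame(
    [("Alice", "Engineer", "1990-01-01")],
    ["name", "job", "birthday"],
)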

I would like to do something like the following:

df_sql_mapping = {
    'name' : 'Full Name',
    'job': 'Occupation',
    'birthday': 'Date of Birth'
}

df.write \
        ...\
        .mapping(df_sql_mapping) \
        .save()

Any ideas? There is very limited documentation available.

Sujith Kumar

1 Answer

Mapping Spark Dataframe Columns to SQL Table Columns in Azure Synapse Notebook

To map the dataframe columns to the SQL table columns, you can use the code below:

# create the mapping dictionary
df_sql_mapping = {
    'Id': 'DataId',
    'Name': 'FullName',
}

# create a DataFrame with new column names matching the SQL table
mapped_df = df.selectExpr([f"{a} as {b}" for a, b in df_sql_mapping.items()])
mapped_df.show()

# write the new DataFrame to the SQL table
mapped_df.write \
    .format("jdbc") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("url", "jdbcURL") \
    .option("dbtable", "tablename") \
    .option("user", "Username") \
    .option("password", "password") \
    .mode("append") \
    .save()

Output:

[screenshot of the notebook output]

SQL table:

[screenshot of the SQL table]
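
As a side note, the selectExpr string above works because these column names contain no spaces; target names like Full Name or Date of Birth from the question would need backticks around the alias. A minimal sketch of an equivalent rename using col().alias(), which handles such names directly:

from pyspark.sql import functions as F

# same mapping idea, but built with col().alias() instead of a selectExpr string,
# so target names containing spaces (e.g. "Full Name") need no backticks
df_sql_mapping = {
    'name': 'Full Name',
    'job': 'Occupation',
    'birthday': 'Date of Birth',
}

mapped_df = df.select([F.col(src).alias(dst) for src, dst in df_sql_mapping.items()])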

Pratik Lad