I am getting the error below while trying to write to a Synapse Dedicated SQL Pool. The dataframe and the target table have the same schema, yet the connector cannot match some of the dataframe's columns against the target columns.
Py4JJavaError: An error occurred while calling o2428.synapsesqlforpython.
: com.microsoft.spark.sqlanalytics.SQLAnalyticsConnectorException: Schema from source does not match target table schema.
Match criteria: {Case-sensitive Column Names, DataType and Nullability}.
Non matching fields from source:
[{Equipment_Cycle_Number, LongType, false},{Equipment_ID, LongType, false},{Equipment_Cycle_Activity_ID, LongType, false},{Equipment_Location_UTM_X, DoubleType, true},{Equipment_Location_UTM_Y, DoubleType, true},{Equipment_Location_UTM_Z, DoubleType, true}]
Fields on target:
StructType(StructField(Equipment_Cycle_Number,LongType,false), StructField(Activity_Start_Time,TimestampType,false), StructField(Activity_End_Time,TimestampType,false), StructField(Activity_Duration,DoubleType,true), StructField(Equipment_ID,LongType,false), StructField(Equipment_Operator_ID,LongType,true), StructField(Equipment_Cycle_Activity_ID,LongType,false), StructField(Equipment_Travel_Point_DateTime,TimestampType,true), StructField(Equipment_Location_UTM_X,DoubleType,true), StructField(Equipment_Location_UTM_Y,DoubleType,true), StructField(Equipment_Location_UTM_Z,DoubleType,true), StructField(Created_Date,TimestampType,true), StructField(Last_Modified_Date,TimestampType,true), StructField(Activity_Date,TimestampType,true))
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.sqlanalytics(SqlAnalyticsConnectorClass.scala:347)
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesql(SqlAnalyticsConnectorClass.scala:191)
at com.microsoft.spark.sqlanalytics.SqlAnalyticsConnectorClass$SQLAnalyticsFormatWriter.synapsesqlforpython(SqlAnalyticsConnectorClass.scala:203)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Schema from source does not match target table schema.
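Since the connector matches on {Case-sensitive Column Names, DataType, Nullability}, one way to narrow this down is to diff the dataframe schema against the target schema field by field before writing. Below is a minimal diagnostic sketch; the target StructType is copied from the error message above, and df_activity_staging is the dataframe being written (see the write snippet further down):

from pyspark.sql.types import (StructType, StructField, LongType,
                               DoubleType, TimestampType)

# Target schema exactly as reported in the connector error message
target_schema = StructType([
    StructField("Equipment_Cycle_Number", LongType(), False),
    StructField("Activity_Start_Time", TimestampType(), False),
    StructField("Activity_End_Time", TimestampType(), False),
    StructField("Activity_Duration", DoubleType(), True),
    StructField("Equipment_ID", LongType(), False),
    StructField("Equipment_Operator_ID", LongType(), True),
    StructField("Equipment_Cycle_Activity_ID", LongType(), False),
    StructField("Equipment_Travel_Point_DateTime", TimestampType(), True),
    StructField("Equipment_Location_UTM_X", DoubleType(), True),
    StructField("Equipment_Location_UTM_Y", DoubleType(), True),
    StructField("Equipment_Location_UTM_Z", DoubleType(), True),
    StructField("Created_Date", TimestampType(), True),
    StructField("Last_Modified_Date", TimestampType(), True),
    StructField("Activity_Date", TimestampType(), True),
])

# Compare field by field on the connector's match criteria:
# case-sensitive name, data type, and nullability
source_fields = {f.name: f for f in df_activity_staging.schema.fields}
for tf in target_schema.fields:
    sf = source_fields.get(tf.name)
    if sf is None:
        print("missing in source:", tf.name)
    elif (sf.dataType, sf.nullable) != (tf.dataType, tf.nullable):
        print("mismatch on %s: source=%s/%s target=%s/%s"
              % (tf.name, sf.dataType, sf.nullable, tf.dataType, tf.nullable))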
PySpark Synapse write code:
# Imports required by the Azure Synapse Dedicated SQL Pool connector
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants

df_activity_staging.write\
.option(Constants.SERVER, synapseURL)\
.option(Constants.USER, synapseUser)\
.option(Constants.PASSWORD, synapsePwd)\
.option(Constants.TEMP_FOLDER, "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<folder>")\
.option(Constants.STAGING_STORAGE_ACCOUNT_KEY, synapse_storage_account_access_key)\
.mode("overwrite")\
.synapsesql("DB_Name.Schema.Table")
Table schema:
CREATE TABLE [schema].[Table]
(
[Equipment_Cycle_Number] [bigint] NOT NULL,
[Activity_Start_Time] [datetime] NOT NULL,
[Activity_End_Time] [datetime] NOT NULL,
[Activity_Duration] [float] NULL,
[Equipment_ID] [bigint] NOT NULL,
[Equipment_Operator_ID] [bigint] NULL,
[Equipment_Cycle_Activity_ID] [bigint] NOT NULL,
[Equipment_Travel_Point_DateTime] [datetime] NULL,
[Equipment_Location_UTM_X] [float] NULL,
[Equipment_Location_UTM_Y] [float] NULL,
[Equipment_Location_UTM_Z] [float] NULL,
[Created_Date] [datetime] NULL,
[Last_Modified_Date] [datetime] NULL,
[Activity_Date] [date] NULL
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
)
GO
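Note that the connector reports Activity_Date on the target as TimestampType even though the DDL declares it as date, so the corresponding dataframe column has to be a timestamp as well. If the source columns arrive with looser types, a few explicit casts can align everything except nullability; this is an illustrative sketch using column names from the DDL above:

from pyspark.sql.functions import col

# Align Spark-side types with what the connector expects for this table:
# bigint -> long, float -> double, datetime/date -> timestamp
df_activity_staging = (df_activity_staging
    .withColumn("Equipment_ID", col("Equipment_ID").cast("long"))
    .withColumn("Activity_Duration", col("Activity_Duration").cast("double"))
    .withColumn("Activity_Date", col("Activity_Date").cast("timestamp")))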
I tried the JDBC connector. It works, but writing is very slow: it takes more than an hour to write 100k records.
# DataFrameWriter.jdbc() performs the write itself and returns None,
# so there must be no trailing .save() call
df_4Write.write.jdbc(url=synDSPUrl, table="Schema.Table", mode="overwrite", properties=synDSPConnectionProperties)
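If you have to stay on JDBC, the generic Spark JDBC writer options batchsize and numPartitions usually speed things up considerably. The values below are illustrative, and this path still does row-by-row inserts rather than the connector's bulk staging load:

df_4Write.write\
    .option("batchsize", 10000)\
    .option("numPartitions", 8)\
    .jdbc(url=synDSPUrl, table="Schema.Table", mode="overwrite",
          properties=synDSPConnectionProperties)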
Using the Apache Spark connector with a staging table is supposed to work much more smoothly, but it seems to be broken.