
I have a database table in Azure Databricks that already has data in it, and I need to append data to that table.

I have a pandas DataFrame (df_allfeatures) that I want to append to that table.

The function that I use to write to my database table:

def write_to_delta(df_name, db_name, table_name, write_mode, num_part=10):
    df_name \
        .repartition(num_part) \
        .write \
        .mode(write_mode) \
        .insertInto("{}.{}".format(db_name, table_name))

When using this function to write into my database:

df_allfeatures = spark.createDataFrame(df_allfeatures)
write_to_delta(df_allfeatures, 'production', 'feed_to_output_all_features', 'append', num_part=10)

However, I keep getting the error "AnalysisException: Cannot write incompatible data to table 'production.feed_to_output_all_features'":

The columns that are singled out are the following:

"AnalysisException: Cannot write incompatible data to table 'production.feed_to_output_all_features':

  • Cannot safely cast 'LEAD_CONCENTRATE_GRADES_PB': string to double
  • Cannot safely cast 'TAILINGS_RECOVERIES_PB': timestamp to double
  • Cannot safely cast 'DATE': double to timestamp"

I have already changed the datatypes to rectify this error:

df_allfeatures = df_allfeatures.astype({"LEAD_CONCENTRATE_GRADES_PB": 'float32'}) 
df_allfeatures = df_allfeatures.astype({"TAILINGS_RECOVERIES_PB": 'float32'})
df_allfeatures['DATE'] = pd.to_datetime(df_allfeatures['DATE'])

But I keep getting the same error.
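For reference, the pandas conversions above do produce the intended dtypes when run on their own; a minimal sketch on a toy frame (column names reused from the question, values invented for illustration):

```python
import pandas as pd

# Toy frame with the question's column names; the data is invented
df_allfeatures = pd.DataFrame({
    "LEAD_CONCENTRATE_GRADES_PB": ["7.5", "8.1"],
    "TAILINGS_RECOVERIES_PB": ["0.91", "0.88"],
    "DATE": ["2021-01-01", "2021-01-02"],
})

# Same conversions as in the question
df_allfeatures = df_allfeatures.astype({"LEAD_CONCENTRATE_GRADES_PB": "float32"})
df_allfeatures = df_allfeatures.astype({"TAILINGS_RECOVERIES_PB": "float32"})
df_allfeatures["DATE"] = pd.to_datetime(df_allfeatures["DATE"])

# float32, float32, datetime64[ns]
print(df_allfeatures.dtypes)
```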

kdp132

1 Answer


To convert a string column to double, use one of the approaches below. The following PySpark examples cast StringType to DoubleType:

# Imports required by the examples below
from pyspark.sql.functions import col, round
from pyspark.sql.types import DoubleType

# Using withColumn()
df.withColumn("salary", df.salary.cast('double'))
df.withColumn("salary", df.salary.cast(DoubleType()))
df.withColumn("salary", col("salary").cast('double'))

# Cast, then round to 2 decimal places
df.withColumn("salary", round(df.salary.cast(DoubleType()), 2))

# Using select
df.select("firstname",col("salary").cast('double').alias("salary"))

# Using select expression
df.selectExpr("firstname","cast(salary as double) salary")

# Using SQL to cast (df must first be registered as a temp view)
df.createOrReplaceTempView("CastExample")
spark.sql("SELECT firstname, DOUBLE(salary) AS salary FROM CastExample")

For more examples, follow this link: PySpark Convert String Type to Double Type

Abhishek K