I want to insert string 2021-02-12 16:16:43
from Databricks into Snowflake timestamp. So, this is what i tried:
- Use
to_timestamp
function in Databricks to convert from string to timestamp, but this function gives the incorrect timestamp format which Snowflake doesn't recognize.
.withColumn('date_test',to_timestamp("date_test", 'yyyy-MM-dd HH:mm:ss'))
Output format: 2021-02-12T16:16:43.000+0000
Error trace:
Py4JJavaError: An error occurred while calling o5318.save.
: net.snowflake.client.jdbc.SnowflakeSQLException: Timestamp '40' is not recognized
File '3Xko6AuNme/19.CSV.gz', line 1, character 66
Row 1, column "TEST_READY_STAGING_862888178"["DATE_TEST":5]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
- Then I tried using
date_format
, which gives correct format but the type is string, and Snowflake complains about it again.withColumn('date_test',date_format(to_timestamp("date_test", 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss'))
Error trace:
: net.snowflake.client.jdbc.SnowflakeSQLException: Timestamp '40' is not recognized
File 'gSOm5eLHFZ/22.CSV.gz', line 1, character 86
Row 1, column "TEST_READY_STAGING_25342852"["DATE_TEST":5]
If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option.
3.I tried to convert string to timestamp using udf from this topic: pyspark to_timestamp does not include milliseconds
But it just doesn't convert to timestamp.
import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType
def _to_timestamp(s):
return datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
udf_to_timestamp = udf(_to_timestamp, TimestampType())
df_source_data.select('PURCHASE_DATE_TIME').withColumn("PURCHASE_DATE_TIME", udf_to_timestamp("PURCHASE_DATE_TIME")).show(1,False)
display(df_source_data)
df_source_data.printSchema()
Output:
+------------------------+
|PURCHASE_EVENT_DATE_TIME|
+------------------------+
|2021-02-12 16:16:43 |
+------------------------+
only showing top 1 row
root
|-- PURCHASE_EVENT_DATE_TIME: string (nullable = true)
Does anybody has any advice on how to push this string from Databricks to timestamp in Snowflake?