1

I want to insert string 2021-02-12 16:16:43 from Databricks into Snowflake timestamp. So, this is what i tried:

  1. Use to_timestamp function in Databricks to convert from string to timestamp, but this function gives the incorrect timestamp format which Snowflake doesn't recognize.

.withColumn('date_test',to_timestamp("date_test", 'yyyy-MM-dd HH:mm:ss'))

Output format: 2021-02-12T16:16:43.000+0000

Error trace:

Py4JJavaError: An error occurred while calling o5318.save.
: net.snowflake.client.jdbc.SnowflakeSQLException: Timestamp '40' is not recognized
  File '3Xko6AuNme/19.CSV.gz', line 1, character 66
  Row 1, column "TEST_READY_STAGING_862888178"["DATE_TEST":5]
  If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
  1. Then I tried using date_format, which gives correct format but the type is string, and Snowflake complains about it again .withColumn('date_test',date_format(to_timestamp("date_test", 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd HH:mm:ss'))

Error trace:

: net.snowflake.client.jdbc.SnowflakeSQLException: Timestamp '40' is not recognized
  File 'gSOm5eLHFZ/22.CSV.gz', line 1, character 86
  Row 1, column "TEST_READY_STAGING_25342852"["DATE_TEST":5]
  If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option.

3.I tried to convert string to timestamp using udf from this topic: pyspark to_timestamp does not include milliseconds

But it just doesn't convert to timestamp.


import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType

def _to_timestamp(s):
    return datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S')

udf_to_timestamp = udf(_to_timestamp, TimestampType())

df_source_data.select('PURCHASE_DATE_TIME').withColumn("PURCHASE_DATE_TIME", udf_to_timestamp("PURCHASE_DATE_TIME")).show(1,False)

display(df_source_data)
df_source_data.printSchema()

Output:

+------------------------+
|PURCHASE_EVENT_DATE_TIME|
+------------------------+
|2021-02-12 16:16:43     |
+------------------------+
only showing top 1 row
root
 |-- PURCHASE_EVENT_DATE_TIME: string (nullable = true)

Does anybody has any advice on how to push this string from Databricks to timestamp in Snowflake?

alterego
  • 322
  • 1
  • 3
  • 11
  • the first method `to_timestamp` should return a timestamp type... what do you mean by "Snowflake doesn't recognize"? What error did you get? – mck Apr 26 '21 at 08:08
  • to_timestamp returns timestamp type in this format 2021-02-12T16:16:43.000+0000. If I try to inset this value in some test table in snowflake i have error that it is not recognized. Databricks: Py4JJavaError: An error occurred while calling o5318.save. : net.snowflake.client.jdbc.SnowflakeSQLException: Timestamp '40' is not recognized File '3Xko6AuNme/19.CSV.gz', line 1, character 66 Row 1, column "TEST_READY_STAGING_862888178"["DATE_TEST":5] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. – alterego Apr 26 '21 at 08:23

1 Answers1

0

@alterego your format yyyy-MM-dd HH:mm:ss should be yyyy-MM-dd HH:mi:ss

Marioanzas
  • 1,663
  • 2
  • 10
  • 33