0

Snowpark has a problem / bug that it does not maintain the types between Pandas and Snowpark, nor does it allow to manually set its schema.

For instance,

df1 = session.sql(sql).to_pandas()
df2 = session.create_dataframe(df)

The timestamp field on df1 with TimestampType has become a LongType.

I've also tried to store the schema and use it, but same results.

df1 = session.sql(sql)
df1_schema = df1.schema
df1 = df1.to_pandas()
df2 = session.create_dataframe(df, df1_schema)

Has anyone managed to deal with it? This stops me from being able to write the DataFrame back to the table as it needs to be of TimestampType rather than LongType.

elongl
  • 13
  • 5

2 Answers2

0

Tried to recreate this in snowpark, it seems that the TimestampType is internally getting changed to LongType when pandas df is converted to snowpark df using create_dataframe() method.

Also, specifying schema parameter in create_dataframe() method w.r.t this scenario is not making any difference. So, one way is to explicitly change the column to timestamp using to_timestamp() method.

from snowflake.snowpark.functions import sql_expr
df1 = session.sql("select * from timestamp_test")
df1 = df1.to_pandas()
df2 = session.create_dataframe(df1)
colCast = df2.withColumn("T", sql_expr("to_timestamp(T::string)"))
colCast.show()
PooMac
  • 41
  • 2
0

I've had some success with this:

df1 = session.sql("select * from timestamp_test")
df1 = df1.to_pandas()
df1["T"] = pd.to_datetime(df1["T"])
df1["T"] = df1["T"].dt.strftime("%Y-%m-%d %H:%M:%S.%f")
df2 = session.create_dataframe(df1)
df2.write.mode("overwrite").save_as_table("table_name")

A second option that's more general. I saw it in an article somewhere and I didn't understand why they did it, until I saw some problems with the data types in Snowflake. To use it you need to build the table schema first:

# Create an empty table using the schema
df = session.create_dataframe([len(table_schema.names) * [None]], schema=table_schema)
df.na.drop().write.mode("overwrite").save_as_table(tbl)

# Populate the table
df = session.create_dataframe(data=df1, schema=table_schema)
df_snw.write.mode("append").save_as_table(tbl)
rick
  • 1
  • 1