I'm trying to use Snowpark with Python to transform and prep some data ahead of feeding it to some ML models. I've easily been able to use session.table() to access the data, and select(), col(), filter(), and alias() to pick out the columns I need. I'm now trying to join data from two different DataFrame objects, but I'm running into an error.
My code to get the data:
import pandas as pd
from snowflake.snowpark.functions import col  # col() comes from here

df1 = read_session.table("<SCHEMA_NAME>.<TABLE_NAME>").select(
    col("ID"),
    col("<col_name1>"),
    col("<col_name2>"),
    col("<col_name3>")
).filter(col("<col_name2>") == 'A1').show()
df2 = read_session.table("<SCHEMA_NAME>.<TABLE_NAME2>").select(
    col("ID"),
    col("<col_name1>"),
    col("<col_name2>"),
    col("<col_name3>")
).show()
Code to join:
df_joined = df1.join(df2, ["ID"]).show()
Error: AttributeError: 'NoneType' object has no attribute 'join'
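The error suggests df1 is None by the time join() is called, so I tried to reproduce the pattern in plain Python with no Snowflake involved (the Demo class below is just a hypothetical stand-in). Any method that only prints and implicitly returns None breaks chaining the same way:

```python
# Hypothetical stand-in: a method that prints and returns None,
# like a display-style method, makes the assigned variable None.
class Demo:
    def show(self):
        print("showing...")  # side effect only; no return value

df = Demo().show()           # df is None, not a Demo
try:
    df.join(None)
except AttributeError as exc:
    print(exc)               # 'NoneType' object has no attribute 'join'
```

This reproduces the exact AttributeError I'm seeing, which makes me think the assignment is capturing whatever show() returns rather than the DataFrame itself.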
I have also used this method (from the Snowpark Python API documentation) and get the same error:
df_joined = df1.join(df2, df1.col("ID") == df2.col("ID")).select(df1["ID"], "<col_name1>", "<col_name2>").show()
I get similar errors when I try to convert the result to a pandas DataFrame with pd.DataFrame and then write it back to Snowflake into a new database and schema.
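For comparison, the same kind of join works fine for me in plain pandas once I have real in-memory DataFrames (hypothetical sample data below), so the objects I'm passing around seem to be the problem rather than the join itself:

```python
import pandas as pd

# Hypothetical sample frames standing in for the two Snowflake tables.
df1 = pd.DataFrame({"ID": [1, 2], "col_name1": ["a", "b"]})
df2 = pd.DataFrame({"ID": [2, 3], "col_name2": ["x", "y"]})

# Inner join on the shared ID column; only ID == 2 matches.
joined = df1.merge(df2, on="ID")
print(joined)
```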
What am I doing wrong? Am I misunderstanding what Snowpark can do? Isn't part of its appeal that all these transformations can be done on the DataFrame objects themselves, rather than by pulling everything down into a full local DataFrame? How can I get this to work?