
I am trying to upload a DataFrame into an existing table in Snowflake. Here is the DataFrame:

columns_df.head()

[screenshot of the columns_df.head() output]

Now, when using to_sql() from pandas to append data into the existing table:

columns_df.to_sql('survey_metadata_column_names', index=False, index_label=None, con=conn, schema='PUBLIC', if_exists='append', chunksize=300)

I received the following error:

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting

TypeError: not all arguments converted during string formatting

Some of the column names contain dashes and underscores.

alim1990
  • This should help. https://stackoverflow.com/questions/63675368/databaseerror-not-all-arguments-converted-during-string-formatting-when-i-us – Abhishek Rai Nov 24 '20 at 14:24

2 Answers


From the Snowflake documentation:

To write data from a Pandas DataFrame to a Snowflake database, do one of the following:

  • Call the write_pandas() function.
  • Call the pandas.DataFrame.to_sql() method, and specify pd_writer as the method to use to insert the data into the database.

Note the key requirement in the second bullet: pd_writer must be passed as the method argument. I have noticed several issues using both methods, but they are the official solutions.

from snowflake.connector.pandas_tools import pd_writer

columns_df.to_sql('survey_metadata_column_names',
                  index=False,
                  index_label=None,
                  con=engine,  # must be an SQLAlchemy engine, not a raw connection
                  schema='PUBLIC',
                  if_exists='append',
                  chunksize=300,
                  method=pd_writer)
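
For reference, one common way to build that engine is with the snowflake-sqlalchemy package's URL helper; a minimal sketch, where every account and credential value below is a placeholder:

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# URL() assembles a Snowflake connection string; requires the
# snowflake-sqlalchemy package. All values here are placeholders.
engine = create_engine(URL(
    account='<account_identifier>',
    user='<user>',
    password='<password>',
    database='<database>',
    schema='PUBLIC',
    warehouse='<warehouse>',
))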

Or alternatively

import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

con = snowflake.connector.connect(...)  # fill in your connection parameters
success, nchunks, nrows, _ = write_pandas(con,
                                          columns_df,
                                          'survey_metadata_column_names',
                                          chunk_size=300,
                                          schema='PUBLIC')
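
write_pandas returns a tuple reporting the outcome of the load, so a quick sanity check after the call above might look like:

# success is True if the COPY INTO completed; nrows is the number of rows loaded
if success:
    print(f'Loaded {nrows} rows in {nchunks} chunks')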

Note that the first method requires an SQLAlchemy engine, while the second one can use a regular connection. This also explains your original error: pandas' to_sql() only supports SQLAlchemy connectables and raw sqlite3 connections, so when it is handed any other DBAPI connection it falls back to its SQLite code path, hence the query against sqlite_master.

Oliver
    You could test this using a big df into a temp table. `con.execute('create temporary table ...')` and `timeit`. I'd expect them to be almost equivalent, and that they use the same method of splitting the `DataFrame`, saving it to disk in a temporary file, uploading the file into temporary storage using `PUT @file` and then using `COPY INTO`. If this is the case there might be a slight performance gain from `write_pandas`, as the documentation hints that `pd_writer` does not have a chunksize argument (`pandas` may be splitting up the frames before calling the function). – Oliver Nov 24 '20 at 15:18
  • From the second method I am getting this error: `The following error occured: 090106 (22000): Cannot perform CREATE STAGE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.` – alim1990 Nov 24 '20 at 15:34
  • I'd suggest adding the schema to your connection upon opening it, if this is your error: `.connect(..., schema = ..., database = ..., warehouse = ...)` (see the sketch after these comments). I can't say specifically what is going wrong in your case without knowing the exact code. Also, if you have access to the Snowflake help hub, asking questions there will ensure any errors get looked into (at some point, maybe in the distant future). We had more specific needs than we could obtain using their builtin functions, so we coded our own version of `write_pandas` internally (sadly I cannot give a link). – Oliver Nov 24 '20 at 15:46
  • OK, now I'm getting this error: `Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. pyarrow or fastparquet is required for parquet support` – alim1990 Nov 24 '20 at 15:51
  • I tried to install it but there are missing files (huge list of missing dependencies) – alim1990 Nov 24 '20 at 15:52
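
Following the suggestion in the comments above, a minimal sketch of opening the connection with the database, schema, and warehouse set up front; all credential values are placeholders:

import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Setting database/schema/warehouse at connect time gives the session a
# current schema, which avoids the 'Cannot perform CREATE STAGE' error
# quoted above. All values here are placeholders.
con = snowflake.connector.connect(
    user='<user>',
    password='<password>',
    account='<account_identifier>',
    database='<database>',
    schema='PUBLIC',
    warehouse='<warehouse>',
)
success, nchunks, nrows, _ = write_pandas(con, columns_df,
                                          'survey_metadata_column_names',
                                          chunk_size=300, schema='PUBLIC')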

Have a look at the solution I posted here, which allows for writes (create and replace) and appends: write_pandas snowflake connector function is not able to operate on table

June