I'm trying to upload a pandas DataFrame directly to Redshift using the to_sql function.

import pandas as pd
from sqlalchemy import create_engine

# Connection string for the redshift+psycopg2 dialect
connstr = ('redshift+psycopg2://%s:%s@%s.redshift.amazonaws.com:%s/%s' %
           (username, password, cluster, port, db_name))

def send_data(df, block_size=10000):
    engine = create_engine(connstr)
    # Write the cleaned frame in block_size-row chunks inside one transaction
    with engine.connect() as conn, conn.begin():
        df.to_sql(name='my_table_clean', schema='my_schema', con=conn, index=False,
                  if_exists='replace', chunksize=block_size)
    engine.dispose()  # release the connection pool

The table my_schema.my_table_clean exists (but is empty), and the connection built from connstr is valid; I've verified it with a corresponding retrieve_data method. That method pulls data from my_table, my script cleans the result with pandas, and the output is meant to go into my_table_clean.
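For reference, the retrieval side looks roughly like this (a simplified sketch; clean() is a stand-in for my actual pandas cleanup):

def retrieve_data():
    # Read the raw table through the same engine; this round-trip
    # succeeds, which is how I know connstr itself is fine.
    engine = create_engine(connstr)
    df = pd.read_sql('SELECT * FROM my_schema.my_table', engine)
    engine.dispose()
    return df

df_clean = clean(retrieve_data())  # pandas cleanup (details omitted)
send_data(df_clean)                # this is the call that raises the TypeError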

The problem is, I keep getting the following error:

TypeError: _get_column_info() takes exactly 9 arguments (8 given)

during the to_sql call.

I can't seem to figure out what is causing this error. Is anyone familiar with it?

Using

  • python 2.7.13
  • pandas 0.20.2
  • sqlalchemy 1.2.0

Note: I'm deliberately bypassing the usual S3 -> Redshift load path for this script, since I don't want to create a folder in my bucket just for one file, and this single script doesn't fit my overall ETL structure. I'm hoping to run just this one script after the ETL that creates the original my_table.
