
I found (and augmented to deal with data types) this very good code to copy DataFrames to PostgreSQL:

import io
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://username:password@host:port/database')

# custom function to execute SQL queries
def read_sql(query):
    return pd.read_sql_query(query, con=engine)

text_columns = df.dtypes[lambda x: x=='object'].index
numeric_columns = df.dtypes[lambda x: x!='object'].index # you may have different data types!
df[text_columns] = df[text_columns].fillna('')
df[numeric_columns] = df[numeric_columns].astype(str).replace({'nan': ''})  # astype(str) turns NaN into 'nan', so replace, not fillna

df = df.rename(columns=str.lower)
df.head(0).to_sql(name='table_name', schema='schema', con=engine, if_exists='replace', index=False) 

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
cur.copy_expert('COPY schema.table_name FROM STDIN', output)
conn.commit()
cur.close()

# to convert empty cells to NULL
for col in df.columns:
    read_sql(f"""UPDATE schema.table_name
                 SET {col} = NULL
                 WHERE {col} = ''; SELECT 1""")
# to cast the numeric columns back to float (UPDATE cannot change a
# column's type; ALTER TABLE ... USING does)
for col in numeric_columns:
    read_sql(f"""ALTER TABLE schema.table_name
                 ALTER COLUMN {col} TYPE double precision
                 USING {col}::double precision; SELECT 1""")

The only problem is that I need to set the empty cells to NULL and recast the numeric columns back to numeric, and with many columns that takes time! How can I improve this code? Or does a faster solution exist?
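One idea I am considering (a sketch with placeholder table/schema names, not tested against my real tables) is to skip both post-processing loops entirely: keep the NaN values instead of stringifying them, and let COPY in csv mode map empty fields straight to NULL.

```python
import io
import pandas as pd
import numpy as np

# hypothetical frame standing in for df
df = pd.DataFrame({"city": ["Rome", None], "price": [1.5, np.nan]})

buf = io.StringIO()
df.to_csv(buf, header=False, index=False)  # NaN/None become empty fields
buf.seek(0)

# csv mode lets COPY treat empty fields as NULL on the way in,
# so the per-column UPDATE loop is no longer needed
copy_sql = "COPY schema.table_name FROM STDIN WITH (FORMAT csv, NULL '')"
# with a live cursor: cur.copy_expert(copy_sql, buf); conn.commit()
```

One caveat: with pandas' default minimal quoting, genuinely empty strings are also written as unquoted empty fields, so they would load as NULL too.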

I would like to improve the code that I posted. Data types need to be dealt with in a more efficient way. I tried to do so by adding the separate treatment of numeric and text columns, and the code that updates the columns in the database table afterwards.
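For the data types, one sketch would be to create the table with the right column types from the start, via the `dtype` argument of `to_sql`, so no recasting is needed afterwards (the frame and the table/schema names below are stand-ins):

```python
import pandas as pd
import sqlalchemy

# hypothetical frame standing in for the real one
df = pd.DataFrame({"city": ["Rome"], "price": [1.5]})

# map pandas dtypes to SQL types once, instead of recasting in the database
dtype_map = {
    col: (sqlalchemy.Float if pd.api.types.is_numeric_dtype(dtype) else sqlalchemy.Text)
    for col, dtype in df.dtypes.items()
}
# df.head(0).to_sql('table_name', schema='schema', con=engine,
#                   if_exists='replace', index=False, dtype=dtype_map)
```

COPY then parses the incoming text into each column's declared type, so numeric columns arrive as floats.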

  • I am not entirely sure what's going on here, but if you need to: select pg ➡️ df ➡️ create new table to house df data, then maybe https://www.postgresqltutorial.com/postgresql-select-into/ might be useful. – JL Peyret Dec 16 '22 at 17:07

0 Answers