I found this very good code to copy DataFrames to PostgreSQL (and augmented it to deal with data types):
import io

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://username:password@host:port/database')
# custom function to execute SQL queries
def read_sql(query):
    return pd.read_sql_query(query, con=engine)
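(As an aside: `read_sql` is built on `pd.read_sql_query`, which expects a result set — that is why the statements further down carry a trailing `SELECT 1`. A small helper along these lines could run row-less statements directly; `execute_sql` is my name for it, not something from the original code:)

```python
from sqlalchemy import text

# Hypothetical helper (not in the original post): run a statement that
# returns no rows -- UPDATE, ALTER TABLE, ... -- inside a transaction that
# is committed on success, so no trailing "SELECT 1" is needed.
def execute_sql(engine, statement):
    with engine.begin() as conn:  # engine.begin() commits on exit
        conn.execute(text(statement))
```

Usage would look like `execute_sql(engine, "UPDATE schema.table_name SET col = NULL WHERE col = ''")`.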
text_columns = df.dtypes[lambda x: x=='object'].index
numeric_columns = df.dtypes[lambda x: x!='object'].index # you may have different data types!
df[text_columns] = df[text_columns].fillna('')
# astype(str) turns NaN into the literal string 'nan', so strip that afterwards
df[numeric_columns] = df[numeric_columns].astype(str).replace({'nan': ''})
df = df.rename(columns=str.lower)
df.head(0).to_sql(name='table_name', schema='schema', con=engine, if_exists='replace', index=False)
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
cur.copy_expert('COPY schema.table_name FROM STDIN', output)
conn.commit()
# to convert empty cells to NULL (the trailing SELECT 1 gives read_sql a result set)
for col in df.columns:
    read_sql(f"""UPDATE schema.table_name
                 SET {col} = NULL
                 WHERE {col} = ''; SELECT 1""")
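(For what it's worth, COPY itself accepts a NULL marker, and `to_csv` already writes NaN as an empty field — so the empty cells could be loaded as NULL in one pass, making this per-column UPDATE loop unnecessary. A small sketch of the idea; note that genuinely empty strings in text columns would then also load as NULL, and that COPY's text format treats backslashes specially:)

```python
import io

import pandas as pd

# pandas writes NaN/None as an empty field by default, so telling COPY that
# an empty string means NULL removes the need for the UPDATE loop afterwards.
df = pd.DataFrame({'a': [1.5, None], 'b': ['x', None]})
buf = io.StringIO()
df.to_csv(buf, sep='\t', header=False, index=False)
buf.seek(0)

# With a live connection this would be (table name as in the post):
# cur.copy_expert("COPY schema.table_name FROM STDIN WITH (FORMAT text, NULL '')", buf)
```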
# to cast the numeric columns back to float; ALTER TABLE changes the column
# type itself, which an UPDATE with a cast would not do
for col in numeric_columns:
    read_sql(f"""ALTER TABLE schema.table_name
                 ALTER COLUMN {col} TYPE double precision
                 USING {col}::double precision; SELECT 1""")
The only problem is that I then need to set the empty cells to NULL and recast the numeric columns back to numeric, and with many columns that takes time. How can I improve this code? Or is there a faster solution?
I would like to improve the code that I posted: the data types need to be handled more efficiently. What I tried so far is the separate treatment of numeric and text columns shown above, plus the loops that fix up the columns in the database table afterwards.