Upsert or update SQL records based on some columns using Pandas and SQL Alchemy?

Question

I'd like to see if anyone can help with this, not even sure if it can be done.

I have a SQL table that looks like this:

GUID                                    Timestamp      Asset      Channel        Value      Units    
B7CADE24-409B-4A79-9D59-0000070825C1    1603100037     SK132      Oil Pressure   178        PSI
A15F0252-2F1A-493E-9B7F-00001615DB47    1607620524     SK133      Oil Pressure   161        PSI
F5DFC208-1F8A-4FC4-94C3-0000168D82B2    1610545779     SK121      Speed          1604       RPM

I'd like to append a Pandas Dataframe to this SQL table but only those rows that don't already exist, if they do exist I'd like to replace with the new value.

The issue is that I would only like to make this comparison based on three columns:

-Timestamp
-Asset
-Channel

If I use:

pd.to_sql("Table", con=engine, if_exists='replace')

It will never be true since GUID is created in code.

Any suggestions?

Thanks in advance

score 0 · Answer 1 · answered Mar 11 '21 at 16:56

Not sure if possible out of the box because:

replace: Drop the table before inserting new values

You can do it using dropping duplicates:

# df = your df...
df2 = pd.read_sql_query("SELECT * FROM Table", con=engine)
df = pd.concat([df, df2], sort=False)
df = df.drop_duplicates(['Timestamp', 'Asset', 'Channel'])
df.to_sql('Table', con=engine, if_exists='replace')

Or using temporary table:

 df.to_sql('tmp_table', con=engine, if_exists='replace')
 # remove duplicates from origin table. just an example(I don't know your db)
 engine.execute("""
    DELETE FROM tmp_table WHERE Timestamp || Asset || Channel IN (
       SELECT DISTINCT Timestamp || Asset || Channel FROM Table
 """)

 # and rename tmp_table -> Table

Upsert or update SQL records based on some columns using Pandas and SQL Alchemy?

1 Answers1