-1

I'd like to see if anyone can help with this, not even sure if it can be done.

I have a SQL table that looks like this:

GUID                                    Timestamp      Asset      Channel        Value      Units    
B7CADE24-409B-4A79-9D59-0000070825C1    1603100037     SK132      Oil Pressure   178        PSI
A15F0252-2F1A-493E-9B7F-00001615DB47    1607620524     SK133      Oil Pressure   161        PSI
F5DFC208-1F8A-4FC4-94C3-0000168D82B2    1610545779     SK121      Speed          1604       RPM

I'd like to append a Pandas Dataframe to this SQL table but only those rows that don't already exist, if they do exist I'd like to replace with the new value.

The issue is that I would only like to make this comparison based on three columns:

-Timestamp
-Asset
-Channel

If I use:

pd.to_sql("Table", con=engine, if_exists='replace')

It will never be true since GUID is created in code.

Any suggestions?

Thanks in advance

ortunoa
  • 345
  • 4
  • 11

1 Answers1

0

Not sure if possible out of the box because:

replace: Drop the table before inserting new values

  1. You can do it using dropping duplicates:

    # df = your df...
    df2 = pd.read_sql_query("SELECT * FROM Table", con=engine)
    df = pd.concat([df, df2], sort=False)
    df = df.drop_duplicates(['Timestamp', 'Asset', 'Channel'])
    df.to_sql('Table', con=engine, if_exists='replace')
    
  2. Or using temporary table:

     df.to_sql('tmp_table', con=engine, if_exists='replace')
     # remove duplicates from origin table. just an example(I don't know your db)
     engine.execute("""
        DELETE FROM tmp_table WHERE Timestamp || Asset || Channel IN (
           SELECT DISTINCT Timestamp || Asset || Channel FROM Table
     """)
    
     # and rename tmp_table -> Table
    
Danila Ganchar
  • 10,266
  • 13
  • 49
  • 75