0

My ON DUPLICATE UPDATE clause stopped updating and I'm not sure why.

Below is my code to create a temporary table via Pandas:

#connect to mysql database
engine = sqlalchemy.create_engine('mysql://username:@localhost/db?charset=utf8')
conn = engine.connect()

#Create df and write to temp table
df = pd.DataFrame(item_bank,columns=['email','id', 'mbid','artist','track','plays','track_count'])
df.to_sql(con=conn, name='temp', if_exists='replace',index=False)

It successfully creates a MySQL table with all of the data types as 'Text' except for user_tracks which writes as a bigint(20).

I then run this, but the table does not update. It is especially strange to me because I have many scripts that use a similar method, and the only thing I remember changing was to stop updating the other static columns.

mysql_statement = """
    INSERT INTO pickaresk.permanent 
        (email, id, mbid, artist, track, plays, track_count) 
    SELECT * FROM temp
    ON DUPLICATE KEY UPDATE
        plays=temp.plays,
        track_count=temp.track_count,
        lastfm_last_update=NOW() 
    ;
"""

conn.execute(mysql_statement)
conn.close()

The permanent table column's schema that are being updated is shown below. The multiple unique key constraint is the combination of id and email. I also confirmed that there are duplicate keys in both tables

| Field                         | Type         | Null | Key | Default | Extra |
+-------------------------------+--------------+------+-----+---------+-------+
|
| id                            | varchar(255) | NO   |     | NULL    |       |
| email                         | varchar(120) | NO   | MUL | NULL    |       |
| mbid                          | varchar(120) | YES  |     | NULL    |       |
| artist                        | varchar(250) | YES  |     | NULL    |       |
| track                         | varchar(250) | YES  |     | NULL    |       |
| plays                         | float        | YES  |     | NULL    |       |
| track_count                   | int(11)      | YES  |     | NULL    |       |
| lastfm_last_update            | datetime     | YES  |     | NULL    |       |
+-------------------------------+--------------+------+-----+---------+-------+
Jason Melo Hall
  • 652
  • 2
  • 11
  • 23
  • 1
    "on duplicate key update" will update when the insert would violate a unique key. Since you say `(id, email)` is a foreign key (which is not necessarily a unique key), it will not suffice, so add one. We cannot tell you if you have one, as we cannot distinguish it from a unique key with the information in your question, so add the table description if you cannot verify that yourself. – Solarflare Aug 23 '17 at 16:06
  • Thank you for noting this. I checked and it is a unique key. Sorry for the incorrect wordage. I will edit it in the post. – Jason Melo Hall Aug 23 '17 at 16:16
  • What does happen instead? How did you verify the updates don't happen? Also using `INSERT ... SELECT *` is risky, if you are unlucky, the column order is different, and you are inserting e.g. email to id and id to email column, which might also explain why the wrong row might get updates (or inserted). – Solarflare Aug 23 '17 at 16:43
  • @solarflare, I've checked that an id that contains values in the temp table but the permanent table remains null after the script is run. I've also tried explicitly writing the columns in the same order, but it still didn't update so I reverted back to the simpler syntax. – Jason Melo Hall Aug 24 '17 at 15:45
  • @Solarflare, do you think it has something to do with the datatypes being different? It never seemed to matter before, but perhaps there's a new exception I'm not thinking of. – Jason Melo Hall Aug 24 '17 at 15:48
  • It's unfortunately hard to check what happens without seeing the data. If possible, show some errorneous data, or at least describe more precisely how you checked that something went wrong (e.g. what you expected and what you got as a result). Also try `select * from temp t left join permanent p on t.id = p.id and p.email = t.email where p.id is null`. If you get 0 rows, everything went as expected (assuming `id, email` is unique) - although maybe not as expected by you. If you get rows, you have to investigate them (maybe post them in a pseudonymous way, e.g. hidding most parts of the email). – Solarflare Aug 24 '17 at 17:11
  • 1
    Also maybe check `select * from permament order by lastfm_last_update desc`, this should show the recently updated rows first, and check if it is reasonable what you see. (compare it with the corresponding rows in temp). – Solarflare Aug 24 '17 at 17:16
  • @Solarflare, the first suggestion I did and I received a few rows, but I know why they're there. The order by lastfm_last_update idea worked and basically nothing has updated in 2 weeks which makes this issue still stand. I originally checked by pulling some ids I knew should've been updated in both tables. The temp table had a value for plays, but the permanent table remained NULL. Note: I also tried changing all of the columns to be the same type but to no avail. – Jason Melo Hall Aug 26 '17 at 23:57
  • I just added the statement """UPDATE permanent INNER JOIN temp ON permanent.id = temp.id SET permanent.plays=temp.plays, permanent.track_count=temp.track_count, permanent.lastfm_last_update=NOW() ;""" and it worked so I don't have a solution to my question but an inneficient fix. – Jason Melo Hall Aug 27 '17 at 00:16
  • The query either updates or inserts, so the unique key seems uneffective. Please a) post the table definition (`show create table permanent`) b) check `select count(*), id, email from permanent group by id, email having count(*) > 1`. If it returns something, your key isn't working c) check `select count(*), least(id,email), greatest(id, email) from permanent group by least(id,email), greatest(id, email) having count(*) > 1` If it returns something, you (probably) inserted the columns in the wrong order. (I still urge you not to use `*`. Its behaviour is unpredictable and can change tomorrow.) – Solarflare Aug 27 '17 at 07:25

0 Answers0