I have a table with around 400k rows and I am trying to bulk update 800 of them in two batches of 400 records each (SQL Server database). The table has no triggers or indexes except the default one on the primary id key.
When I use bulk_update_mappings from SQLAlchemy, it takes around 40 seconds to complete the two updates.
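The ORM version looks roughly like this (simplified sketch; Item is a stand-in for my actual SQLModel model and the field names are placeholders):

from sqlmodel import Field, Session, SQLModel

# Stand-in model -- the real table has more columns.
class Item(SQLModel, table=True):
    id: int = Field(primary_key=True)
    name: str = ""

def bulk_update_orm(data, db_session: Session):
    # data is a list of dicts keyed by column name, each including the primary key,
    # e.g. {"id": 1, "name": "new value"}
    db_session.bulk_update_mappings(Item, data)
    db_session.commit()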
When I write my own raw query, which is just a bunch of UPDATE statements one after another, like this:
from sqlalchemy import text

def bulk_update(data, db_session):
    sql = ""
    for d in data:
        # <some field updates and parsing to correct db format> is where I build the real SET clause
        sql += f"""UPDATE table SET <some field updates and parsing to correct db format> WHERE id = {d['id']} \n"""
    db_session.exec(text(sql))
    db_session.commit()
It takes 0.2 seconds for the whole thing to complete.
I tried a bunch of other approaches, like using update(model).where(...).values(...), but that was almost as slow as the bulk one.
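That version looked roughly like this (again a simplified sketch, reusing the Item placeholder from above and assuming one statement per record):

from sqlalchemy import update

def bulk_update_statements(data, db_session):
    # One UPDATE ... WHERE id = :id statement per record, executed through the session.
    for d in data:
        stmt = update(Item).where(Item.id == d["id"]).values(name=d["name"])
        db_session.exec(stmt)
    db_session.commit()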
I tried to profile this and it seems like bulk_update_mappings ends up calling the driver's executemany, which is where most of the time is spent, and I don't see that when I run the raw query. Does anyone have any idea why there is such a big difference between the two methods?
Is it fine to use the raw query, or would that introduce an architectural flaw, since I now have to manually keep track of the fields and parse them into the correct format in the update statement instead of using the SQLModel model of the table directly?