This is my use case:
- I have multiple Celery tasks that run in parallel.
- Each task may bulk-create or bulk-update many objects. For this I'm using django-bulk.
So basically I'm using its very convenient function insert_or_update_many, which:
1. first performs a SELECT;
2. updates the objects it finds;
3. otherwise inserts them.
But this introduces concurrency problems. For example: if an object does not exist during step 1, it is added to a list of objects to be inserted later. In the meantime, however, another Celery task may have created that same object, so when the bulk insert (step 3) runs I get a duplicate-entry error.
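To make the race concrete, here is a minimal, self-contained sketch of the same check-then-insert pattern against a plain sqlite3 table (the table and column names are made up for illustration; django-bulk itself is not involved):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE item (key TEXT PRIMARY KEY)")

def select_then_insert(key):
    # Step 1: the SELECT sees no existing row for this key.
    exists = con.execute("SELECT 1 FROM item WHERE key = ?", (key,)).fetchone()
    if not exists:
        # Step 3: insert; safe only if nobody inserted in between.
        con.execute("INSERT INTO item (key) VALUES (?)", (key,))

# Task A runs to completion first...
select_then_insert("sku-1")

# ...but if Task B had already done its SELECT before A's INSERT,
# "sku-1" is still on B's insert list, and B's bulk insert fails:
try:
    con.execute("INSERT INTO item (key) VALUES (?)", ("sku-1",))
except sqlite3.IntegrityError as e:
    print("duplicate entry:", e)
```

The window between the SELECT and the INSERT is exactly where another worker's insert can sneak in.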
I guess I need to wrap the 3 steps in a 'blocking' block.
I've read around about transactions, and I've tried wrapping steps 1-3 in a with transaction.commit_on_success(): block:
    with transaction.commit_on_success():
        cursor.execute(sql, parameters)
        existing = set(cursor.fetchall())
        if not skip_update:
            # Find the objects that need to be updated
            update_objects = [o for (o, k) in object_keys if k in existing]
            _update_many(model, update_objects, keys=keys, using=using)
        # Find the objects that need to be inserted.
        insert_objects = [o for (o, k) in object_keys if k not in existing]
        # Filter out any duplicates in the insertion
        filtered_objects = _filter_objects(con, insert_objects, key_fields)
        _insert_many(model, filtered_objects, using=using)
But this does not work for me. I'm not sure I have a full understanding of transactions. I basically need a block where I can run several operations while being sure that no other process or thread is writing to my database resources in the meantime.
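What I mean by that, sketched with an in-process threading.Lock (a hypothetical analogy only: Celery workers are separate processes, so a real solution would need locking at the database level, not in Python):

```python
import threading

lock = threading.Lock()
rows = []  # stand-in for the table; a duplicate here = "duplicate entry" error

def insert_missing(keys):
    # While the lock is held, the select (step 1) and the insert (step 3)
    # are atomic with respect to other writers, so no duplicate can slip
    # into the gap between them.
    with lock:
        existing = set(rows)                               # step 1: select
        rows.extend(k for k in keys if k not in existing)  # step 3: insert

threads = [threading.Thread(target=insert_missing, args=(["x", "y"],))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This is the kind of mutual exclusion I was hoping the transaction block would give me, but apparently a transaction alone does not block other writers like this.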