Imagine a fairly complex Django application with both frontend and backend parts. Users modify some data through the frontend, and scripts periodically modify the same data on the backend.

Example:

instance = SomeModel.objects.get(...)
# (long-running part where various fields are changed, takes from 3 to 20 seconds)
instance.field = 123
instance.another_field = 'abc'
instance.save()

If somebody (or something) changes the instance while that long-running part is still working, those changes will be lost, because the instance is saved later and dumps the stale data from the Python (Django) object over whatever is in the database. In other words, if code reads data, waits for some time, and then writes it back, only the latest 'saver' keeps its changes; all the earlier ones lose theirs.
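Roughly, the race looks like this (a schematic timeline, not runnable code; pk=1 is just illustrative):

# t=0  backend script reads the row:       a = SomeModel.objects.get(pk=1)
# t=1  frontend user reads the same row:   b = SomeModel.objects.get(pk=1)
# t=2  frontend user saves an edit:        b.another_field = 'user edit'; b.save()
# t=3  backend script finishes and saves:  a.save()
#      a still holds the stale values, so the user's edit from t=2 is silently overwritten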

It's a "high-load" app, and the database load (we use Postgres) is already quite high, so I'd like to avoid anything that would significantly increase DB activity or memory usage.

Another issue: we have many signals attached, and the save() method is overridden in places, so I'd like to avoid anything that might break the signals or be incompatible with custom save() or update() methods.

What would you recommend in this situation? Any special app for that? Transactions? Anything else?

Thank you!

Spaceman

1 Answer

The correct way to protect against this is to use select_for_update to make sure that the data doesn't change between reading and writing. However, this locks the row against concurrent updates, so it might slow down your application significantly.
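For reference, the straightforward locking version looks something like this (a minimal sketch; note that the lock is held for the entire long-running part, which is exactly what makes it slow):

from django.db import transaction

with transaction.atomic():
    # SELECT ... FOR UPDATE: the row stays locked until the transaction ends
    instance = SomeModel.objects.select_for_update().get(...)
    # (long-running part runs here, blocking all other writers on this row)
    instance.field = 123
    instance.another_field = 'abc'
    instance.save()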

One solution might be to read the data and perform your long-running tasks first. Then, before saving the results back, you start a transaction, read the data again (this time with select_for_update) and verify that the original data hasn't changed. If the data is still the same, you save. If the data has changed, you abort and re-run the long-running task. That way you hold the lock for as short a time as possible.

Something like:

from django.db import transaction

success = False
while not success:
    instance1 = SomeModel.objects.get(...)
    # (long-running part: compute the new values, but don't save yet)

    with transaction.atomic():
        # Re-read the row with it locked, so nothing can change it
        # between the comparison and the save
        instance2 = SomeModel.objects.select_for_update().get(...)
        # Compare whichever fields matter to your task; these two are
        # just examples
        unchanged = (instance2.field == instance1.field and
                     instance2.another_field == instance1.another_field)
        if unchanged:
            # (make the changes on instance2)
            instance2.field = 123
            instance2.another_field = 'abc'
            instance2.save()
            success = True

Whether this is a viable approach does depend on what exactly your long-running task is. And a user might still overwrite the data you save here.
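If re-running the long-running task on every conflict turns out to be too expensive, a related pattern worth knowing is optimistic locking with a compare-and-swap update. This is only a sketch, and it assumes you can add an integer version column to SomeModel:

from django.db.models import F

# Assumes SomeModel has gained an IntegerField 'version' (default=0)
rows = SomeModel.objects.filter(
    pk=instance1.pk,
    version=instance1.version,   # match only if nobody saved since we read
).update(
    field=123,
    another_field='abc',
    version=F('version') + 1,    # bump the version atomically
)
if rows == 0:
    # Conflict: someone else saved first; decide whether to retry or give up
    pass

Be aware that QuerySet.update() goes straight to SQL: it does not call the model's save() and does not send pre_save/post_save signals, which may rule it out given the constraints in the question.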

Sander Steffann
  • thanks, that sounds reasonable. But the part "If the data has changed you abort and re-run the long-running task" is a bit scary. I'm sure the data will often have changed. Is there any way to avoid re-running long-running tasks? We'll re-run them, the data will be 'expired' again, we'll re-run them again, the data will be outdated again, and so on, endlessly... – Spaceman Sep 14 '14 at 12:38
  • Either you lock the data so nobody can change it while you perform the long-running tasks, or you don't lock it and have to deal with the case where the data changes in the meantime. You can't have both... – Sander Steffann Sep 14 '14 at 13:47
  • FYI - I found a nice app https://github.com/zapier/django-stalefields - it's a bit outdated, but I'll fix that (and will probably even send a pull request =)) – Spaceman Sep 16 '14 at 12:06
  • I am not sure that app is relevant to your case. It just limits which fields it changes in the database, just like the example code does (it loads the row, checks its values, changes a few fields and then saves it, all while the row is locked so nothing else can change in between). The tricky part is checking whether the results of your long-running part are still valid to save, not the actual saving of them. – Sander Steffann Sep 16 '14 at 13:42