How do you efficiently iterate over a large queryset (records in the millions) with Django?
I'm trying to delete several million records that I can't do with a simple bulk SQL DELETE statement because the transaction would consume too much server memory. So I'm attempting to write a Python script to group ~10000 individual DELETE statements in a transaction.
My script looks like:
from django.db import transaction
from django.conf import settings
settings.DEBUG = False
qs = MyModel.objects.filter(some_criteria=123)
total = qs.count()
i = 0
transaction.set_autocommit(False)
for record in qs.iterator():
i += 1
if i == 1 or not i % 100 or i == total:
print('%i of %i %.02f%%: %s' % (i + 1, total, (i + 1) / float(total) * 100, record))
record.delete()
if not i % 1000:
transaction.commit()
transaction.commit()
This runs fine for the first 2000 records, but then errors with:
Traceback (most recent call last):
File "/project/.env/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1512, in cursor_iter
for rows in iter((lambda: cursor.fetchmany(itersize)), sentinel):
File "/project/.env/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1512, in <lambda>
for rows in iter((lambda: cursor.fetchmany(itersize)), sentinel):
File "/project/.env/lib/python3.7/site-packages/django/db/utils.py", line 96, in inner
return func(*args, **kwargs)
File "/project/.env/lib/python3.7/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/project/.env/lib/python3.7/site-packages/django/db/utils.py", line 96, in inner
return func(*args, **kwargs)
django.db.utils.ProgrammingError: named cursor isn't valid anymore
How do I fix this error? I'm using PostgreSQL as my db backend.
Google finds very few have run into this error, and mentions no solutions specifically for Django. I've read that adding hold=True
to the cursor call my fix it, but it's unclear how to set that through Django's ORM.
I've attempted to add a try/except to catch and continue the query, but that hasn't worked. As the traceback doesn't even include my code, I'm not specifically sure which line is triggering the exception.