I'm using Django 1.11 with MySQL. Upgrading to Django 2 isn't feasible in the short term, so it isn't an acceptable solution to my immediate problem, but answers referring to Django 2 may help others, so feel free to post them.
I need to perform a data migration on all rows in a table. There are fewer than 40,000 rows, but they are quite big - two of the columns hold ~15KB of JSON each, which gets parsed when the model is loaded. (These are the fields I need in the data migration, so I cannot defer them.)
So as not to load all the objects into memory simultaneously, I thought I'd use QuerySet.iterator(), which fetches rows 100 at a time. This works fine if all I do is read the results, but if I perform another query (e.g. to save one of the objects), then once I reach the end of the current chunk of 100 results, the next chunk of 100 results is not fetched and the iterator finishes.
It's as if the result set that fetchmany() fetches the rows from has been lost.
To illustrate the scenario using ./manage.py shell (assume there are 40,000 MyModel instances with sequential ids):
    iterator = app.models.MyModel.objects.iterator()
    for obj in iterator:
        print(obj.id)
The above prints the ids 1 to 40,000 as expected.
    iterator = app.models.MyModel.objects.iterator()
    for obj in iterator:
        print(obj.id)
        obj.save()
The above prints only the ids 1 to 100.
    iterator = app.models.MyModel.objects.iterator()
    for obj in iterator:
        print(obj.id)
        if obj.id == 101:
            obj.save()
The above prints only the ids 1 to 200.
Replacing obj.save() with anything else that makes a query to the DB (e.g. app.models.OtherModel.objects.first()) has the same result.
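For what it's worth, the symptom matches the general DB-API rule that a cursor holds only one result set at a time: executing a new statement discards any unfetched rows from the previous one. A minimal sketch of that behaviour (using sqlite3 purely for illustration - whether Django/MySQL is actually sharing a cursor here is exactly what I'm unsure about):

```python
import sqlite3

# Set up a table with 300 sequential ids.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE t (id INTEGER PRIMARY KEY)')
cur.executemany('INSERT INTO t (id) VALUES (?)', [(i,) for i in range(1, 301)])

# Start iterating in chunks of 100, fetchmany-style.
cur.execute('SELECT id FROM t ORDER BY id')
first_chunk = cur.fetchmany(100)        # rows 1-100, as expected

# Run any other query on the same cursor mid-iteration...
cur.execute('SELECT COUNT(*) FROM t')
count = cur.fetchone()[0]

# ...and the original result set is gone: no more chunks.
second_chunk = cur.fetchmany(100)       # -> []
```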
Is it simply not possible to make another query while iterating with QuerySet.iterator()? Is there another way to achieve the same thing?
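For context, the workaround I'm currently considering is manual keyset pagination: fetch each chunk with a fresh query ordered by primary key, so nothing has to survive between chunks. A sketch (iterate_by_pk and fetch_chunk are my own hypothetical names; with Django the fetch would be something like list(MyModel.objects.filter(pk__gt=last_pk).order_by('pk')[:chunk_size])):

```python
def iterate_by_pk(fetch_chunk, chunk_size=100):
    """Yield objects chunk by chunk, keyed on ascending primary key.

    fetch_chunk(last_pk, chunk_size) must return a list of objects with
    pk > last_pk, ordered by pk, at most chunk_size long. Each chunk is a
    separate query, so saves between chunks can't break the iteration.
    """
    last_pk = 0  # assumes integer pks starting above 0
    while True:
        chunk = fetch_chunk(last_pk, chunk_size)
        if not chunk:
            return
        for obj in chunk:
            yield obj
        last_pk = chunk[-1].pk
```

The downside is one extra query per 100 rows, but that seems acceptable for a one-off migration over 40,000 rows.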
Thanks