I have Django code similar to this:
```python
for obj in some_list:
    m1obj = Model1.objects.get(a=obj.a, b=obj.b, c=obj.c)
    Model2(m1=m1obj, d=obj.d, e='foo').save()
```
I optimized the insert into Model2 using `bulk_create`; however, this is still painfully slow because of the `get` from Model1 (~45 sec for 3k inserts).
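For reference, the `bulk_create` version looks roughly like this (a minimal sketch of my code; the list name is illustrative):

```python
# Build all Model2 instances in memory, then insert them in one query.
# The per-row Model1.objects.get() is still the bottleneck.
model2_rows = []
for obj in some_list:
    m1obj = Model1.objects.get(a=obj.a, b=obj.b, c=obj.c)
    model2_rows.append(Model2(m1=m1obj, d=obj.d, e='foo'))
Model2.objects.bulk_create(model2_rows)
```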
I also tried adding:
```python
class Meta:
    index_together = [
        ('a', 'b', 'c'),
    ]
    unique_together = [
        ('a', 'b', 'c'),
    ]
```
The `unique_together` helps a little; `index_together` didn't seem to have much effect.
I have a cumbersome workaround for this (sketched below):

- Filter `Model1` to get all the objects I will need, ordered by one or more keys, e.g. `order_by('a', 'b')`, and make sure Django caches the result, e.g. by calling `len()` on the queryset.
- Use binary search (`from bisect import bisect_left`) to locate the first `a`, then `b`, etc. (although there are far fewer `b`s and `c`s, so just iterating over those is about as fast).
This reduces the insert time to just over 3 seconds!
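Roughly, the workaround looks like this (a simplified sketch: I bisect on the full `(a, b, c)` tuple here rather than one key at a time, and the `find_m1` helper name is just for illustration):

```python
from bisect import bisect_left

# Fetch every candidate row once, sorted by the lookup keys, and force
# evaluation with len() so Django caches the queryset results.
cached = Model1.objects.order_by('a', 'b', 'c')
len(cached)

# Parallel, sorted list of key tuples to bisect into the cached rows.
keys = [(m.a, m.b, m.c) for m in cached]

def find_m1(a, b, c):
    """Binary-search the cached rows for an exact (a, b, c) match."""
    i = bisect_left(keys, (a, b, c))
    if i < len(keys) and keys[i] == (a, b, c):
        return cached[i]
    raise Model1.DoesNotExist

Model2.objects.bulk_create([
    Model2(m1=find_m1(obj.a, obj.b, obj.c), d=obj.d, e='foo')
    for obj in some_list
])
```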
There must be a better, cleaner, and more maintainable way to do this. Any suggestions? Is there a way to filter/get (smartly) within Django's cached query results?
EDIT: Changed `d='foo'` to `d=obj.d`; any bulk get needs to be mappable back to the tuple it belongs to, otherwise I cannot create the Model2 entry.
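In other words, a bulk fetch only helps if each fetched row can be mapped back to its source tuple, along these lines (a sketch; the dict-keyed mapping is one way to do it, not necessarily the best):

```python
# Fetch the needed Model1 rows in one query and key them by (a, b, c),
# so every source tuple can find its matching row without a per-row get.
m1_by_key = {(m.a, m.b, m.c): m for m in Model1.objects.all()}

Model2.objects.bulk_create([
    Model2(m1=m1_by_key[(obj.a, obj.b, obj.c)], d=obj.d, e='foo')
    for obj in some_list
])
```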