Thanks for reading my question! I use Django (3.0.8) with PostgreSQL 12. Below is a simplified model for Inventory. There are about 1M records.
class Inventory(models.Model):
    account = models.ForeignKey(Account, on_delete=models.PROTECT)
    item_id = models.BigIntegerField(unique=True, db_index=True)
    amount = models.IntegerField()
Every hour we receive a new snapshot of one Account (e.g. acc_0) through a REST API. It contains the full list of that account's items (~100k records), not deltas. I want to apply 3 actions:
- Set amount=0 where account == acc_0 if item_id is not in the new snapshot.
- Set amount = snapshot['amount'] where account == acc_0 if item_id is in the new snapshot.
- Create new items if snapshot['item_id'] is not in Inventory.item_id.
Each time, most items already exist in the DB and their amount is unchanged, i.e. the real delta is quite small (100-1k records).
What I'm doing now doesn't seem very efficient:
with transaction.atomic():
    new_items = {}
    update_items = {}
    Inventory.objects.filter(account=acc_0).update(amount=0)
    for snapshot in snapshots:
        item_id = snapshot['item_id']
        results = Inventory.objects.filter(item_id=item_id)
        if len(results) == 0:
            new_items[item_id] = Inventory(...)
        else:
            item = results[0]
            item.amount = snapshot['amount']
            update_items[item_id] = item
    Inventory.objects.bulk_create(new_items.values())
    Inventory.objects.bulk_update(update_items.values(), ['amount'])
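One improvement I can think of while staying in the ORM is to load all existing amounts for the account in a single query and only touch the rows that actually changed, instead of issuing one SELECT per snapshot row. A sketch (untested; it relies on item_id being unique, as in the model above):

from django.db import transaction

with transaction.atomic():
    # One query: map item_id -> (pk, amount) for everything this account owns.
    existing = {
        item_id: (pk, amount)
        for item_id, pk, amount in Inventory.objects
            .filter(account=acc_0)
            .values_list('item_id', 'pk', 'amount')
    }
    new_items, changed_items, seen_ids = [], [], set()
    for snapshot in snapshots:
        item_id = snapshot['item_id']
        seen_ids.add(item_id)
        if item_id not in existing:
            new_items.append(Inventory(
                account=acc_0, item_id=item_id, amount=snapshot['amount']))
        else:
            pk, amount = existing[item_id]
            if amount != snapshot['amount']:
                # bulk_update only needs the pk and the field being written.
                changed_items.append(Inventory(pk=pk, amount=snapshot['amount']))
    Inventory.objects.bulk_create(new_items, batch_size=1000)
    Inventory.objects.bulk_update(changed_items, ['amount'], batch_size=1000)
    # Items in the DB but missing from the snapshot: zero them in one UPDATE.
    gone = set(existing) - seen_ids
    Inventory.objects.filter(item_id__in=gone).update(amount=0)

That replaces ~100k per-item queries with one SELECT plus a handful of batched writes, but it still pulls 100k rows into Python every hour.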
I was wondering: should I upload the snapshot to a temporary table and use UPDATE ... SET CASE ... JOIN and INSERT INTO ... SELECT ... WHERE NOT EXISTS, or is there an even better, more Pythonic way?
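If I go the SQL route, I imagine the temporary table could even be skipped with a single set-based upsert, something like this sketch (untested; myapp_inventory and the account_id column are placeholders for whatever table/column names Django actually generated):

from django.db import connection, transaction

def apply_snapshot(account_id, snapshots):
    item_ids = [s['item_id'] for s in snapshots]
    amounts = [s['amount'] for s in snapshots]
    with transaction.atomic(), connection.cursor() as cur:
        # Action 1: zero out the whole account; rows still present in the
        # snapshot get their real amount back in the upsert below.
        cur.execute(
            "UPDATE myapp_inventory SET amount = 0 WHERE account_id = %s",
            [account_id],
        )
        # Actions 2 and 3 in one statement: psycopg2 adapts the Python
        # lists to Postgres arrays, unnest() turns them into rows, and
        # ON CONFLICT uses the unique index on item_id to decide between
        # insert and update.
        cur.execute(
            """
            INSERT INTO myapp_inventory (account_id, item_id, amount)
            SELECT %s, t.item_id, t.amount
            FROM unnest(%s::bigint[], %s::int[]) AS t(item_id, amount)
            ON CONFLICT (item_id) DO UPDATE SET amount = EXCLUDED.amount
            """,
            [account_id, item_ids, amounts],
        )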
There is one similar question, Django Mass Update/Insert Performance, but it also remains unanswered.