
I have a Django system that runs billing for thousands of customers on a regular basis. Here are my models:

from django.db import models

class Invoice(models.Model):
    balance = models.DecimalField(
        max_digits=6,
        decimal_places=2,
    )

class Transaction(models.Model):
    amount = models.DecimalField(
        max_digits=6,
        decimal_places=2,
    )
    invoice = models.ForeignKey(
        Invoice,
        on_delete=models.CASCADE,
        related_name='transactions',  # reverse accessor: invoice.transactions
        null=False,
    )

When billing is run, thousands of invoices with tens of transactions each are created using several nested for loops, which triggers an insert for each created record. I could run bulk_create() on the transactions for each individual invoice, but this still results in thousands of calls to bulk_create().

How would one bulk-create thousands of related models so that the relationship is maintained and the database is used in the most efficient way possible?

Notes:

  • I'm looking for a native Django solution that would work on all databases (with the possible exception of SQLite).
  • My system runs billing in a Celery task to decouple long-running code from active requests, but I am still concerned about how long a billing cycle takes to complete.
  • The solution should assume that other requests or running tasks are also reading from and writing to the tables in question.
Adam

3 Answers


You could bulk_create() all the Invoice objects, then refresh them from the db so that they all have ids (needed on backends where bulk_create() does not set primary keys), create the Transaction objects for all the invoices, and then save those with bulk_create() as well. All of this can be done inside a single transaction.atomic() block.

Also, specifically for Django 1.10 and PostgreSQL, look at this answer.

Ivan

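You can do it with two bulk_create() calls, using the following method. It relies on bulk_create() setting the primary key on each created Invoice, which Django does on PostgreSQL and, in recent versions, also on MariaDB 10.5+ and SQLite 3.35+.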

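new_invoices = []
new_transactions = []
for invoice_kwargs, transaction_kwargs_list in billing_data:  # hypothetical input
    invoice = Invoice(**invoice_kwargs)
    new_invoices.append(invoice)

    for transaction_kwargs in transaction_kwargs_list:
        transaction = Transaction(**transaction_kwargs)
        transaction.invoice = invoice  # still unsaved; saved by the first bulk_create()
        new_transactions.append(transaction)

# On backends that return primary keys from bulk inserts, this sets invoice.id.
Invoice.objects.bulk_create(new_invoices)

# Copy each invoice's new id onto its transactions.
for each in new_transactions:
    each.invoice_id = each.invoice.id

Transaction.objects.bulk_create(new_transactions)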
Sagar Adhikari

Another way to do this is shown in the snippet below:

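from django.db import transaction

new_invoices = []
for invoice_kwargs in billing_data:  # hypothetical: one kwargs dict per invoice
    ...  # your per-invoice billing logic
    new_invoices.append(Invoice(**invoice_kwargs))

with transaction.atomic():
    # Evaluate the existing ids now with set(); a lazy queryset would run as a
    # subquery after the insert below and exclude the new invoices as well.
    # Note: this loads every invoice id, so it slows down as the table grows.
    other_invoice_ids = set(Invoice.objects.values_list('id', flat=True))
    Invoice.objects.bulk_create(new_invoices)

    # Caveat: at READ COMMITTED (PostgreSQL's default), invoices committed by
    # other processes in the meantime are picked up here too.
    new_invoice_ids = Invoice.objects.exclude(id__in=other_invoice_ids).values_list('id', flat=True)

    new_transactions = []
    for invoice_id in new_invoice_ids:
        # Named new_transaction so it does not shadow django.db.transaction;
        # transaction_kwargs is a hypothetical kwargs dict.
        new_transaction = Transaction(invoice_id=invoice_id, **transaction_kwargs)
        new_transactions.append(new_transaction)

    Transaction.objects.bulk_create(new_transactions)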

I wrote this answer based on this post on another question in the community.

Javad
  • I know the official docs say that `save()` is not called and the `pre_save`/`post_save` signals are not sent; is that also the case for this answer? I need the `save` method to be called because I customize actions in it. – shawnngtq Nov 26 '22 at 00:42
  • `bulk_create` won't call the `save()` method. If you need `save()`, you can add a custom `bulk_create()` to your manager or write a new method that does the bulk create. – Javad Nov 26 '22 at 09:43