
I have some initial data that I load with the ./manage.py loaddata command, and it works correctly. However, when I call loaddata, I want the data to be loaded only if it is not already there. If the data has already been loaded, I want to skip this step.

Arayik

2 Answers


my_app/models.py

from django.db import models


class Person(models.Model):
    first_name = models.CharField(max_length=200)
    last_name = models.CharField(max_length=200)

Solution 1

Include the pk attribute in every record of your fixtures file. This way, no matter how many times you call loaddata, it always writes to the same records identified by those primary keys instead of creating new ones. Note that each run overwrites those records with the fixture's values.

my_app/fixtures.json

[
  {
    "model": "my_app.Person",
    "pk": 1,
    "fields": {
      "first_name": "John",
      "last_name": "Lennon"
    }
  },
  {
    "model": "my_app.Person",
    "pk": 2,
    "fields": {
      "first_name": "Paul",
      "last_name": "McCartney"
    }
  }
]
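
Solution 1 depends on every record carrying a pk; a record without one is generally inserted as a new row on each run. A quick pre-check can catch that. This is only a minimal sketch: the helper name records_missing_pk is hypothetical and not part of Django.

```python
import json


def records_missing_pk(fixture_text):
    """Return the fixture records that have no "pk" entry.

    fixture_text is the JSON content of a fixture file, for example the
    result of reading my_app/fixtures.json.
    """
    records = json.loads(fixture_text)
    return [record for record in records if record.get("pk") is None]


if __name__ == "__main__":
    sample = """
    [
      {"model": "my_app.Person", "pk": 1,
       "fields": {"first_name": "John", "last_name": "Lennon"}},
      {"model": "my_app.Person",
       "fields": {"first_name": "Paul", "last_name": "McCartney"}}
    ]
    """
    # Report every record that would not be matched by primary key
    for record in records_missing_pk(sample):
        print("missing pk:", record["model"], record["fields"])
```

If the returned list is non-empty, those records need a pk before Solution 1 can be relied on.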

Solution 2

Override the loaddata command with a wrapper around the built-in implementation. The wrapper does the following:

  1. Read the original JSON file
  2. Filter out the items that already exist in your database
  3. Write the filtered items to a new (temporary) JSON file
  4. Proceed with the original functionality for loaddata
  5. Optionally, delete the temporarily created JSON file

my_app/fixtures.json

[
  {
    "model": "my_app.Person",
    "fields": {
      "first_name": "John",
      "last_name": "Lennon"
    }
  },
  {
    "model": "my_app.Person",
    "fields": {
      "first_name": "Paul",
      "last_name": "McCartney"
    }
  }
]

my_app/management/commands/loaddata.py

import json
import os

from django.core.management.commands import loaddata

from my_app.models import Person


def should_add_record(record):
    if record['model'] != 'my_app.Person':
        return True

    return not Person.objects.filter(
        first_name=record['fields']['first_name'],
        last_name=record['fields']['last_name'],
    ).exists()


class Command(loaddata.Command):
    def handle(self, *args, **options):
        args = list(args)

        # Read the original JSON file
        file_name = args[0]
        with open(file_name) as json_file:
            json_list = json.load(json_file)

        # Filter out records that already exist
        json_list_filtered = list(filter(should_add_record, json_list))
        if not json_list_filtered:
            # Use the command's own output stream instead of print
            self.stdout.write("All data are already previously loaded")
            return

        # Write the updated JSON file
        file_dir_and_name, file_ext = os.path.splitext(file_name)
        file_name_temp = f"{file_dir_and_name}_temp{file_ext}"
        with open(file_name_temp, 'w') as json_file_temp:
            json.dump(json_list_filtered, json_file_temp)

        # Pass the request to the actual loaddata (parent functionality)
        args[0] = file_name_temp
        try:
            super().handle(*args, **options)
        finally:
            # Remove the temporary file even if loading fails. Comment this
            # out if you want to see what was added to your records.
            os.remove(file_name_temp)
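
The command above writes its temporary file next to the original fixture as <name>_temp.json, which could clash with an existing file of that name. One possible variation, sketched here without Django so it can be run on its own, uses the standard tempfile module instead. The function name write_filtered_fixture and its should_add parameter are hypothetical.

```python
import json
import os
import tempfile


def write_filtered_fixture(records, should_add):
    """Write the records accepted by should_add to a temporary .json file.

    Returns the temporary file's path, or None when nothing is left to load.
    The caller is responsible for deleting the file afterwards.
    """
    filtered = [record for record in records if should_add(record)]
    if not filtered:
        return None

    # delete=False keeps the file on disk after the with-block so it can be
    # handed to loaddata. The .json suffix is kept because loaddata infers
    # the serialization format from the file extension.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False
    ) as temp_file:
        json.dump(filtered, temp_file)
        return temp_file.name


if __name__ == "__main__":
    people = [
        {"model": "my_app.Person",
         "fields": {"first_name": "John", "last_name": "Lennon"}},
    ]
    path = write_filtered_fixture(people, lambda record: True)
    print("temporary fixture written to", path)
    os.remove(path)
```

Inside handle, the returned path would take the place of file_name_temp, and the os.remove call would stay as the cleanup step.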

Output

Empty database

>>> Person.objects.all()
<QuerySet []>

Trigger loaddata

$ python manage.py loaddata my_app/fixtures.json
Installed 2 object(s) from 1 fixture(s)

Updated database

>>> Person.objects.all()
<QuerySet [<Person: Person object (1)>, <Person: Person object (2)>]>
>>> Person.objects.values()
<QuerySet [{'id': 1, 'first_name': 'John', 'last_name': 'Lennon'}, {'id': 2, 'first_name': 'Paul', 'last_name': 'McCartney'}]>

Rerunning loaddata

$ python manage.py loaddata my_app/fixtures.json
All data are already previously loaded
  • This explicit message appears only with Solution 2; with Solution 1 you would see the usual loaddata output instead. Either way, neither solution adds new entries to the database.

Database records remain unchanged

>>> Person.objects.all()
<QuerySet [<Person: Person object (1)>, <Person: Person object (2)>]>
>>> Person.objects.values()
<QuerySet [{'id': 1, 'first_name': 'John', 'last_name': 'Lennon'}, {'id': 2, 'first_name': 'Paul', 'last_name': 'McCartney'}]>


  • I have more than 8 apps in my project. I dumped their data from an SQLite database, configured Postgres on the project, and created a superuser account first, which I know already exists in the fixture I dumped. When I tried loading the data, it complained that there are existing records in app_name, app label. Can this solution work for such a scenario, and if so, how? Note that I want to load data for all the apps at once without specifying any app name. – Abayomi Olowu Jun 06 '23 at 08:44

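I would like to make a small change to @Niel's answer, since args may contain more than one file. This version also looks up the model class from each record and checks for existing rows by primary key, so every fixture record must include a pk.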

import json
import os

from django.apps import apps
from django.core.management.commands import loaddata


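def should_add_record(record):
    # record['model'] has the form "app_label.ModelName", which
    # apps.get_model accepts directly
    model_class = apps.get_model(record['model'])
    # Filter on pk rather than id so custom primary keys also work
    return not model_class.objects.filter(pk=record['pk']).exists()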


class Command(loaddata.Command):
    def handle(self, *args, **options):
        args = list(args)
        for file_name in args:
            # Read the original JSON file
            with open(file_name) as json_file:
                json_list = json.load(json_file)

            # Filter out records that already exist
            json_list_filtered = list(filter(should_add_record, json_list))
            if not json_list_filtered:
                # Use the command's own output stream instead of print
                self.stdout.write(f"skip {file_name}")
                continue

            # Write the updated JSON file
            file_dir_and_name, file_ext = os.path.splitext(file_name)
            file_name_temp = f"{file_dir_and_name}_temp{file_ext}"
            with open(file_name_temp, 'w') as json_file_temp:
                json.dump(json_list_filtered, json_file_temp)

            # Pass the request to the actual loaddata (parent functionality)
            try:
                super().handle(file_name_temp, **options)
            finally:
                # Remove the temporary file even if loading fails. Comment
                # this out if you want to see what was added to your records.
                os.remove(file_name_temp)
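
The pk-based check above raises a KeyError for fixture records that have no pk, such as the ones shown under Solution 2 in the other answer. One way to cover both cases, as a rough sketch, is to build the filter arguments from pk when it is present and fall back to the record's fields otherwise. The helper name build_lookup is hypothetical, and matching on every field is an assumption about what counts as "already loaded"; it would need adjusting for many-to-many fields.

```python
def build_lookup(record):
    """Return keyword arguments suitable for Model.objects.filter(**kwargs).

    Uses the primary key when the fixture record has one; otherwise falls
    back to matching on every field value (an assumption, adjust as needed).
    """
    if record.get("pk") is not None:
        return {"pk": record["pk"]}
    # Copy the fields so the caller cannot mutate the fixture record
    return dict(record["fields"])


if __name__ == "__main__":
    with_pk = {
        "model": "my_app.Person",
        "pk": 1,
        "fields": {"first_name": "John", "last_name": "Lennon"},
    }
    without_pk = {
        "model": "my_app.Person",
        "fields": {"first_name": "Paul", "last_name": "McCartney"},
    }
    print(build_lookup(with_pk))
    print(build_lookup(without_pk))
```

In should_add_record, the check would then read model_class.objects.filter(**build_lookup(record)).exists().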
7284