
Suppose there is a production database that already contains data. I need to migrate it in the following tricky case.

There is a model (already in the db), say Model, which has foreign keys to other models.

class ModelA: ...
class ModelX: ...

class Model:
  a = models.ForeignKey(ModelA, default = A)
  x = models.ForeignKey(ModelX, default = X)

Now we need to create one more model, ModelY, that Model should refer to. When a Model object is created, it should get a default value pointing at some ModelY object. That object obviously doesn't exist yet, so it has to be created during the migration.

class ModelY: ...
class Model:
  y = models.ForeignKey (ModelY, default = ??????)

So the migration sequence should be:

  • Create ModelY table
  • Create a default object in this table, put its id somewhere
  • Create a new field y in the Model table, with the default value taken from the previous paragraph
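To make the three steps above concrete, here is a minimal sketch at the raw SQL level using Python's built-in sqlite3 module. It is not a Django migration: the table names, the `title` and `name` columns, and the in-memory database are all assumptions made for illustration. Note that Django itself traditionally applies field defaults in Python rather than keeping them in the schema, so this only illustrates the ordering of the steps and the fact that the generated id is read back instead of being assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Pre-existing state: the `model` table already holds production rows.
# (`title` is a hypothetical column, only here to have some data.)
cur.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, title TEXT)")
cur.executemany("INSERT INTO model (title) VALUES (?)", [("first",), ("second",)])

# Step 1: create the new table.
cur.execute("CREATE TABLE model_y (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Step 2: create the default row and read back its generated id.
cur.execute("INSERT INTO model_y (name) VALUES (?)", ("default",))
default_y_id = cur.lastrowid

# Step 3: add the new column, using that id as the default for existing rows.
# DDL statements cannot take bound parameters, so the integer is formatted in.
cur.execute(
    "ALTER TABLE model ADD COLUMN y_id INTEGER NOT NULL DEFAULT {:d}".format(default_y_id)
)
conn.commit()

rows = cur.execute("SELECT id, y_id FROM model ORDER BY id").fetchall()
print(rows)
```

Every existing row ends up pointing at the freshly created default row, without the id ever being written down by hand.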

And I'd like to automate all of this, of course. That is, I want to avoid applying one migration by hand, then creating an object, writing down its id, using this id as the default value for the new field, and only then applying another migration with this new field.

I'd also like to do it all in one step: define both ModelY and the new field y in the old model, generate the migration, fix it up somehow, and then apply everything at once and have it work.

Are there any best practices for such a case? In particular, where should the newly created object's id be stored? In some dedicated table in the same db?

eyllanesc
Anton Ovsyannikov
  • How many records are supposed to be in `ModelY`'s table? Are you sure you need a real model for this? – Ihor Pomaranskyy Jun 03 '19 at 10:23
  • More than one. Yes, it should be a model, i.e. it should be possible to create it in Django's admin interface. Also I'd like to avoid the assumption that the newly created id is 1. – Anton Ovsyannikov Jun 03 '19 at 11:36
  • Then maybe this model should have a 'natural' ID instead of a surrogate one? Let's say that `ModelY` is a color; then you can have a field `code = models.CharField(max_length=20, primary_key=True)`, and you can have a fixture that adds a record with `code = 'red'`. Regarding the FK field, I'd rather keep it nullable and set the default in a signal or an overridden `.save()` if the value is null. – Ihor Pomaranskyy Jun 03 '19 at 11:55
  • Yes, it will work somehow, but what about other objects in the `ModelY` table, when the primary key is a char field? Will it be auto-generated? Can we guarantee there will be no conflicts in this case? – Anton Ovsyannikov Jun 03 '19 at 12:18
  • No, it will not be autogenerated, but it is guaranteed that there will be no duplicates, as primary key also adds unique constraint. In other words, you'll have to set the primary key manually, but in case you'll use something meaningful (like color code in the case above) — it should not be an issue. – Ihor Pomaranskyy Jun 04 '19 at 07:27
  • Autogeneration is a must. It seems the solution is not to use the primary key, but to add another field to `ModelY`, like `name`, and dynamically resolve the default value in a callable like `ModelY.objects.get(name='Default name')` – Anton Ovsyannikov Jun 04 '19 at 10:20
  • Then you can use the `name` as primary key. :) – Ihor Pomaranskyy Jun 04 '19 at 10:28

3 Answers


I wouldn't do this in a single migration file; instead, you can create several migration files to achieve it. I'll have a go at helping you out. I'm not totally certain this is what you want, but it should teach you a thing or two about Django migrations.

I'm going to refer to two types of migrations here, one is a schema migration, and these are the migration files you typically generate after changing your models. The other is a data migration, and these need to be created using the --empty option of the makemigrations command, e.g. python manage.py makemigrations my_app --empty, and are used to move data around, set data on null columns that need to be changed to non-null, etc.

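from django.db import models


class ModelY(models.Model):
    # Fields ...
    is_default = models.BooleanField(default=False, help_text="Will be specified true by the data migration")

class Model(models.Model):
    # Fields ...
    # on_delete is required since Django 2.0; CASCADE was the implicit default before that
    y = models.ForeignKey(ModelY, null=True, default=None, on_delete=models.CASCADE)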

You'll notice that y accepts null; we can change this later. For now you can run python manage.py makemigrations to generate the schema migration.

To generate your first data migration run the command python manage.py makemigrations <app_name> --empty. You'll see an empty migration file in your migrations folder. You should add two methods, one that is going to create your default ModelY instance and assign it to your existing Model instances, and another that will be a stub method so Django will let you reverse your migrations later if needed.

from __future__ import unicode_literals

from django.db import migrations


def migrate_model_y(apps, schema_editor):
    """Create a default ModelY instance, and apply this to all our existing models"""
    ModelY = apps.get_model("my_app", "ModelY")
    default_model_y = ModelY.objects.create(something="something", is_default=True)

    Model = apps.get_model("my_app", "Model")
    # A single UPDATE query assigns the default to every existing row
    Model.objects.update(y=default_model_y)


def reverse_migrate_model_y(apps, schema_editor):
    """This is necessary to reverse migrations later, if we need to"""
    return


class Migration(migrations.Migration):

    dependencies = [("my_app", "0100_auto_1092839172498")]

    operations = [
        migrations.RunPython(
            migrate_model_y, reverse_code=reverse_migrate_model_y
        )
    ]

Do not import your models directly into this migration! The models need to be obtained through the apps.get_model("my_app", "my_model") method in order to get the Model as it was at this migration's point in time. If you add more fields in the future and then run this migration, the fields of an imported model may not match the database columns (because the model is from the future, sort of...), and you could get errors about missing columns and the like. Also be wary of using custom methods on your models/managers in migrations, because they are not available on these historical models; I usually duplicate the relevant code into the migration so it always runs the same way.
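The "model from the future" problem can be illustrated without Django at all. The sketch below uses the built-in sqlite3 module and builds SELECT statements from two column lists: one standing in for the historical model and one for the model as currently written in models.py. The table name and the `added_later` column are hypothetical; the point is only that a query built from newer fields fails against the older schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as it exists when the old data migration runs: no `added_later` column yet.
conn.execute("CREATE TABLE my_app_model (id INTEGER PRIMARY KEY, y_id INTEGER)")

# Columns a historical model (as returned by apps.get_model) would know about.
historical_columns = ["id", "y_id"]
# Columns of the model class as written today (hypothetical later field included).
current_columns = ["id", "y_id", "added_later"]


def select_all(columns):
    """Run the kind of SELECT an ORM would build from a model's field list."""
    query = "SELECT {} FROM my_app_model".format(", ".join(columns))
    return conn.execute(query).fetchall()


historical_result = select_all(historical_columns)  # matches the table, works

try:
    select_all(current_columns)
    error_message = None
except sqlite3.OperationalError as exc:
    # The database does not know the column the newer model expects.
    error_message = str(exc)

print(error_message)
```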

Now we can go back and modify the Model model to ensure y is not null and that it picks up the default ModelY instance in the future:

def get_default_model_y():
    default_model_y = ModelY.objects.filter(is_default=True).first()
    assert default_model_y is not None, "There is no default ModelY to populate with!!!"
    return default_model_y.pk  # We must return the primary key used by the relation, not the instance

class Model(models.Model):
    # Fields ...
    y = models.ForeignKey(ModelY, default=get_default_model_y, on_delete=models.CASCADE)  # on_delete is required since Django 2.0

Now you should run python manage.py makemigrations again to create another schema migration.
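Why a callable rather than a fixed id? A callable default is looked up each time a value is needed, so the default row only has to exist by then, not when the model class is defined. The following framework-free sketch imitates that behaviour; the `Field` class, the `model_y_rows` dictionary and the pk value 7 are stand-ins invented for illustration, not Django's real classes or data.

```python
class Field(object):
    """Tiny stand-in for a model field; not Django's real class."""

    def __init__(self, default=None):
        self.default = default

    def get_default(self):
        # A callable default is invoked each time a value is needed;
        # a plain value is returned as-is.
        if callable(self.default):
            return self.default()
        return self.default


# Hypothetical in-memory "table" of ModelY rows: pk -> is_default flag.
model_y_rows = {}


def get_default_model_y_pk():
    """Return the pk of the row flagged as default, or None if there is none."""
    for pk, is_default in sorted(model_y_rows.items()):
        if is_default:
            return pk
    return None


y_field = Field(default=get_default_model_y_pk)

before = y_field.get_default()  # no default row exists yet
model_y_rows[7] = True          # e.g. the data migration creates the default row
after = y_field.get_default()   # the same field now resolves to that row's pk
```

Here `before` is None and `after` is 7, even though the field was declared before the row existed.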

You shouldn't mix schema migrations and data migrations in the same file. Each migration is wrapped in a transaction on databases that support transactional DDL, and on PostgreSQL in particular, combining table creation/alteration with INSERT or UPDATE queries in one transaction can fail with database errors (for example, complaints about pending trigger events).

Finally you can run python manage.py migrate. It should create a default ModelY object, assign it to the ForeignKey of your existing Model rows, and then remove the null so that y becomes a regular non-nullable ForeignKey with a default.

A. J. Parr
  • A. J., thanks a lot, this tutorial is very helpful. I am now playing with your example, and what I'm trying to understand is at which point the default value is set for `y`. From my understanding there is no way to provide the default value in the SQL schema itself, correct? So `y` is set when Django creates ModelY instance. But `get_default_model_y` is called during the migration itself, why!? If there were no such call, the migration could be made of 2 steps, avoiding the initial migration with a nullable `y`. Also I'd like to thank you for the `is_default` solution, it's the best, as we can change the default object in the admin. – Anton Ovsyannikov Jun 09 '19 at 20:40
  • Actually, I suppose we can skip the creation of `y` in the first step, so the easier sequence is 1) create `ModelY`, do not touch `Model` 2) create a data migration, where we only create a `ModelY` object, with no need to manage null values in `Model` anymore 3) create `y` for `Model` with a callable default. Again, it's not clear why it's called during the migration; without that the sequence could be 2 steps. What I'm also afraid of is how Django manages squashing of such sequences. Does it handle them correctly? – Anton Ovsyannikov Jun 09 '19 at 21:03
  • Ah, so the problem you may find with this approach is that the callable default does not work well with migrations, because you're running some code from outside the migrations (worth a try though). However, I think that when it creates the non-null y column it needs a default that's serializable in SQL, and the callable default might cause issues at this step. – A. J. Parr Jun 10 '19 at 02:56
  • Finally, there is a [little creative rework](https://stackoverflow.com/questions/56397090/django-how-to-organize-migration-for-two-related-models-and-automatically-set-d/56961133#56961133) – Anton Ovsyannikov Jul 09 '19 at 22:17

Finally I came to the following solution.

First, I accepted the idea of identifying the default object by an isDefault attribute and wrote an abstract model to deal with it, keeping data integrity as much as possible (the code is at the bottom of the post).

What I don't like much about the accepted solution is that data migrations are mixed in with schema migrations. It's easy to lose them, e.g. during squashing. Occasionally I also delete migrations altogether, when I am sure all my production and backup databases are consistent with the code, so that I can generate a single initial migration and fake it. Keeping data migrations together with schema migrations breaks this workflow.

So I decided to keep all data migrations in a single file outside of the migrations package. I created data.py in my app package and put all data migrations in a single function, migratedata, keeping in mind that this function can be called at early stages, when some models may not exist yet, so we need to catch the LookupError exception when accessing the apps registry. Then I use this function for every RunPython operation in data migrations.

So the workflow looks like this (we assume Model and ModelX are already in place):

1) Create ModelY:

class ModelY(Defaultable):
    y_name = models.CharField(max_length=255, default='ModelY')

2) Generate migration:

manage.py makemigrations

3) Add data migration in data.py (add name of the model to defaultable list in my case):

# data.py in myapp
def migratedata(apps, schema_editor):
    defaultables = ['ModelX', 'ModelY']

    for m in defaultables:
        try:
            M = apps.get_model('myapp', m)
            if not M.objects.filter(isDefault=True).exists():
                M.objects.create(isDefault=True)
        except LookupError as e:
            print('[{} : ignoring]'.format(e))

    # owner model, should be after defaults to support squashed migrations over empty database scenario
    Model = apps.get_model('myapp', 'Model')
    if not Model.objects.all().exists():
        Model.objects.create()

4) Edit migration by adding operation RunPython:

from django.db import migrations

from myapp.data import migratedata
class Migration(migrations.Migration):
    ...
    operations = [
        migrations.CreateModel(name='ModelY', ...),
        migrations.RunPython(migratedata, reverse_code=migratedata),
    ]

5) Add ForeignKey(ModelY) to Model:

class Model(models.Model):
    # SET_DEFAULT ensures that there will be no integrity issues, but make sure default object exists
    y = models.ForeignKey(ModelY, default=ModelY.default, on_delete=models.SET_DEFAULT)

6) Generate migration again:

manage.py makemigrations

7) Migrate:

manage.py migrate

8) Done!

The whole chain can be applied to an empty database; it will create the final schema and fill it with initial data.

When we are sure that our db is in sync with the code, we can easily remove the long chain of migrations, generate a single initial one, add RunPython(migratedata, ...) to it, and then migrate with --fake-initial (delete the django_migrations table first).

Huh, such a tricky solution for such a simple task!

Finally, here is the Defaultable model source code:

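from django.db import IntegrityError, models
from django.db.models import Q
from django.db.models.signals import pre_delete


class Defaultable(models.Model):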
    class Meta:
        abstract = True

    isDefault = models.BooleanField(default=False)

    @classmethod
    def default(cls):
        # type: (Type[Defaultable]) -> Optional[int]
        """
        Search for the default object in the given model.
        Returning None is useful when applying squashed migrations on an empty database:
        the ForeignKey with this default can still be non-nullable, as the return value
        is not used during migration if there is no model instance (Django is not pushing
        the returned default to the SQL level).

        Take note of only(): this is a kind of dirty hack to avoid problems during
        model evolution, as default() can be called in migrations within some
        historical project state, so ideally we should use the model from this historical
        apps registry, but we have no access to it globally.

        :return: Default object id, or None if there is none or more than one.
        """

        try:
            return cls.objects.only('id', 'isDefault').get(isDefault=True).id
        except (cls.DoesNotExist, cls.MultipleObjectsReturned):
            return None

    # take care of data integrity
    def save(self, *args, **kwargs):
        super(Defaultable, self).save(*args, **kwargs)
        if self.isDefault:  # Ensure only one default, so make all others non default
            self.__class__.objects.filter(~Q(id=self.id), isDefault=True).update(isDefault=False)
        else:  # Ensure at least one default exists
            if not self.__class__.objects.filter(isDefault=True).exists():
                self.__class__.objects.filter(id=self.id).update(isDefault=True)

    def __init__(self, *args, **kwargs):
        super(Defaultable, self).__init__(*args, **kwargs)

        # noinspection PyShadowingNames,PyUnusedLocal
        def pre_delete_defaultable(instance, **kwargs):
            if instance.isDefault:
                raise IntegrityError("Can not delete default object {}".format(instance.__class__.__name__))

        pre_delete.connect(pre_delete_defaultable, self.__class__, weak=False, dispatch_uid=self._meta.db_table)
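The integrity rules that Defaultable enforces (exactly one default after every save, and no deleting the default) can be checked in isolation. Below is a framework-free sketch that keeps rows in a dictionary instead of a database table; the `DefaultableTable` class and its method names are made up for this illustration and only mirror the logic of save() and the pre_delete handler above.

```python
class DefaultableTable(object):
    """In-memory stand-in for one model's table; rows map id -> isDefault."""

    def __init__(self):
        self.rows = {}
        self._next_id = 1

    def save(self, is_default=False, row_id=None):
        if row_id is None:
            row_id = self._next_id
            self._next_id += 1
        self.rows[row_id] = is_default
        if is_default:
            # Only one default: demote every other row.
            for other in self.rows:
                if other != row_id:
                    self.rows[other] = False
        elif not any(self.rows.values()):
            # At least one default: promote the row that was just saved.
            self.rows[row_id] = True
        return row_id

    def delete(self, row_id):
        if self.rows[row_id]:
            raise ValueError("Can not delete default object")
        del self.rows[row_id]

    def default_id(self):
        defaults = [rid for rid, flag in self.rows.items() if flag]
        return defaults[0] if len(defaults) == 1 else None


table = DefaultableTable()
first = table.save()                  # promoted, because no default existed
second = table.save(is_default=True)  # takes over as the only default
```

After these two saves, `second` is the default and `first` has been demoted; trying to delete `second` raises an error, while deleting `first` is allowed.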
Anton Ovsyannikov

I left my previous answer just to show the search for ideas. Finally I've found a fully automatic solution, so it's no longer necessary to manually edit Django-generated migrations, but the price is monkey patching, as is often the case.

The idea is to provide a callable as the default of the ForeignKey, which creates the default instance of the referenced model if it does not exist. The problem is that this callable can be called not only in the final Django project state, but also during migrations, with old project states, so it can be called for a since-deleted model at an early stage, when that model still existed.

The standard solution in RunPython operations is to use the apps registry from the migration state, but this is unavailable to our callable, because the registry is passed as an argument to RunPython and is not available globally. To support all scenarios of applying and rolling back migrations, we need to detect whether we are in a migration or not, and access the appropriate apps registry.

The only solution I found is to monkey patch the AddField and RemoveField operations to keep the migration apps registry in a global variable, if we are in a migration.

import django.apps
import django.db.migrations

migration_apps = None


def set_migration_apps(apps):
    global migration_apps
    migration_apps = apps


def get_or_create_default(model_name, app_name):
    M = (migration_apps or django.apps.apps).get_model(app_name, model_name)

    try:
        return M.objects.get(isDefault=True).id

    except M.DoesNotExist:
        o = M.objects.create(isDefault=True)
        print('{}.{} default object not found, creating default object : OK'.format(model_name, app_name))
        return o.id  # return the id in both branches, not the instance


def monkey_patch_fields_operations():
    def patch(klass):

        old_database_forwards = klass.database_forwards
        def database_forwards(self, app_label, schema_editor, from_state, to_state):
            set_migration_apps(to_state.apps)
            old_database_forwards(self, app_label, schema_editor, from_state, to_state)
        klass.database_forwards = database_forwards

        old_database_backwards = klass.database_backwards
        def database_backwards(self, app_label, schema_editor, from_state, to_state):
            set_migration_apps(to_state.apps)
            old_database_backwards(self, app_label, schema_editor, from_state, to_state)
        klass.database_backwards = database_backwards

    patch(django.db.migrations.AddField)
    patch(django.db.migrations.RemoveField)
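The patching pattern above can be tried out on a plain class. One thing to note is that the code above sets the global but never clears it, so a stale registry could linger after the operation finishes. The sketch below is a possible variation, not the author's code: it restores the previous value in a finally block. `Operation`, `patch_method` and `current_context` are hypothetical names standing in for a migration operation, the patch helper and `migration_apps`.

```python
import functools

current_context = None  # plays the role of `migration_apps`


class Operation(object):
    """Hypothetical stand-in for a migration operation class."""

    def database_forwards(self, app_label, state):
        # Code running somewhere inside the call can consult the global.
        return current_context


def patch_method(klass, name, get_context):
    """Wrap klass.<name> so the global is set for the duration of the call."""
    original = getattr(klass, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        global current_context
        previous = current_context
        current_context = get_context(*args, **kwargs)
        try:
            return original(self, *args, **kwargs)
        finally:
            # Restore, so code running outside the operation sees no stale value.
            current_context = previous

    setattr(klass, name, wrapper)


patch_method(Operation, "database_forwards", lambda app_label, state: state)

seen_inside = Operation().database_forwards("myapp", "historical-state")
seen_after = current_context
```

Inside the patched call the global holds the state that was passed in; once the call returns it is back to None, so a default callable invoked outside a migration would fall back to the global apps registry.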

The rest, including the Defaultable model with its data integrity checks, is in the GitHub repository

Anton Ovsyannikov