
I have a Django model that is doing way too much. Here's an abbreviated example of the model. Basically, it can represent four different Entity types, and there are recursive ForeignKey and ManyToMany relationships that point to other entities.

This project is currently using Django 1.8.x and Python 2.7.x, but I can upgrade those if the solution requires it.

class Entity(models.Model):
    """
    Films, People, Companies, Terms & Techniques
    """

    class Meta:
        ordering = ['name']
        verbose_name_plural = 'entities'

    # Types:
    FILM = 'FILM'
    PERSON = 'PERS'
    COMPANY = 'COMP'
    TERM = 'TERM'
    TYPE_CHOICES = (
        (FILM, 'Film'),
        (PERSON, 'Person'),
        (COMPANY, 'Company'),
        (TERM, 'Term/Technique'),
    )

    created = models.DateTimeField(auto_now_add=True, auto_now=False)
    updated = models.DateTimeField(auto_now_add=False, auto_now=True)
    type = models.CharField(max_length=4, choices=TYPE_CHOICES, default=FILM)
    slug = models.SlugField(blank=True, unique=True, help_text="Automatically generated")
    name = models.CharField(max_length=256, blank=True)
    redirect = models.ForeignKey('Entity', related_name='entity_redirect', blank=True, null=True, help_text="If this is an alias (see), set Redirect to the primary entry.")
    cross_references = models.ManyToManyField('Entity', related_name='entity_cross_reference', blank=True, help_text="This is a 'see also' — 'see' should be performed with a redirect.")
    [... and more fields, some of them type-specific]

I realize this is rather messy, and I'd like to remove 'type' and make an EntityBase class that abstracts out all of the common fields, and create new Film, Person, Company, and Term models that inherit from the EntityBase abstract base class.

Once I create the new models, I think I understand how to write the data migration to move all of the field data over to the new models (iterate over objects from Entity, filtered via type, create new objects in the appropriate new model)... except the ForeignKey and ManyToMany relationships. Maybe I'm thinking about this the wrong way, but how can I transfer those relationships when, during the migration, the new object that the relationship points to may not exist yet?

I suspect this may mean a multi-step migration, but I haven't quite worked out the right way to do it.

swizzlevixen
  • After thinking about this some more, I realize I have [the same problem as this fellow](http://stackoverflow.com/questions/16310930/django-polymorphic-models-or-one-big-model). I'm wondering if doing what I suggest above, to increase the conceptual clarity of the models, will only cause me more trouble on the relations and views end of things. Either I will now have to query 4+ models to come up with all "related" results, or I'll have to fiddle with something like `django-gm2m`. – swizzlevixen Jul 10 '16 at 18:55
  • Switching to concrete inheritance (instead of abstract) would seem to solve *those* issues, but then there are [dire warnings about using concrete inheritance](http://stackoverflow.com/questions/16310930/django-polymorphic-models-or-one-big-model) from prominent members of the Django community, so… maybe I should just leave it as-is, and clean up my admin code a bit to hide unnecessary fields per-type? – swizzlevixen Jul 10 '16 at 18:57

1 Answer


There is nothing magical about m2m and fk fields. This is the procedure that I would follow... It might be a bit blunt, but will get the job done:

  1. Make a BACKKKUPPPPPPppp of the database!!
  2. Make another backup!
  3. Create the new model and migration
  4. Write a new data migration that manually iterates over the existing objects and populates the new model, one by one. Don't be afraid of the for loop here, unless you have millions of rows in the database.
  5. Delete redundant models and/or fields, make migration for this.
  6. Run those migrations :)

In practice, this means a lot of restoring from the "BACKKKUPPPPPPppp" until the migrations are just right.

One little thing to take care of:

M2M fields cannot be given any values until the model instance has been saved (because the instance gets its ID on first save). In the manual migration, I would do something like:

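# Set the plain fields first.
new_instance = NewModel()
new_instance.somefield = "whatever"
new_instance.meaning = 42
# ... set the remaining fields ...
# Save so the instance gets a primary key; only then touch the M2M field.
new_instance.save()
new_instance.that_m2m_field.add(some_related_obj)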

Of course, make sure you read the docs in detail, especially the bit about importing the model class: inside a migration you can't just do `from myapp.models import MyModel`; instead, do:

MyModel = apps.get_model("myapp", "MyModel")
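To show where that `apps.get_model` call sits, here is a minimal sketch of a data-migration function for one of the four types. It assumes an app label of `myapp` and a new `Film` model with `slug` and `name` fields; both are placeholders rather than anything confirmed by the question, and relations are deliberately left out.

```python
def copy_films(apps, schema_editor):
    """Copy FILM entities into the (hypothetical) new Film model."""
    # Historical models: look them up through `apps`, never import them
    # from myapp.models inside a migration.
    Entity = apps.get_model("myapp", "Entity")
    Film = apps.get_model("myapp", "Film")  # assumed name of the new model

    # Historical models do not carry custom class attributes such as
    # Entity.FILM, so the literal type code is used in the filter.
    for old in Entity.objects.filter(type="FILM"):
        # Only shared scalar fields are copied here. ForeignKey and
        # ManyToMany links are left for a later step, once every row
        # they could point to has been created.
        Film.objects.create(slug=old.slug, name=old.name)


# In the migration file this would be wired up roughly as:
#     operations = [migrations.RunPython(copy_films, migrations.RunPython.noop)]
```

The function only touches `apps`, so it can be exercised with simple stand-in objects before running it against a real (backed-up) database.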

One possible stumbling block is the model inheritance that you plan to introduce. Generally, you will want to operate on the child model and access the parent from there as needed. With concrete (multi-table) inheritance, the parent can be accessed via the implicit ptr attribute; in your example it would be entitybase_ptr or something similar (that is just a OneToOne field). An abstract base class creates no such pointer. Going in the other direction, however (from parent to unknown child), is not as straightforward, because the parent doesn't know a priori which class its child is.

frnhr
  • Okay, I see. So basically I have to create all of the new models and migrate the objects, and then, once all of the objects are migrated, iterate back through, using the original model as a reference, so that I can look up which objects were related, and then find the new instance of the objects (by slug, maybe, since that's unique?) and re-add the m2m relations. – swizzlevixen Jul 10 '16 at 18:47
  • @bobtiki Well, that's the most general case, yes. In practice that is often overkill, and we can get the same result by just keeping a duplicate field or two while running the data migration, and then deleting the extra fields. – frnhr Jul 10 '16 at 19:15
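
The general two-pass idea from this comment exchange can be sketched without Django at all. This is only an illustration of the ordering, using plain dicts in place of model instances; the row layout (`slug`, `type`, `redirect`, `cross_references`) is an assumption made for the example, and the unique slug serves as the lookup key as suggested above.

```python
def migrate_two_pass(old_rows):
    """Rebuild objects and their relations from a list of old rows.

    Each old row is a dict with 'slug', 'type', 'redirect' (a slug or None)
    and 'cross_references' (a list of slugs). Returns {slug: new_object}.
    """
    # Pass 1: create every new object first, with no relations set.
    new_by_slug = {}
    for old in old_rows:
        new_by_slug[old["slug"]] = {
            "slug": old["slug"],
            "kind": old["type"],
            "redirect": None,
            "cross_references": [],
        }

    # Pass 2: every possible target now exists, so the old rows can be
    # used as a reference to re-link the new objects.
    for old in old_rows:
        new = new_by_slug[old["slug"]]
        if old["redirect"] is not None:
            new["redirect"] = new_by_slug[old["redirect"]]
        for ref_slug in old["cross_references"]:
            new["cross_references"].append(new_by_slug[ref_slug])

    return new_by_slug
```

Because nothing is linked until the second loop, it does not matter whether a row refers to an entity that appears later in the table. In a real migration the second loop would call `.save()` for the ForeignKey and `.add()` for the ManyToMany field instead of mutating dicts.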