0

I'm trying to model television shows down to the episode level. Given that each level of the tree (network, series, season, episode) has different fields, I want to use different model types for each level.

My initial approach was to keep track of the parent with a foreign key at each level (this is a simplified approach, I know there would be other fields):

class Network(models.Model):
   ...

class Series(models.Model):
   network = models.ForeignKey(Network)
   ...

class Season(models.Model):
   series = models.ForeignKey(Series)
   ...

class Episode(models.Model):
   season = models.ForeignKey(Season)
   ...

But if I want to get the network of a specific episode, I have to lookup Episode->Season->Series->Network. That seems inefficient and poorly architected because it requires a lot of queries.

I saw the library django-mptt, but this requires that your tree be built of a single model type.

From a design standpoint, what's the standard way to structure this type of tree? What are the tradeoffs of different approaches?

John Tacos
  • 167
  • 4
  • Why not just use ForeignKey's `related_name` attribute? You can build one query and follow it back up your tree. Django's documentation: https://docs.djangoproject.com/en/2.1/ref/models/fields/#django.db.models.ForeignKey.related_name – Rachel Feb 06 '19 at 19:29
  • @Rachel Won't I still have to make four separate queries to access that data? I was under the impression that `related_name` made it easy to follow a foreign key in the opposite direction, but it doesn't actually make the data lookup any more efficient. – John Tacos Feb 06 '19 at 19:33
  • @JohnTacos prefetching and joining makes it fast which is what you should do when accessing related entities fields. Accessing the networks fields from an episode is only a single join for the database. So no big deal and totally efficient. – trixn Feb 06 '19 at 19:35
  • Using `related_name` builds the joins for you. It's still one query, not four. – Rachel Feb 06 '19 at 19:38
  • @John Tacos. According to the Django documentation, QuerySets are lazy. What looks like multiple database hits may only be one. This allows you to iteratively build your queryset without hitting the database multiple times. Django's documentation: https://docs.djangoproject.com/en/2.1/topics/db/queries/#querysets-are-lazy – Rachel Feb 06 '19 at 19:43
  • @Rachel thanks very much for the help! I'm glad to see that I don't need to make it complicated. – John Tacos Feb 06 '19 at 19:51
  • @trixn That's awesome. I had no idea prefetching was possible. Thanks so much! – John Tacos Feb 06 '19 at 19:52
  • @JohnTacos I added a more detailed answer. – trixn Feb 06 '19 at 20:11

1 Answers1

0

It is not so inefficient. It requires "only" three joins to get the network for a certain episode.

You can make your life easier if you create a cached_property on your Episode model:

class Network(models.Model):
    name = models.CharField(max_length=255)

# ...

class Episode(models.Model):
    season = models.ForeignKey(Season, on_delete=models.CASCADE)

    @cached_property
    def network(self):
        return self.season.series.network

    @cached_property
    def network_name(self):
        return self.season.series.network.name

This will be expensive to use if you do not annotate that value before accessing it, but it will always work, even if you forget to do that.

The good thing about cached_property is that it can be overridden by setting that attribute on the instance which is exactly what django does when we annotate a value:

episodes = Episode.objects.annotate(network_name=F('season__series__network__name'))
for episode in episodes:
   print(episode.pk, episode.network_name)

By annotating the network name before accessing the episode django will know that is has to join that name. This is how the query looks like:

SELECT 
    "main_episode"."id", 
    "main_episode"."name", 
    "main_episode"."season_id", 
    "main_network"."name" AS "network_name" 
FROM "main_episode" 
INNER JOIN "main_season" ON ("main_episode"."season_id" = "main_season"."id") 
INNER JOIN "main_series" ON ("main_season"."series_id" = "main_series"."id") 
INNER JOIN "main_network" ON ("main_series"."network_id" = "main_network"."id")

You can see that it joined the network onto the episode in advance. So this is one query with three joins. Joins do have a cost but you should not worry about that until you experience performance issues.

Live example

trixn
  • 15,761
  • 2
  • 38
  • 55