
I want to count the number of contributions each user made to my site so I can rank them. I managed to write some beautiful code which does exactly that, but on a per-user basis.

Because the user gets different amounts of points for different fields, the code checks certain fields on a model to see whether or not the user put a value in them. It then multiplies these counts by their weights to give a total score.

Nothing says it better than a bit of code:

from django.contrib.auth.models import User
from django.db.models import Count

from myapp.models import Movie  # wherever your Movie model lives


class UserContribCounter(object):
    """Counts the number of points a user got for his contributions."""
    weight_dict = {'poster': 2, 'title': 1}

    def __init__(self, user):
        if not isinstance(user, User):
            raise TypeError('Not a valid User instance.')
        self.user = user

    def set_contrib_points(self):
        """Some dark magic counts the number of times a certain field was filled out."""
        self.unweighted = Movie.objects.filter(user=self.user).aggregate(
            poster=Count('poster'), title=Count('title'))

    def get_contrib_points(self):
        """Multiplies the per-field counts by their weights to get the total number of points."""
        if not hasattr(self, 'unweighted'):
            self.set_contrib_points()
        return sum(self.weight_dict[key] * value
                   for key, value in self.unweighted.items())
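For example, with the weights above, if the aggregate reported 5 posters and 3 titles (made-up numbers, just to show the arithmetic), the total would be:

```python
# Hypothetical output of the aggregate: 5 posters, 3 titles.
weight_dict = {'poster': 2, 'title': 1}
unweighted = {'poster': 5, 'title': 3}

total = sum(weight_dict[key] * value for key, value in unweighted.items())
# total == 2 * 5 + 1 * 3 == 13
```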

I also want to show a top 10, so I need to get the top 10 users. This means I will either have to write a complex aggregation, which at the moment I keep failing to do, or use a signal in the following way:

When a model gets saved, catch the post_save signal. Then use my existing class to recount the points for the user and store the result in the user's profile. This way I can sort users by the value in their profile, which is trivial.

The question is: what will be more efficient, doing a recount every time a model gets saved, or a rather complex aggregation? I know this will depend on a lot of things, but I am sure that from a conceptual point of view there should be reasons to choose one over the other. Please note that some of the fields I will check in the aggregate will also be relational, so I am not sure how that will affect performance.

Thanks in advance,

tBuLi


1 Answer


I would say that this depends on how often your model changes and how accurate and up-to-date your top 10 needs to be. For what it's worth, you can cache the top 10 for an hour or even a day. On the other hand, if you have to do some complex ordering or processing that is not covered by Django's aggregates, you will benefit from denormalization.

And in the end, it all comes down to actually spotting a bottleneck in real world usage. Do the smallest possible thing first, seriously.

Dmitry Shevchenko