
I have a Django website with a PostgreSQL database. There is a Django app and model for a 'flat' item table with many records being inserted regularly, up to millions of inserts per month. I would like to use these records to automatically populate a star schema of fact and dimension tables (initially also modeled in the Django models.py), in order to efficiently do complex queries on the records, and present data from them on the Django site.

Two main options keep coming up:

1) PostgreSQL triggers: Configure the database directly to insert the appropriate rows into the fact and dimension tables on creation or update of a record, probably using PL/pgSQL (or PL/Python) row-level AFTER triggers. Pros: works for inputs that bypass Django; likely more efficient. Cons: splits business logic into another location; other input sources may not expect the triggered inserts.
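For reference, a minimal sketch of what option 1 might look like. All table and column names here are illustrative, not from my actual schema, and the dimension upsert is simplified to a single `dim_item` table:

```sql
-- Hypothetical row-level AFTER trigger populating a star schema
-- from inserts into the flat table "item_record".
CREATE OR REPLACE FUNCTION populate_star_schema() RETURNS trigger AS $$
BEGIN
    -- Ensure the dimension row exists (simplified upsert).
    INSERT INTO dim_item (item)
    SELECT NEW.item
    WHERE NOT EXISTS (SELECT 1 FROM dim_item WHERE item = NEW.item);

    -- Insert the fact row, keyed to the dimension.
    INSERT INTO fact_record (item_id, created_at)
    SELECT id, NEW.created_at FROM dim_item WHERE item = NEW.item;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_record_after_insert
    AFTER INSERT ON item_record
    FOR EACH ROW EXECUTE PROCEDURE populate_star_schema();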

2) Django signals: Use the signals framework to do the inserts upon creation or update of a record, via the built-in `django.db.models.signals.post_save` signal. Pros: easier to build and maintain. Cons: some code has to be repeated, or every input source has to go through the Django site/app environment, to support new input sources.

Am I correct in thinking that Django's built-in signals are the way to go for maintaining the fact and dimension tables? Or is there some other significant option I'm missing?

Sectio Aurea
  • The downside of doing this in Python rather than an in-database trigger is that you're stuck always going through your Python code, and that can be very slow and cumbersome if you need to do bulk updates and such. Sometimes talking directly to the database is the only sane thing to do. – mu is too short Jan 13 '13 at 02:41
  • Thank you for your speedy input! The fact table will have on the order of a dozen columns, with one or two dozen foreign keyed dimension tables and bridge tables. Updates of 20-200 rows to the flat record table will happen on the order of once or twice every minute. Would this be classified as "bulk" in the context of your comment? – Sectio Aurea Jan 13 '13 at 03:01
  • Just for completeness, this [self-answered question](http://stackoverflow.com/questions/13954499/django-signals-to-update-a-different-model) seems relevant, in terms of a brief example of django-signals being used for database updates. However they do not discuss portability concerns. I agree that supporting other inputs to the "silo" besides the Django app (which uses the Django model for item creation) would be ideal. – Sectio Aurea Jan 13 '13 at 03:18
  • By "bulk" I mean sending `UPDATE t ...` or `INSERT INTO t ...` SQL straight into the database to avoid the expense of a round trip through Python land. I've found database portability to be largely a myth unless you want to cripple yourself by using only simple bits of SQL; I don't find that approach practical for any of the applications I work on. Allowing other applications (even non-Python ones) to talk to your database without making a mess is also an issue that pushes logic down into the database. – mu is too short Jan 13 '13 at 03:23
  • Thanks for clarifying. I'm definitely looking at bulk operations then, since each insert into the flat table will require many inserts to populate the star schema. And it seems like DRY also compels me to put the logic into database triggers - if I put the logic into Django signals, I'd have to work equivalent events/signals/triggers into whatever other apps feed records into the flat table. Feel free to add a quick "PostgreSQL triggers because reasons" summary answer and I'll accept it. – Sectio Aurea Jan 13 '13 at 03:29
  • Here is a very recent blog post, which details required syntax to include (MySQL) triggers in a Django project while working around Django's [antique](https://code.djangoproject.com/ticket/3214) [bugs](https://code.djangoproject.com/ticket/3485) with SQL includes: http://bryanmarty.com/blog/2012/12/03/mysql-triggers-django/ – Sectio Aurea Jan 13 '13 at 04:27
  • I think you should answer this yourself. I don't know how well you can integrate triggers with Django (I know that it is a bit of a PITA with Rails but then, Rails has an astoundingly bad attitude towards databases); I'm just trying to point out a few issues. – mu is too short Jan 13 '13 at 04:33
  • And [this answer](http://stackoverflow.com/a/4061425/1915685) is probably still relevant. For my case, it still appears that using actual SQL triggers is going to be best for the reasons you pointed out. I'll leave it open a while longer, though, since I have model creation to do before I am committed one way or the other. Thanks for your input. – Sectio Aurea Jan 13 '13 at 04:38

1 Answer


I ended up using Django signals. With a flat-table model ItemRecord containing fields "title", "description", and "created_at", the code in models.py looks like this:

```python
from django.db.models.signals import post_save

def create_item_record_history(instance, created, **kwargs):
    # Only copy the record on initial creation, not on subsequent saves.
    if created:
        ItemRecordHistory.objects.create(
            title=instance.title,
            description=instance.description,
            created_at=instance.created_at,
        )

post_save.connect(create_item_record_history, sender=ItemRecord)
```

It is running well for my purposes. Although it's just creating an annotated flat table (new field "created_at"), the same method could be used to build out a star schema.
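To extend this toward an actual star schema, the key step inside the signal handler is resolving each flat record to surrogate keys in the dimension tables before writing the fact row. Here is a framework-free sketch of that lookup logic; the names are illustrative, and the plain dict caches stand in for what would be `get_or_create` calls on Django dimension models:

```python
def resolve_dimension(dim_cache, value):
    """Return the surrogate key for value, creating a dimension entry if needed.

    dim_cache is a dict standing in for a dimension table
    (in Django this would be a get_or_create on a dimension model).
    """
    if value not in dim_cache:
        dim_cache[value] = len(dim_cache) + 1  # next surrogate key
    return dim_cache[value]

def build_fact_row(record, item_dim, date_dim):
    """Translate one flat record into a star-schema fact row."""
    return {
        "item_key": resolve_dimension(item_dim, record["item"]),
        "date_key": resolve_dimension(date_dim, record["created_at"][:10]),
        "description": record["description"],
    }

# Two records sharing an item reuse the same item dimension key.
item_dim, date_dim = {}, {}
f1 = build_fact_row(
    {"item": "widget", "description": "a", "created_at": "2013-01-13T02:41:00"},
    item_dim, date_dim)
f2 = build_fact_row(
    {"item": "widget", "description": "b", "created_at": "2013-01-14T03:00:00"},
    item_dim, date_dim)
```

In a real handler, each `resolve_dimension` call becomes a `Dimension.objects.get_or_create(...)` and the returned fact dict becomes a `FactRecord.objects.create(...)`.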

Sectio Aurea