1

basically i want to get the latest 30 log entries for every Host. Currently i'm doing this in django-piston.

def read(self,request):
    val={}
    for x in Host.objects.all():
        val[x.uuid_id]=DataLog.objects.filter(host=x).order_by('-time')[:30]           
    return val

Unfortunately this request takes way to long (currently 1s for about 10k database entries). Are there more efficient ways to do this?

Harper

Harper04
  • 355
  • 2
  • 10
  • This is a common problem with no single query solution short of raw SQL. Look up "Top N GROUP BY" to find help with the SQL. When I needed the top N questions per category for my FAQ, I first wrote raw SQL, then after a code change decided to scrap the idea and just cache the result :) – Yuji 'Tomita' Tomita May 21 '12 at 20:29
  • 1
    How about keeping the last 30 rows for each host in a different table (e.g. LatestDataLog)? Using a signal when a datalog is created, save the datalog id in LatestDataLog & purge records for the host beyond 30 rows. – dannyroa May 21 '12 at 23:35
  • I think your title as badly choosen. I wuold have ask "How to get the first N entries by category" – Ghislain Leveque May 22 '12 at 08:53

1 Answers1

3

If you are using PostgreSQL as a database backend and don't need cross database compatibility, you can use the powerful window functions that allow something like that :

Imagine your table looks like that :

CREATE TABLE x (
    i serial primary key,
    value integer not null,
    date timestamp,
    category text);

And you want the most recent value for each category. You will do :

SELECT
    first_value(i) over w,
    first_value(value) over w,
    first_value(date) over w
    category,
FROM x
WINDOW w AS (PARTITION BY category ORDER BY date DESC);

You can use such a query in django via the raw method on the query set manager :

ModelX.objects.raw("""SELECT DISTINCT ....... FROM x WINDOW w .....""")

To get the last N entries by category, the query is a little more complicated and involves a subquery :

SELECT i, value, date, category
FROM (SELECT
        i, value, date, category,
        row_number() over w
    FROM x
    WINDOW w AS (PARTITION BY category ORDER BY date DESC)) AS subquery
WHERE subquery.row_number <= 30;

Seeing that, you could even make a view :

CREATE VIEW x_with_reverse_date_index AS
    (SELECT
        i, value, date, category,
        row_number() over w
    FROM x
    WINDOW w AS (PARTITION BY category ORDER BY date DESC));

And create a django model that queries this view :

class ModelX(models.Model):
    ...
    ...
    row_number = models.IntegerField("Row number when ordering by date desc")

    class Meta:
        db_table = 'x_with_reverse_date_index'

and query it "normally" :

ModelX.objects.filter(category__in = ('catA','catB'), row_number__lte = 30)
ModelX.objects.filter(row_number = 29)
...

Warning : again, don't do this if you need a code that works on another database engine.

Ghislain Leveque
  • 958
  • 1
  • 8
  • 21