
delayed_job does a query like this regularly:

SELECT  "delayed_jobs".*
FROM "delayed_jobs"
WHERE ((run_at <= '2012-05-23 15:16:43.180810' AND (locked_at IS NULL OR locked_at < '2012-05-23 11:16:43.180841') OR locked_by = 'host:foo pid:1') AND failed_at IS NULL)
ORDER BY priority ASC, run_at ASC LIMIT 5

My logs on my pretty big DB machine report that this query takes a quarter of a second to run. I could just throw indexes on all the columns in the WHERE clause, but I can probably get more performance out of a multi-column index.

What's the optimal multi-column index for this query? Are there any tools that can work this out for me?
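
For reference, here is the kind of thing I'm imagining (just a sketch; the column choice and the partial predicate are guesses on my part):

-- a guess at a compound (partial) index: failed_at IS NULL matches the
-- query's constant predicate, so the index can skip failed jobs entirely
CREATE INDEX delayed_jobs_candidate
    ON delayed_jobs (priority, run_at)
    WHERE failed_at IS NULL;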

Update

Postgres version: 9.1.3

One existing index: (priority, run_at), named "delayed_jobs_priority".

Output of `explain analyze`:

Limit  (cost=0.00..219.65 rows=5 width=1154) (actual time=0.727..0.727 rows=0 loops=1)
   ->  Index Scan using delayed_jobs_priority on delayed_jobs  (cost=0.00..351.43 rows=8 width=1154) (actual time=0.725..0.725 rows=0 loops=1)
         Filter: ((failed_at IS NULL) AND (((run_at <= '2012-05-23 18:11:03.980113'::timestamp without time zone) AND ((locked_at IS NULL) OR (locked_at < '2012-05-23 14:11:03.98014'::timestamp without time zone))) OR ((locked_by)::text = 'host:foo pid:1'::text)))
 Total runtime: 0.754 ms
(4 rows)
John Bachir
  • Are there any existing indexes? – swasheck May 23 '12 at 15:27
  • What is the current explain plan and what version of PG are you on? – Kuberchaun May 23 '12 at 15:32
  • @JustBob I added this info to my question now. – John Bachir May 23 '12 at 19:42
  • @swasheck only one index, on priority, run_at – John Bachir May 23 '12 at 19:43
  • I presume that priority and the datetime+flag clause fields are more or less orthogonal, so you'll lose either way (too many indexes, or a final sort + limit). The range of values for priority is probably rather small? – wildplasser May 23 '12 at 19:59
  • To the OP: could you add the output for `explain analyze same_query`? – wildplasser May 23 '12 at 20:09
  • @wildplasser Yes, I actually don't use the priority column; the only value is 0. I've updated my post with `explain analyze`. – John Bachir May 24 '12 at 20:34
  • Sorry, I fail to see the problem. The above plan reports 0.7 ms execution time, which is hard to improve, and the plan itself is hard to beat. Is this the test machine, and are the plan and runtime very different on the production machine? Regarding the index: if the priority is *almost always* 0 and does not change too much, the index on {priority, datetime} seems adequate (if a bit awkward). It depends a bit on the other processes manipulating the affected fields. NB: the row size seems pretty large (width=1154, though not for the index); for a heavy hitter this could have an impact, but for an N<10 query it is not important. – wildplasser May 24 '12 at 21:49
  • Well, as you can see in the original post, it sometimes takes a quarter of a second to run. – John Bachir May 25 '12 at 03:16
  • But only in text (in the log?). If the 0.25 s is an overall throughput time that corresponds to a sub-millisecond query, there might be some locking/concurrency issues taking place. Is there heavy insert/update traffic? – wildplasser May 25 '12 at 10:37
  • I guess I don't know; the slow query log doesn't differentiate. *shrug* – John Bachir May 25 '12 at 12:40
  • There certainly isn't heavy insert or update traffic on that table. – John Bachir May 25 '12 at 12:41

2 Answers


Since you have a LIMIT clause, it's possible that you want an ordering index instead of a filtering one, on (priority, run_at).
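
Something along these lines, say (a sketch only; the index name is arbitrary):

-- matches ORDER BY priority ASC, run_at ASC, so a LIMIT 5 scan can stop
-- after the first few qualifying rows instead of sorting the whole set
CREATE INDEX delayed_jobs_order ON delayed_jobs (priority, run_at);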

What is the percentage of records in your table which satisfy the WHERE condition?

Quassnoi

I don't think a multi-column index is of much use in this case. Use multiple single-column indexes.
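
For instance (a sketch; the index names are arbitrary):

-- one index per predicate column; Postgres can combine these at query
-- time with bitmap AND/OR scans when evaluating the WHERE clause
CREATE INDEX delayed_jobs_run_at    ON delayed_jobs (run_at);
CREATE INDEX delayed_jobs_locked_at ON delayed_jobs (locked_at);
CREATE INDEX delayed_jobs_locked_by ON delayed_jobs (locked_by);
CREATE INDEX delayed_jobs_failed_at ON delayed_jobs (failed_at);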

Eelke