6

I am running into this Postgres issue where the same query takes a long time to execute if I use a parameter vs hardcoding its value on the query string. The column name is 'media_type' and it's a VARCHAR(20). I am running these queries from PHP, using Symfony2 and Doctrine2 ORM and the table in question has about 1.000.000 records.

Do I have a problem on my query? Can it be a Postgres configuration problem?

1 - Hardcoded value for media_type

duration: 5.365 ms  parse pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
duration: 0.142 ms  bind pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
parameters: $1 = '1', $2 = '1', $3 = '100', $4 = '0'
duration: 8.667 ms  execute pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
parameters: $1 = '1', $2 = '1', $3 = '100', $4 = '0'

Execution plan:

duration: 8.640 ms  plan:
    Query Text: SELECT id,site_id FROM item where media_type = 'Collection' AND enabled = 'true' AND site_id = $1 AND user_id = $2 ORDER BY id DESC LIMIT $3 OFFSET $4
    Limit  (cost=8.38..8.38 rows=1 width=12) (actual time=8.516..8.595 rows=24 loops=1)
      Buffers: shared hit=10 read=15
        ->  Sort  (cost=8.38..8.38 rows=1 width=12) (actual time=8.505..8.530 rows=24 loops=1)
            Sort Key: id
            Sort Method: quicksort  Memory: 26kB
            Buffers: shared hit=10 read=15
            ->  Index Scan using item_media_type_index on item  (cost=0.00..8.37 rows=1 width=12) (actual time=7.955..8.397 rows=24 loops=1)
                    Index Cond: ((media_type)::text = 'Collection'::text)
                    Filter: (enabled AND (site_id = $1) AND (user_id = $2))
                    Buffers: shared hit=8 read=15

2 - Using a parameter for media_type (SLOWER)

duration: 5.557 ms  parse pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
duration: 1.322 ms  bind pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
parameters: $1 = 'Collection', $2 = '1', $3 = '1', $4 = '100', $5 = '0'
duration: 71564.998 ms  execute pdo_stmt_00000001: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
parameters: $1 = 'Collection', $2 = '1', $3 = '1', $4 = '100', $5 = '0'

Execution plan:

duration: 71564.922 ms  plan:
    Query Text: SELECT id,site_id FROM item where media_type = $1 AND enabled = 'true' AND site_id = $2 AND user_id = $3 ORDER BY id DESC LIMIT $4 OFFSET $5
    Limit  (cost=90663.16..181326.31 rows=17184 width=12) (actual time=3.667..71564.864 rows=24 loops=1)
      Buffers: shared hit=183786 read=96585
        ->  Index Scan Backward using item_pkey on item  (cost=0.00..906610.46 rows=171836 width=12) (actual time=3.655..71564.798 rows=24 loops=1)
               Filter: (enabled AND ((media_type)::text = $1) AND (site_id = $2) AND (user_id = $3))
               Buffers: shared hit=183786 read=96585

Thanks in advance.

luis
  • 1,849
  • 3
  • 16
  • 17

3 Answers3

9

This is a bit of a long-standing wart in PostgreSQL that's historically required some interesting planner tuning to work around. It's fixed in PostgreSQL 9.2 (in beta now) though thanks, as usual, to Tom Lane.

E.1.3.1.3. Optimizer

Improve the ability of the planner to choose parameterized plans (Tom Lane)

A prepared statement is now parsed, analyzed, and rewritten, but not necessarily planned. When the prepared plan is executed with parameters, the planner might replan it for every constant, or it might execute a generic plan if its cost is close to that of a constant-specific plan.

See the 9.2 beta release notes and a quick note I wrote about this on lwn.net. There's lots of info about handling prepared / parameterised statements running slower than normal ones on the mailing lists.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Thanks! Is there an expected release date for PostgreSQL 9.2? I looked for it without luck. Also, usually how stable are the PostgreSQL betas? – luis May 31 '12 at 14:41
  • 1
    @luis I run PostgreSQL betas where possible for my dev work and haven't ever hit an issue. However, I haven't used them in production. If you're still in the dev stage, using a beta is a no-brainer. For production - well, Pg in general is amazingly stable and while there _is_ the very occasional dangerous bug they're typically in odd corner cases thanks to the regression tests. I can't advise you to run a beta in production - it's beta for a reason, the team doesn't think it's ready - but I'd do it if I had to for a critical issue. If possible, work around and wait for the release. – Craig Ringer Jun 03 '12 at 02:00
  • 2
    @luis PostgreSQL 9.2 was released today, FYI. – Craig Ringer Sep 11 '12 at 21:49
2

I experienced a very similar issue once when I was binding to a SMALLINT field and passing in a value that Postgres was implicitly casting from INTEGER to SMALLINT. I fixed it by making the cast explicit. Since media_type is type VARCHAR(20), Postgres is doing an implicit cast from type TEXT. Try this:

where media_type = $1::VARCHAR(20)

Sam Choukri
  • 1,874
  • 11
  • 17
  • This looks like it's worth trying -- though it's also doing the cast in the more efficient index-lookup plan. – Edmund May 31 '12 at 00:16
1

On your static, faster query, the item_media_type_index is being used. On your bound, slower query, the item_media_type_index is not being used.

What is the selectivity of the "media_type" column in the "item" table? If you did a:

SELECT media_type, COUNT(*)
  FROM item
  GROUP BY media_type
  ORDER BY 2 desc

Are the media_types evenly balanced or are there relatively few 'Collection' media_types compared to the others? If there are relatively few 'Collection' items then I would hazard this guess: with the static query, the parser knows you are querying for 'Collection' and can determine that 'Collection' has a low count and an index is probably worth using. But in the bind variable case, the parser doesn't know which media_type you are using. Some other media_type value might account for a high proportion of the records in the table (say 20%). In that case, it would be faster to do a scan than to even bother with using the index. The parser needs to make a decision, and it happens to decide not to use the index (wrong for your case, but probably correct for the other media_type). This is just a guess based on how some other rdbms work.

In this case, the answer is to hard code if you know the selectivity attributes are very skewed, use dynamic sql to force late parsing, or force index usage if you think that is right.

Glenn
  • 8,932
  • 2
  • 41
  • 54