Postgres two subqueries poor performance [EAV model]

Question

I have a table called "meta" with 2.2 million records. This table contains 3 columns: productId, key and value. Key and value are the two columns containing a meta description of a product.

The following query takes 2.6 sec and returns 676 results (PostgreSQL 8.4.13, CentOS 6.4 64-bit). This query is used to retrieve all possible meta descriptions from a certain filter (size) where the user already filtered on two other filters (year and source).

I tried the array solution from this topic but it only made it worse: PostgreSQL IN operator with subquery poor performance

The two subqueries are pretty fast (75ms and 178ms), but combining them causes performance issues. Is there a way to rewrite the query?

This is the current query:

SELECT DISTINCT ON(value) key, value 
FROM   "meta" 
WHERE  key = 'size'
    AND "productId" IN (SELECT "productId" 
        FROM   "meta" 
        WHERE  "value" = 'ibm'
            AND "key" = 'source' )
    AND "productId" IN (SELECT "productId" 
        FROM  "meta"
        WHERE "value" >= '1920'
            AND "value" <= '2010' 
            AND "key" = 'year' ) 
ORDER  BY value

With the following EXPLAIN ANALYZE:

Unique  (cost=38829.46..38843.19 rows=564 width=15) (actual time=2674.474..2690.856 rows=676 loops=1)
  ->  Sort  (cost=38829.46..38836.32 rows=2745 width=15) (actual time=2674.471..2681.333 rows=66939 loops=1)
        Sort Key: public."meta".value
        Sort Method:  quicksort  Memory: 8302kB
        ->  Hash Join  (cost=32075.86..38672.69 rows=2745 width=15) (actual time=472.158..2472.002 rows=66939 loops=1)
              Hash Cond: (public."meta"."originalId" = public."meta"."productId")
              ->  Nested Loop  (cost=15079.41..21563.33 rows=13109 width=23) (actual time=113.873..1013.113 rows=104307 loops=1)
                    ->  HashAggregate  (cost=15079.41..15089.21 rows=980 width=4) (actual time=113.802..163.805 rows=105204 loops=1)
                          ->  Bitmap Heap Scan on "meta"  (cost=315.39..15051.42 rows=11196 width=4) (actual time=24.540..68.237 rows=105204 loops=1)
                                Recheck Cond: (((key)::text = 'source'::text) AND ((value)::text = 'KADASTER_WOII_RAF_USAAF'::text))
                                ->  Bitmap Index Scan on "productMetadataKeyValueIndex"  (cost=0.00..312.60 rows=11196 width=0) (actual time=23.506..23.506 rows=105204 loops=1)
                                      Index Cond: (((key)::text = 'source'::text) AND ((value)::text = 'ibm'::text))
                    ->  Index Scan using "idx_productId" on "meta"  (cost=0.00..6.59 rows=1 width=19) (actual time=0.006..0.008 rows=1 loops=105204)
                          Index Cond: (public."meta"."productId" = public."meta"."productId")
                          Filter: ((public."meta".key)::text = 'size'::text)
              ->  Hash  (cost=16954.58..16954.58 rows=3350 width=4) (actual time=358.214..358.214 rows=184571 loops=1)
                    ->  HashAggregate  (cost=16921.08..16954.58 rows=3350 width=4) (actual time=258.149..319.154 rows=184571 loops=1)
                          ->  Bitmap Heap Scan on "meta"  (cost=1172.62..16825.39 rows=38273 width=4) (actual time=86.725..167.110 rows=184571 loops=1)
                                Recheck Cond: (((key)::text = 'year'::text) AND ((value)::text >= '1920'::text) AND ((value)::text <= '2010'::text))
                                ->  Bitmap Index Scan on "productMetadataKeyIndex"  (cost=0.00..1163.05 rows=38273 width=0) (actual time=83.992..83.992 rows=184571 loops=1)
                                      Index Cond: (((key)::text = 'year'::text) AND ((value)::text >= '1920'::text) AND ((value)::text <= '2010'::text))
Total runtime: 2696.276 ms

Defined indexes:

idx_productId   CREATE INDEX "idx_productId" ON "meta" USING btree ("productId")    
productMetaUnique_id    CREATE UNIQUE INDEX "productMetaUnique_id" ON "meta" USING btree ("productId", key)     
productMetadataKeyIndex CREATE INDEX "productMetadataKeyIndex" ON "meta" USING btree (key)  
productMetadataKeyValueIndex    CREATE INDEX "productMetadataKeyValueIndex" ON "meta" USING btree (key, value)

A good example of the problems the dreaded EAV model creates... Did you try to combine the two sub-selects into a single one using a `UNION ALL`? — , May 13 '13 at 09:38
@horse Yeah, I tried to combine them with a UNION, but that creates an OR condition. Results need to be in BOTH subqueries, not just one. — Peter Schoep, May 13 '13 at 09:41

score 0 · Accepted Answer · answered May 13 '13 at 09:55

First off, PostgreSQL 8.1.4 is antique. Upgrade it, because every release since (8.2, 8.3, 8.4, 9.0, 9.1 and 9.2) has brought improvements to the query planner.

Next, you could probably rewrite your query to use joins and a group by, and possibly get a better plan.

select m1."value"
from meta as m1
join meta as m2 on m2."productId" = m1."productId"
               and m2."key" = 'source'
               and m2."value" = 'ibm'
join meta as m3 on m3."productId" = m1."productId"
               and m3."key" = 'year'
               and m3."value" between '1920' and '2010'
where m1."key" = 'size'
group by m1."value"

The latter could probably use indexes on (key, "productId") and ("productId", key, value); in PG 9.2 that allows an index-only scan, avoiding table lookups altogether.
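As a rough sketch, the two indexes could be created as follows. The index names are made up for illustration; the column names are taken from the question. Note that the existing unique index "productMetaUnique_id" already covers ("productId", key), so only measuring will tell whether the extra indexes pay off.

```sql
-- Hypothetical index names; columns as defined in the question's "meta" table.

-- (key, "productId"): locate all products that have a given key.
CREATE INDEX "metaKeyProductIdIndex"
    ON "meta" USING btree (key, "productId");

-- ("productId", key, value): contains every column the joins touch,
-- which is the precondition for an index-only scan (PostgreSQL 9.2+).
CREATE INDEX "metaProductIdKeyValueIndex"
    ON "meta" USING btree ("productId", key, value);
```

Index-only scans also rely on the visibility map being reasonably current, so running VACUUM on the table after large changes helps the planner choose them.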

Next, this is a case where you should have put the data in your products table directly. If you're querying against it, it probably doesn't belong in meta in the first place.
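A minimal sketch of what that could look like, assuming a products table keyed by "productId" already exists. The table name, the new column names, and the integer type for the year are all assumptions for illustration, not details from the question.

```sql
-- Assumed: a table products with a "productId" column. New column names are illustrative.
ALTER TABLE products ADD COLUMN source text;
ALTER TABLE products ADD COLUMN "year" integer;
ALTER TABLE products ADD COLUMN size   text;

-- Copy one key at a time out of meta. The unique index on ("productId", key)
-- guarantees at most one matching meta row per product.
UPDATE products AS p
SET    source = m."value"
FROM   meta AS m
WHERE  m."productId" = p."productId"
  AND  m."key" = 'source';

-- The cast raises an error if any stored year is not a valid integer.
UPDATE products AS p
SET    "year" = m."value"::integer
FROM   meta AS m
WHERE  m."productId" = p."productId"
  AND  m."key" = 'year';

UPDATE products AS p
SET    size = m."value"
FROM   meta AS m
WHERE  m."productId" = p."productId"
  AND  m."key" = 'size';

-- The filter then needs no self-joins at all.
SELECT DISTINCT size
FROM   products
WHERE  source = 'ibm'
  AND  "year" BETWEEN 1920 AND 2010
ORDER  BY size;
```

Storing the year as an integer also makes the range filter a numeric comparison rather than a text comparison.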

Lastly, if you really want to keep the stuff in meta, this might be a case where it pays to check a list of known values with EXISTS:

select val
from unnest(array['Known', 'Sizes', 'Go', 'Here']::text[]) as val
where exists (
    select 1
    from meta as m1
    join meta as m2 on m2."productId" = m1."productId"
                   and m2."key" = 'source'
                   and m2."value" = 'ibm'
    join meta as m3 on m3."productId" = m1."productId"
                   and m3."key" = 'year'
                   and m3."value" between '1920' and '2010'
    where m1."key" = 'size'
      and m1."value" = val
  );

Doing so will spare you expensive group by, sort and unique operations.
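Whether that really happens depends on the plan the planner picks, so it is worth checking with EXPLAIN ANALYZE, as done in the question. A sketch using the join and group by rewrite, with the year bounds written as text literals (an assumption based on the question's own query, which compares "value" against '1920' and '2010'):

```sql
-- EXPLAIN ANALYZE executes the query and reports actual timings per plan node.
EXPLAIN ANALYZE
select m1."value"
from meta as m1
join meta as m2 on m2."productId" = m1."productId"
               and m2."key" = 'source'
               and m2."value" = 'ibm'
join meta as m3 on m3."productId" = m1."productId"
               and m3."key" = 'year'
               and m3."value" between '1920' and '2010'
where m1."key" = 'size'
group by m1."value";
```

Compare the reported total runtime and the top plan nodes with the original plan. The same prefix works for the EXISTS variant. Note that group by alone does not guarantee sorted output, so add an order by if the order matters.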

Thank you for your answer and effort. I'm using PostgreSQL 8.4.13; 8.1.4 was a typo. I'll try the first query and update this post with the results. I'm using an EAV model because many products don't have (many) meta descriptions. Putting all descriptions in the products table directly will cause many empty columns and makes flexibility harder in my opinion. — Peter Schoep, May 13 '13 at 09:58
"Putting all descriptions in the products table directly will cause many empty columns and makes flexibility harder in my opinion." -- You do realize that they're stored in an external store when too large anyway, right? http://www.postgresql.org/docs/9.2/static/storage-toast.html — Denis de Bernardy, May 13 '13 at 12:47
The rewritten query with joins and group by takes 863 ms to complete, much better performance! I'm familiar with TOAST; the meta table is not using it at the moment. — Peter Schoep, May 14 '13 at 07:00
