PostgreSQL - query against GIN index of HSTORE value

Question

I have the following constructor (as a test):

CREATE TABLE product (id BIGSERIAL PRIMARY KEY, ext hstore);
CREATE INDEX ix_product_ext ON product USING GIN(ext);

INSERT
INTO    product (id, ext)
SELECT  id, ('size=>' || CEILING(10 + RANDOM() * 90) || ',mass=>' || CEILING(10 + RANDOM() * 90))::hstore
FROM    generate_series(1, 100000) id;

I have the following query, which works ok:

SELECT  COUNT(id)
FROM    (
    SELECT  id
    FROM    product
    WHERE  (ext->'size')::INT >= 41
    AND    (ext->'mass')::INT <= 20
) T

But I believe the correct way to do this is using the @> operator. I have the following, but it gives a syntax error:

SELECT  COUNT(id)
FROM    (
    SELECT  id
    FROM    product
    WHERE  ext @> 'size>=41,mass<=20'
) T

How should I write this?

score 6 · Answer 1 · answered Jun 24 '11 at 22:46

6

Your initial attempt is correct but you need to use (partial) btree indexes and bitmap index scans to rely on it:

create index on product(((ext->'size')::int)) where ((ext->'size') is not null);

The same for mass, and if the planner doesn't get it on the spot add two where clauses, ie where ext->'size' is not null and the same for mass.

If there is a pattern of some kind (which is likely, since most products with a size also have a mass), potentially create a multicolumn index combining the two - one sac, the other desc.

The gin index as you wrote it, along with the accompanying query (with a syntax error) will basically do the same thing but unordered; it'll be slower.

answered Jun 24 '11 at 22:46

Denis de Bernardy

75,850
13
131
154

Thanks Dennis. I had included partial indexes as you show them in my tests, and actually they were slower, both for insert and for query. In fact, querying was substantially faster against the GIN. Why do you say I need partial indexes for reliability? – IamIC Jun 24 '11 at 22:56
Actually, technically that is an expression index, not a partial index. – IamIC Jun 24 '11 at 23:04
2

Further testing showed that expression indexes are in fact faster on average. Complex queries win with GIN. This would really be a typical workload test scenario. – IamIC Jun 24 '11 at 23:57
1

Yeah. The one I suggested yesterday is actually expression and partial. If you're repeatedly doing the same kind of query over and over, especially if there is any need for ordering the results and tying them to a limit, my experience is you'll get better performance out of (pre-ordered and partial) expression indexes; GIN will win if you're constantly querying against widely varying conditions (e.g. tsvectors). – Denis de Bernardy Jun 25 '11 at 09:10

score 3 · Accepted Answer · answered Jun 24 '11 at 22:09

Reading hstore documentation your (last query) size>=41 does not mean "when size is greater or equal than 41":

text => text    make single-pair hstore

Following that you can't write mass<=20, because there is no such operation. Using @> operator:

hstore @> hstore    does left operand contain right?

you can write:

SELECT count(id)
FROM product
WHERE ext @> 'size=>41,mass=>20';

However it takes only these products where size is equal to 41 and mass is equal to 20.

PostgreSQL - query against GIN index of HSTORE value

2 Answers2

Linked