Or better said: When to use array as a field data type in a table?
Which solution provides better search results?
I avoid arrays for 2 reasons:
I've considered this problem as well, and the conclusion I came to is to use arrays when you want to eliminate table joins. The number of elements in each array isn't as important as the size of the tables involved. If there are only a few thousand rows in each table, then joining to get the 50 sub-rows shouldn't be a big problem. If you get into tens or hundreds of thousands of rows, though, you're likely to start chewing through a lot of processor time and disk I/O.
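The trade-off above can be sketched with two hypothetical schemas (all names here are illustrative, not from the original post):

```sql
-- Normalized design: ~50 sub-rows per parent, fetched with a join
CREATE TABLE orders (
    id serial PRIMARY KEY
);
CREATE TABLE order_items (
    order_id int REFERENCES orders(id),
    item_id  int
);
SELECT i.item_id
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.id = 1;

-- Array design: the elements travel with the parent row, no join needed
CREATE TABLE orders_arr (
    id    serial PRIMARY KEY,
    items int[]
);
SELECT items FROM orders_arr WHERE id = 1;
```

On small tables both perform fine; the join cost only becomes visible as the tables grow.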
Don't know how long this link stays live, so I'll paste the results below: http://sqlfiddle.com/#!17/55761/2
TL;DR: searching a table index and then joining is fast, BUT adding a GIN index (using gin__int_ops) to a single table with an array column can be faster. Additionally, the flexibility of being able to match "some" or a small number of your array values might be a better option, e.g. for a tagging system.
create table data (
id serial primary key,
tags int[],
data jsonb
);
create table tags (
id serial primary key,
data_id int references data(id)
);
CREATE EXTENSION IF NOT EXISTS intarray;  -- provides the gin__int_ops operator class
CREATE INDEX gin_tags ON data USING GIN(tags gin__int_ops);
SET enable_seqscan to off;
with rand as (SELECT generate_series(1,100000) AS id)
insert into data (tags) select '{5}' from rand;
update data set tags = '{1}' where id = 47300;
with rand as (SELECT generate_series(1,100000) AS id)
INSERT INTO tags(data_id) select id from rand;
Running:
select data.id, data.data, data.tags
from data, tags where tags.data_id = data.id and tags.id = 47300;
and
select data.id, data.data, data.tags
from data where data.tags && '{1}';
Yields:
Record Count: 1; Execution Time: 3ms
QUERY PLAN
Nested Loop (cost=0.58..16.63 rows=1 width=61)
-> Index Scan using tags_pkey on tags (cost=0.29..8.31 rows=1 width=4)
Index Cond: (id = 47300)
-> Index Scan using data_pkey on data (cost=0.29..8.31 rows=1 width=61)
Index Cond: (id = tags.data_id)
and
Record Count: 1; Execution Time: 1ms
QUERY PLAN
Bitmap Heap Scan on data (cost=15.88..718.31 rows=500 width=61)
Recheck Cond: (tags && '{1}'::integer[])
-> Bitmap Index Scan on gin_tags (cost=0.00..15.75 rows=500 width=0)
Index Cond: (tags && '{1}'::integer[])
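The "match some of your array values" flexibility mentioned above comes from the array operators the GIN index supports; a quick sketch against the same `data` table (the search values are illustrative):

```sql
-- Overlap: rows whose tags share at least one value with the search set
SELECT id FROM data WHERE tags && '{1,5}'::int[];

-- Containment: rows whose tags include ALL values in the search set
SELECT id FROM data WHERE tags @> '{1,5}'::int[];
```

Both operators can use the gin__int_ops index created in the setup above.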
The tables will always provide better search results assuming you're querying something within the actual array. With a subtable, you can index the contents trivially, whereas with an array you'd have to create an index for each potential element within the array (unless you use a GIN index, as in the answer above).
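A sketch of the "trivial" subtable index this answer refers to (table and column names are assumed, not from the question):

```sql
CREATE TABLE item_tags (
    item_id int NOT NULL,
    tag     text NOT NULL
);
-- One ordinary btree index covers equality and range searches on every tag
CREATE INDEX item_tags_tag_idx ON item_tags (tag);

SELECT item_id FROM item_tags WHERE tag = 'urgent';
```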
I think arrays should be reserved for custom data. For foreign keys, it's better to use a link table (or at least one column per key). That way you get data control at the DB level and simple join queries; you need a join anyway, even if you keep the keys in arrays (to get the full data set), and arrays are much more complicated than standard SQL.
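The "data control at the DB level" point can be illustrated with a link table whose foreign keys the database actually enforces (names here are hypothetical):

```sql
CREATE TABLE posts  (id serial PRIMARY KEY);
CREATE TABLE labels (id serial PRIMARY KEY, name text UNIQUE);
CREATE TABLE post_labels (
    post_id  int REFERENCES posts(id) ON DELETE CASCADE,
    label_id int REFERENCES labels(id),
    PRIMARY KEY (post_id, label_id)
);

-- Rejected by the database if label 999 doesn't exist:
-- INSERT INTO post_labels VALUES (1, 999);
-- With an int[] column, '{999}' would be accepted silently.
```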