I have a very large table with over two billion rows and ~90 columns. The table size is 1.8 TB, and the goal is to be able to query any single column or combination of columns as quickly as possible. The data will not be updated and no new rows will be inserted. Most of the data is strings, and it's possible that full-text search will be used.
So far I have created 3 multi-column B-tree indexes, foolishly thinking it would be as simple as that, but after observing the query performance and reading a bit more into it I learned that a multi-column B-tree index only helps when the WHERE clause constrains its leading column(s); if the query filters only on a column that is not first in the index, the index is of little use.
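To make that concrete, here is a rough sketch of the situation (person_id and person_name match the plan below; the third column name is made up, and the real indexes have many more columns):

-- multi-column B-tree; person_id is the leading column
CREATE INDEX data_personid_idx ON data (person_id, person_name, some_other_column);

-- can use the index efficiently, because it constrains the leading column:
SELECT * FROM data WHERE person_id = 12345;

-- can only walk the whole index and filter on person_name,
-- because the leading column person_id is not constrained:
SELECT * FROM data WHERE person_name = 'Joe Smith';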
Doing explain analyze select * from data where person_name = 'Joe Smith'
shows that it is using one of the 32-column indexes, yet the query still takes far longer than I would expect from an index scan. EXPLAIN ANALYZE output below:
Index Scan using data_personid_idx on data  (cost=0.83..321155265.68 rows=22580 width=2805) (actual time=981.260..3164759.559 rows=216 loops=1)
  Index Cond: ((person_name)::text = 'Joe Smith'::text)
Planning Time: 0.145 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.373 ms, Inlining 188.532 ms, Optimization 146.744 ms, Emission 234.549 ms, Total 571.198 ms
Execution Time: 3165416.950 ms
Would splitting the big indexes up into single-column indexes help with this? I am already in the process of creating a single index on the person_name column, but as you can imagine that takes a long time on a table this size, so I am wondering whether there is a better way to speed up query times. Would a GIN index perhaps also help with the full-text search scenarios?
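For reference, this is roughly what I am building and considering (the notes column in the GIN example is a hypothetical stand-in for one of the text columns):

-- single-column B-tree for exact-match lookups on person_name:
CREATE INDEX data_person_name_idx ON data (person_name);

-- a GIN index for full-text search on one of the string columns:
CREATE INDEX data_notes_fts_idx ON data USING gin (to_tsvector('english', notes));

-- which should be able to serve queries like:
SELECT * FROM data WHERE to_tsvector('english', notes) @@ to_tsquery('english', 'smith');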