0

I have a very large table with over two billion rows and ~90 columns. The table size is 1.8TB and the goal is being able to do a query on any single column or combination of columns in the shortest time possible. The data will not be updated and no new rows will be inserted. Most of the data is strings, and its possible that full-text search will be used.

So far I have created 3 multi-column b-tree indexes foolishly thinking that it is just as simple as that, but after observing the query performance and reading a bit more into it I learned that only the first column of an index matters, and if the WHERE clause does not use a column that is first - it's suboptimal.

Doing explain analyze select * from data where person_name = 'Joe Smith' returns that it is using one of the 32 column indexes, yet the query is still much longer than I would expect with an index. Explain analyze output below

Index Scan using data_personid_idx on data  (cost=0.83..321155265.68 rows=22580 width=2805) (actual time=981.260..3164759.559 rows=216 loops=1)
  Index Cond: ((person_name)::text = 'Joe Smith'::text)
Planning Time: 0.145 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.373 ms, Inlining 188.532 ms, Optimization 146.744 ms, Emission 234.549 ms, Total 571.198 ms
Execution Time: 3165416.950 ms

Would splitting the big indexes us help with this? I am in the process of creating a single index on the person_name column but as you can imagine this takes a long time with such a large table, and I was wondering if there would be a better way to speed up query times. Maybe even using a GIN index would help in full-text search scenarios?

  • It is always hard to say with databases which approach will work best but multi column indexes work only well if you are often searching on combinations of the first N columns in the index. If not you be better of with multiple single column indexes. Note however that having many indexes can decrease write performance to the table. Constructing a jsonb document from a combination of columns and putting a GIN index on it might work well to. You will have to test which works better in your case. – Eelke Aug 03 '20 at 04:23
  • Try to disable jit (`set jit = off;`) and then try again. –  Aug 03 '20 at 06:04
  • Thanks, in this case multiple single column indexes is the only way, as I cannot predict the queries that will be made frequently. Also jit=off helped - thank you. – Arthur B Aug 04 '20 at 01:20

0 Answers0