
I need to create more than 64 indexes on a table, but I am running into the "Too many keys specified; max 64 keys allowed" error. Is there some workaround that would allow me to increase this limit (ideally to 1000 or more) for MariaDB / TokuDB, or is there a reason why this limit is necessary?

(I've seen this question asked and answered for MySQL, with the answers being either to pass --with-max-indexes=256 to ./configure or to modify MAX_KEY in one of the header files at compile time. Unfortunately, these answers don't appear to work for MariaDB.)

P.S. Since a typical response is "if you need this many indexes, you're doing something wrong", I'll explain why I want to do this, and I would appreciate any advice on modifying the design if that's the best "workaround".

My data is stored in 2 tables:

table1 stores 5 columns: (unique key x_position int, column1 string, column2 float, column3 int, column4 tinyint) - it can be as large as 100 million rows

table2 conceptually can be represented as 4 columns: (foreign key x_position int, sample_id string, value1 tinyint, value2 float) - since there could be up to 5000 unique sample_id values, and a different value1 would exist for each (x_position, sample_id) pair, the max number of rows would be 100 million x 5000 = 500 billion rows
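Roughly, the current schema is something like this (the column types, engine clause, and the composite primary key on table2 are illustrative guesses at my own setup, not the exact DDL):

create table table1 (
    x_position  int  not null,
    column1  varchar(255),        -- "string" in the description above
    column2  float,
    column3  int,
    column4  tinyint,
    unique key (x_position)       -- up to ~100 million rows
) engine=TokuDB;

create table table2 (
    x_position  int  not null,    -- references table1.x_position
    sample_id  varchar(255),      -- up to ~5000 distinct values
    value1  tinyint,
    value2  float,
    primary key (x_position, sample_id)   -- up to ~500 billion rows
) engine=TokuDB;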

The queries I need to do are like:

select  column1, column2, column3... sample_id,
        group_concat(value1)
    from  table1
    join  table2  using (x_position)
    where  column1 = string1
      and  column2 < float2
      and  ( (sample_id = string1  and  value1=1)
         or  (sample_id = string2  and  value1=0)
         or  (sample_id = string3  and  value1=1)
           )
      and  value2 < float1
    group by  sample_id;

Instead, I was thinking it would be more efficient to pivot table2 so that its columns are: (foreign key x_position, sample_id1_value1 tinyint, sample_id1_value2 float, sample_id2_value1 tinyint, sample_id2_value2 float, ...)

and then create composite indexes on small subsets of the (sample_id1_value1, sample_id1_value2, ...) columns, based on domain-specific details of which of these columns will be queried together. This table would have 100 million rows x 10,000 columns (split across several tables to avoid the column limit), which seems better than 500 billion rows. It would also eliminate the need for the "or" and "group by" clauses, allowing queries to be rewritten like this (a rough sketch of the pivoted table follows the query):

select  column1, column2, column3... sample_id,
        sample_id1_value1,
        sample_id1_value2
    from  table1
    join  table2  using (x_position)
    where  column1 = string1
      and  column2 < float2
      and  sample_id1_value1=1
      and  sample_id2_value1=0
      and  sample_id3_value1=1
      and  sample_id1_value2 < float1
      and  sample_id2_value2 < float1
      and  sample_id3_value2 < float1; 
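For concreteness, the pivoted table might look something like this (I'm calling it table2_pivoted only to distinguish it from the current table2; the types and the particular index groupings are just illustrative, since the real groupings would come from the domain-specific details mentioned above):

create table table2_pivoted (
    x_position  int  not null,        -- references table1.x_position
    sample_id1_value1  tinyint,
    sample_id1_value2  float,
    sample_id2_value1  tinyint,
    sample_id2_value2  float,
    sample_id3_value1  tinyint,
    sample_id3_value2  float,
    -- ... up to ~10,000 value columns in total, split across several tables ...
    primary key (x_position),
    -- one composite index per small group of columns that get queried together:
    index  (sample_id1_value1, sample_id2_value1, sample_id3_value1, sample_id1_value2),
    index  (sample_id1_value2, sample_id2_value2, sample_id3_value2)
    -- ... many more groups like these, which is what pushes past the 64-key limit
) engine=TokuDB;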

Unfortunately the "Too many keys" error is getting in the way of this.

  • Hate to be 'that guy' on the net who gives you something else as a solution instead of giving a way to make the way you are currently doing things -- work, but if you are going to be dealing with that much data, then you are better off using Big Data services such as Google DB. If you need a database structure that allows for billions of fields -- believe it or not, SQLite3 offers that capability, which is also more ideal for your temporary table situation as you can leave the files to 'cache' future lookups. -- PS ... You work at the NSA or something ^^ – Kraang Prime Jun 14 '15 at 23:06
  • Interesting.. I'm also considering SQLite3, but the big downside there is that it doesn't have compression (except as a non-free plugin). Also, s/NSA/whole genome sequencing data/ – user3911479 Jun 14 '15 at 23:10
  • Haven't really looked that much into it, although I did notice the data is larger than it needs to be; however, I would take speed over compression any day, especially with billions of records and fields. It does slow down obviously, but I was quite taken aback when I discovered how much data it can store (int64) for field structure. – Kraang Prime Jun 14 '15 at 23:13
  • I've tried loading a test dataset into SQLite3 where table1 is 30 million rows and table2 (pivoted) is 30 million x ~300 columns. The database size came out to ~350 GB, and gzipping the db file shrank it to ~175 GB, so compression would be a big win. Along the same lines, when comparing disk use for a similar but smaller test dataset, I got 21 GB for the SQLite3 db, 13 GB for MySQL/InnoDB compressed tables, and 6 GB for MySQL/TokuDB zlib - which is why I'm leaning towards MySQL/TokuDB. – user3911479 Jun 15 '15 at 02:03
  • Please provide `SHOW CREATE TABLE` for each table. Please indicate which table each column is in when doing a `JOIN`. Please explain why you need so many indexes. – Rick James Jun 15 '15 at 18:54

0 Answers