0

I have huge amount of polynomials of degree 6 (like x^6 + 2*x^5 + x^4 + x^3 + x^2 + 1) stored in text files together with some additional info. Total amount is more than 400 000 000. All of them have integers coefficients.

I would like to efficiently store them and then perform fast search for some polynomial in my store. I don't need multiply clients functionality. I would like to store data and operate with it on same machine.

It seems to me like classical DB tasks. So now I consider some DB as engine for this.

  1. What DB is the most efficient choice in my case? Is sqlite efficient enough?
  2. What if the most efficient way for storing polynomials? Table with columns a0, a1, a2 ... a6, add_info or some serializing like string serializing "5,3,5,6,1,2,3" or may be some DB has array data-type? I'm going to make not only exact match search but something like this get all polynomials with a6 = 3 or get all uniq a5 for polynomials with a6 = 3.
petRUShka
  • 9,812
  • 12
  • 61
  • 95

1 Answers1

3

You probably want to use a more powerful database than SQLite for 400 million rows. There are free versions of MySQL, Postgres, SQL Server, and Oracle (for instance) that might work better. Note that Stack Overflow is not a site for recommending particular products. I just bring this up in response to your particular question on SQLite. And, SQLite might be fine for this purpose.

Your basic data structure will be a polynomial id along with 7 columns of integers, one for each coefficient.

Based on your query examples, you will want a separate index on each coefficient. For instance:

create index idx_polynomials on polynomials(a6);

You wouldn't want to serialize the values. That would remove much of the utility of storing this data in a relational database and prevent the use of indexes.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • what do you recommend instead of SQLite? Mysql or Postgres for this type of storing and search tasks? – petRUShka Jul 30 '14 at 12:34
  • What specifically makes SQLite inappropriate for this? SQLite's main differences are being embedded and no concurrency, neither of which should affect this. – CL. Jul 30 '14 at 13:31
  • 1
    @CL. . . . To the best of my knowledge, SQLite doesn't support parallelism in queries, data partitioning, and perhaps other advanced capabilities. When starting with a table that has 400 million rows, I would want to keep options open for enhancing performance -- even if it cost additional money to support. However, with an appropriate physical machine architecture, SQLite might very well do the job well enough. – Gordon Linoff Jul 30 '14 at 16:06