5

Situation: I have about 40 million rows, 3 columns of unorganised data in a table in my SQLite DB (~300MB). An example of my data is as follows:

|  filehash  |  filename  |  filesize  |
|------------|------------|------------|
|   hash111  |    fileA   |    100     |
|   hash222  |    fileB   |    250     |
|   hash333  |    fileC   |    380     |
|   hash111  |    fileD   |    250     |  #Hash collision with fileA
|   hash444  |    fileE   |    520     |
|     ...    |     ...    |    ...     |

Problem: A single SELECT statement could take between 3 to 5 seconds. The application I am running needs to be fast. A single query taking 3 to 5 seconds is too long.

#calculates hash
md5hash = hasher(filename)
#I need all 3 columns so that I do not need to parse through the DB a second time
cursor.execute('SELECT * FROM hashtable WHERE filehash = ?', (md5hash,))
returned = cursor.fetchall()

Question: How can I make the SELECT statement run faster (I know this sounds crazy but I am hoping for speeds of below 0.5s)?

Additional information 1: I am running it on Python 2.7 program on a RPi 3B (1GB RAM, default 100MB SWAP). I am asking mainly because I am afraid that it will crash the RPi because 'not enough RAM'.

For reference, when reading from the DB normally with my app running, we are looking at max 55MB of RAM free, with a few hundred MB of cached data - I am unsure if this is the SQLite caches (SWAP has not been touched).

Additional information 2: I am open to using other databases to store the table (I was looking at either PyTables or ZODB as a replacement - let's just say that I got a little desperate).

Additional information 3: There are NO unique keys as the SELECT statement will look for a match in the column which are just hash values, which apparently have collisions.

Timothy Wong
  • 689
  • 3
  • 9
  • 28

1 Answers1

8

Currently, the database has to scan the entire table to find all matches. To speed up searches, use an index:

CREATE INDEX my_little_hash_index ON hashtable(filehash);
CL.
  • 173,858
  • 17
  • 217
  • 259