2

I have a set of SQLite tables that add up to 7 GB on disk. The particular table I am querying (FQ) has approximately 50 columns and 3 million rows.

I am querying from the sqlite3 interactive shell (sqlite3.exe). The query I am running is: "select count(Date) from FQ;". Counting the approximately 3 million rows takes 10+ minutes. After the first run, the result seems to be cached and comes back almost instantly. I am running on a Windows 10 PC with 8 GB RAM and nothing else running.
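
For anyone reproducing this, here is roughly how I'm timing it in the shell (`.timer on` is a built-in sqlite3 dot-command; the database filename is illustrative):

```sql
-- sqlite3.exe mydata.db    (filename is illustrative)
.timer on
select count(Date) from FQ;
-- The shell then prints a line like: Run Time: real ... user ... sys ...
```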

Date is one of the two columns in the composite primary key (Date, ID). There are 360 unique dates and ~8-10k IDs, and the table has one row for each date/ID combination.
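
For context, the schema is shaped roughly like this (the Price column is an illustrative stand-in for the real ~50 columns):

```sql
CREATE TABLE FQ (
    Date  TEXT    NOT NULL,   -- 360 distinct values
    ID    INTEGER NOT NULL,   -- ~8-10k distinct values
    Price REAL,               -- illustrative; ~48 more columns in reality
    PRIMARY KEY (Date, ID)
);
```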

Here are some things I have already done:

  1. I have a covering index on the entire table.
  2. I have run ANALYZE on this database.
  3. When I run "EXPLAIN QUERY PLAN", it says it is doing a table scan (as expected for counting) using the covering index (see the sketch below).
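
For reference, a sketch of items 1-3 (the index name idx_fq_all is made up, and its column list is abbreviated):

```sql
-- 1. Covering index over the entire table (column list abbreviated):
CREATE INDEX idx_fq_all ON FQ (Date, ID, Price /*, ...remaining columns... */);

-- 2. Refresh the query planner's statistics:
ANALYZE;

-- 3. Inspect the plan; it reports something along the lines of
--    "SCAN TABLE FQ USING COVERING INDEX idx_fq_all":
EXPLAIN QUERY PLAN select count(Date) from FQ;
```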

How can a simple scan through a table of 3 million rows take so long?

[EDIT: I should clarify that I'm not interested in alternate ways of counting - I am hoping that scans don't have to be this slow (it is also slow, for example, when using sum() with GROUP BY; see the sketch below).]
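
(For concreteness, the GROUP BY variant was along these lines; Volume is an illustrative column name:)

```sql
-- Equally slow aggregate over the same table:
select Date, sum(Volume) from FQ group by Date;
```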

[UPDATE: Today I tried two additional things - first I tried rebuilding the table as "WITHOUT ROWID", and the results were similar either way. Then I removed the indices for all my tables altogether. Now the count of a few million rows finishes in 4 seconds. The database file is naturally smaller (2 GB vs 7 GB) now that all the indices are gone, but that alone shouldn't explain the difference between 10 minutes and 4 seconds! What makes a covering index slow down a table scan? Is scanning an index inherently slower, and if so, why doesn't SQLite just scan the original table itself? A sketch of both experiments follows.]
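
A sketch of both experiments, reusing the illustrative names from above (WITHOUT ROWID can only be specified at creation time, so that test meant rebuilding the table):

```sql
-- Experiment 1: clustered (WITHOUT ROWID) copy of the table:
CREATE TABLE FQ2 (
    Date  TEXT    NOT NULL,
    ID    INTEGER NOT NULL,
    Price REAL,               -- illustrative; plus the remaining columns
    PRIMARY KEY (Date, ID)
) WITHOUT ROWID;
INSERT INTO FQ2 SELECT * FROM FQ;

-- Experiment 2: drop the covering index from the original table:
DROP INDEX idx_fq_all;
```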

jwcoder
  • How long does `select count(*) from fq` take? – Gordon Linoff Jun 06 '20 at 12:07
  • Not having anything cached the first time is probably part of it, yes. – Shawn Jun 06 '20 at 12:11
  • @GordonLinoff, using * makes it take as long or even longer - it's hard to tell precisely because the caching is hard to control. – jwcoder Jun 06 '20 at 12:55
  • @Shawn, fair enough that no cache will slow it down, but I can copy the entire database file (7 GB) to a USB hard disk in less time than it is taking to count the 3 million rows of one table. Even without any caching, I wouldn't think it is expected to take that long (or is it?) – jwcoder Jun 06 '20 at 12:56
  • I added an edit above that explains I'm running this through the sqlite3 shell. I noticed it uses "UAC Virtualization" in the task manager... not sure if that is relevant, but noting it here in case it explains things. – jwcoder Jun 06 '20 at 13:13
  • 2
    @jwcoder . . . The `*` in `count(*)` actually makes the query *faster* not slower. The database engine knows that it can use any index -- or the data pages -- for counting the rows. – Gordon Linoff Jun 06 '20 at 16:11
  • @GordonLinoff - Thanks, that's good to know about the benefit of using '*' although I guess in my case it is hard to see any difference since it takes 10-20 minutes either way :( – jwcoder Jun 06 '20 at 16:18
  • See my latest update above - it looks like removing the covering index speeds things up dramatically, I'm just not sure why? – jwcoder Jun 07 '20 at 22:14

1 Answer

3

I finally figured out the problem: running a VACUUM command on the database solved the issue. I had run .dbinfo to confirm that the page_size multiplied by the number of pages added up to approximately the file size. That, combined with the fact that I had never deleted anything from the database (only inserted), led me to assume that I didn't need to vacuum (or defragment).

But it looks like the reorganization that VACUUM performs also makes a dramatic difference in the speed of these count queries, which now finish in milliseconds, as I've seen reported elsewhere.
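
For anyone checking their own database, these are the commands involved (`.dbinfo` is a sqlite3 shell dot-command; the PRAGMAs below are the standard equivalents):

```sql
PRAGMA page_size;       -- bytes per page
PRAGMA page_count;      -- total pages in the file
PRAGMA freelist_count;  -- unused pages; 0 in my case, since I only ever inserted

-- Rebuild and repack the entire database file:
VACUUM;
```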

jwcoder
  • Great to know. Thank you for sharing the solution and marking your question answered. I wouldn't have guessed that `vacuum` would be the solution. Much appreciated. – zedfoxus Jun 09 '20 at 17:11
  • 1
    @zedfoxus - No problem, I've benefited by so many answers here that it's the least I can do! :) – jwcoder Jun 10 '20 at 19:21