I have two large (~100 million rows) tables I'm trying to join. I have indices on both columns used in the join. Selecting the first 1000 rows takes several hours, but when it's done, SQLite Studio reports that it only took a minute. Then, it takes another several hours for SQLite Studio to count the rows for my results and if I try to open another query window, it becomes unresponsive for these hours. The entire time, task manager shows around 25% CPU usage and 7-8 MB/s disk usage for the process. I also tried selecting the top 10k rows and it took 11 hours to complete and another 11 hours to get the row count, but reported that the query finished in 4 minutes. Here is the query:
Select d.PRC, s.prccd, abs(abs(d.PRC)-s.prccd), *
from dsf d
join secd.secd s
on s.datadate=d.DATE and substr(s.cusip,1,8)=d.CUSIP
where abs(abs(d.PRC)-s.prccd)>.0006
limit 10000
Why is this taking so long? I know 100 million rows is a lot, but with sorted indices, shouldn't joining happen in linear time? Adding the indices took several minutes, not hours, and that should be O(n log n) since it has to sort. I get the same results without using substr(). So why is it taking so long?
Why is SQLite Studio reporting that it only takes a minute or two?
Why does SQLite Studio take so long to count the result rows, after the results are already displayed?
EDIT: Output of EXPLAIN QUERY PLAN
5 0 0 SCAN TABLE dsf AS d
7 0 0 SEARCH TABLE secd AS s USING INDEX secd_datadate (datadate=?)