2

I'm writing a small Objective-C library that works with an embedded SQLite database.

The SQLite version I'm using is 3.7.13 (checked with SELECT sqlite_version())

My query is:

SELECT ROUND(AVG(difference), 5) as distance 
FROM (
  SELECT (
    SELECT A.timestamp - B.timestamp 
    FROM ExampleTable as B 
    WHERE B.timestamp = (
      SELECT MAX(timestamp) 
      FROM ExampleTable as C 
      WHERE C.timestamp < A.timestamp
    )
  ) as difference 
  FROM ExampleTable as A 
  ORDER BY timestamp)

Basically it outputs the average timestamp difference between rows ordered by timestamp.

I tried the query on a sample database with 35k rows and it runs in around 100ms. So far so good.

I then tried the query on another sample database with 100k rows and it hangs at sqlite3_step() taking up 100% of CPU usage.

Since I cannot step into sqlite3_step() with the debugger, is there another way I can get a grasp of where is the function hanging or a debug log of what is the issue here?

I also tried running other queries from my library on the 100k rows database and there is no issue, but it's also true that these are simple queries with no subquery. Maybe this is the issue?

Thanks

UPDATE

This is the output of EXPLAIN QUERY PLAN as requested:

"1","0","0","SCAN TABLE ExampleTable AS A"
"1","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 2"
"2","0","0","SCAN TABLE ExampleTable AS B"
"2","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 3"
"3","0","0","SEARCH TABLE ExampleTable AS C"
"1","0","0","USE TEMP B-TREE FOR ORDER BY"
"0","0","0","SCAN SUBQUERY 1"
Vik
  • 1,897
  • 12
  • 18
  • This query needs to compute all results before it can output the first row. 100000² is *much* larger than 35000², and is likely to be too large for the cache. Show the output of [EXPLAIN QUERY PLAN](http://www.sqlite.org/eqp.html). – CL. Jan 04 '15 at 22:06
  • I added the output as an edit – Vik Jan 05 '15 at 17:30

2 Answers2

1

Looking up rows by their timestamp value can be optimized with an index on this column:

CREATE INDEX whatever ON ExampleTable(timestamp);

And this query is inefficient: ORDER BY does not affect values that are averaged, and the timestamp values in B and C are always identical, so you can drop one of them:

SELECT ROUND(AVG(difference), 5) AS distance 
FROM (
  SELECT timestamp -
         (SELECT MAX(timestamp)
          FROM ExampleTable AS B
          WHERE timestamp < A.timestamp)
         AS difference
  FROM ExampleTable AS A)
CL.
  • 173,858
  • 17
  • 217
  • 259
  • unfortunately this hangs as well. – Vik Jan 06 '15 at 15:05
  • My last resort was inspired by another StackOverflow answer, I'm creating a temporary table with the ordered timestamps, so that I can rely on the rowid (since SQLite doesn't have ROW_NUMBER() support and make a less expensive inner query (instead of MAX(timestamp) ... I can just take the timestamp of the row with a specific rowid). Thanks for your support anyway! – Vik Jan 06 '15 at 15:06
1

I eventually went with this solution:

CREATE TABLE tmp AS SELECT timestamp FROM ExampleTable ORDER BY timestamp

SELECT ROUND(AVG(difference), 5) 
FROM (
  SELECT (
    SELECT A.timestamp - B.timestamp 
    FROM tmp as B 
    WHERE B.rowid = A.rowid-1
  ) as difference 
  FROM tmp as A 
  ORDER BY timestamp)

DROP TABLE ExampleTable

Actually I went further and I am only using this strategy for high number of rows (> 40k), since the other strategy (single query) works better for "small" tables.

Vik
  • 1,897
  • 12
  • 18