I have a table in SQLite dumped from a CSV logging file with 20-50 columns, up to 500k rows. I would like to be able to query various columns in a few ways: instantaneous value, moving average value, and slope. I was contemplating adding columns to the table for the moving averages and slopes, but I thought I would investigate the query method first. I found some moving average help and slope help from SO which got me started.
To test out what I'm trying to do, I created the following SQL script:
DROP TABLE IF EXISTS TEST;
CREATE TABLE TEST (ID INTEGER UNIQUE PRIMARY KEY ASC, c1, c2);
INSERT INTO TEST (c1, c2) VALUES (1,5), (7,10), (11, 15), (5, 20), (8, 25), (10, 30), (12, 35), (15, 40), (18, 45), (20, 50);
--Create VIEW for Moving average (3 rows) of column c1
DROP VIEW IF EXISTS AVG_VIEW;
CREATE TEMP VIEW IF NOT EXISTS AVG_VIEW AS
SELECT * FROM TEST AS t,
(SELECT t1.ID, AVG(t2.c1) AS mavg
FROM TEST AS t1, TEST AS t2
WHERE t2.ID BETWEEN (t1.ID-2) AND t1.ID
GROUP BY t1.ID) sq
WHERE (t.ID = sq.ID);
--Create view for slope of 2 consecutive moving averages (just use dif for now)
DROP VIEW IF EXISTS SLOPE_VIEW;
CREATE TEMP VIEW IF NOT EXISTS SLOPE_VIEW AS
SELECT a1.*, (a1.mavg - a2.mavg) AS slope
FROM AVG_VIEW AS a1, AVG_VIEW AS a2
WHERE a2.ID = (a1.ID-1)
GROUP BY a1.ID;
SELECT * FROM SLOPE_VIEW;
This gives the expected result:
ID c1 c2 ID:1 mavg slope
2 7 10 2 4 3
3 11 15 3 6.3333 2.333
4 5 20 4 7.6667 1.333
5 8 25 5 8 0.333
6 10 30 6 7.6667 -0.333
7 12 35 7 10 2.333
8 15 40 8 12.333 2.333
9 18 45 9 15 2.667
10 20 50 10 17.667 2.667
I ran "EXPLAIN QUERY PLAN" on the SELECT statement, but I couldn't figure out how to interpret it:
selectid order from detail
1 0 0 SCAN TABLE TEST AS t1 USING COVERING INDEX sqlite_autoindex_TEST_1
1 1 1 SEARCH TABLE TEST AS t2 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
2 0 0 SCAN TABLE TEST AS t1 USING COVERING INDEX sqlite_autoindex_TEST_1
2 1 1 SEARCH TABLE TEST AS t2 USING INTEGER PRIMARY KEY (rowid>? AND rowid<?)
0 0 1 SCAN SUBQUERY 1 AS sq
0 1 0 SEARCH TABLE TEST AS t USING INTEGER PRIMARY KEY (rowid=?)
0 2 2 SEARCH TABLE TEST AS t USING INTEGER PRIMARY KEY (rowid=?)
0 0 0 EXECUTE LIST SUBQUERY 3
0 3 3 SEARCH SUBQUERY 2 AS sq USING AUTOMATIC COVERING INDEX (ID=?)
0 0 0 USE TEMP B-TREE FOR GROUP BY
My question is: is this a really bad plan/query? Can you explain the query-plan? Should I be storing one or both of the average/slope in the table? Is there a better way to accomplish this?