I am working with GTFS data on Android (SQLite), and I would like to improve the performance of the select queries I run against my database of GTFS data.

The query below selects the stop times associated with a route at a stop.

The first subquery gets the regular stop times for Thursday. The second gets the exception stop times that are not valid today (2013-07-25). The third gets the exception stop times that are valid only today (2013-07-25). I then remove the invalid ones from the result of the first subquery and add the valid ones:

select distinct stop_times_arrival_time
from stop_times, trips, calendar
where stop_times_trip_id=trip_id
and calendar_service_id=trip_service_id
and trip_route_id='11821949021891616'
and stop_times_stop_id='3377699721872252'
and calendar_start_date<='20130725'
and calendar_end_date>='20130725'
and calendar_thursday=1
and stop_times_arrival_time>='07:40'

except

select stop_times_arrival_time
from stop_times, trips, calendar, calendar_dates
where stop_times_trip_id=trip_id
and calendar_service_id=trip_service_id
and calendar_dates_service_id = trip_service_id
and trip_route_id='11821949021891694'
and stop_times_stop_id='3377699720880977'
and calendar_thursday=1
and calendar_dates_exception_type=2
and stop_times_arrival_time > '07:40'
and calendar_dates_date = 20130725

union

select stop_times_arrival_time
from stop_times, trips, calendar, calendar_dates
where stop_times_trip_id=trip_id
and calendar_service_id=trip_service_id
and calendar_dates_service_id = trip_service_id
and trip_route_id='11821949021891694'
and stop_times_stop_id='3377699720880977'
and calendar_thursday=1
and calendar_dates_exception_type=1
and stop_times_arrival_time > '07:40'
and calendar_dates_date = 20130725;

It takes about 15 seconds to compute, which is far too long. I am sure there is a better way to write this query, since I am running three separate (and almost identical) subqueries, each of which takes time.

Any idea how to improve it?

EDIT: Here is the schema:

table|calendar|calendar|2|CREATE TABLE calendar (
    calendar_service_id TEXT PRIMARY KEY,
    calendar_monday INTEGER,
    calendar_tuesday INTEGER,
    calendar_wednesday INTEGER,
    calendar_thursday INTEGER,
    calendar_friday INTEGER,
    calendar_saturday INTEGER,
    calendar_sunday INTEGER,
    calendar_start_date TEXT,
    calendar_end_date TEXT
)
index|sqlite_autoindex_calendar_1|calendar|3|
table|calendar_dates|calendar_dates|4|CREATE TABLE calendar_dates (
        calendar_dates_service_id TEXT,
        calendar_dates_date TEXT,
        calendar_dates_exception_type INTEGER
)
table|routes|routes|8|CREATE TABLE routes (
        route_id TEXT PRIMARY KEY,
        route_short_name TEXT,
        route_long_name TEXT,
        route_type INTEGER,
        route_color TEXT
)
index|sqlite_autoindex_routes_1|routes|9|
table|stop_times|stop_times|12|CREATE TABLE stop_times (
        stop_times_trip_id TEXT,
        stop_times_stop_id TEXT,
        stop_times_stop_sequence INTEGER,
        stop_times_arrival_time TEXT,
        stop_times_pickup_type INTEGER
)
table|stops|stops|13|CREATE TABLE stops (
        stop_id TEXT PRIMARY KEY,
        stop_name TEXT,
        stop_lat REAL,
        stop_lon REAL
)
index|sqlite_autoindex_stops_1|stops|14|
table|trips|trips|15|CREATE TABLE trips (
        trip_id TEXT PRIMARY KEY,
        trip_service_id TEXT,
        trip_route_id TEXT,
        trip_headsign TEXT,
        trip_direction_id INTEGER,
        trip_shape_id TEXT
)
index|sqlite_autoindex_trips_1|trips|16|

And here is the query plan:

2|0|0|SCAN TABLE stop_times (~33333 rows)
2|1|1|SEARCH TABLE trips USING INDEX sqlite_autoindex_trips_1 (trip_id=?) (~1 rows)
2|2|2|SEARCH TABLE calendar USING INDEX sqlite_autoindex_calendar_1 (calendar_service_id=?) (~1 rows)
3|0|3|SCAN TABLE calendar_dates (~10000 rows)
3|1|2|SEARCH TABLE calendar USING INDEX sqlite_autoindex_calendar_1 (calendar_service_id=?) (~1 rows)
3|2|0|SEARCH TABLE stop_times USING AUTOMATIC COVERING INDEX (stop_times_stop_id=?) (~7 rows)
3|3|1|SEARCH TABLE trips USING INDEX sqlite_autoindex_trips_1 (trip_id=?) (~1 rows)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (EXCEPT)
4|0|3|SCAN TABLE calendar_dates (~10000 rows)
4|1|2|SEARCH TABLE calendar USING INDEX sqlite_autoindex_calendar_1 (calendar_service_id=?) (~1 rows)
4|2|0|SEARCH TABLE stop_times USING AUTOMATIC COVERING INDEX (stop_times_stop_id=?) (~7 rows)
4|3|1|SEARCH TABLE trips USING INDEX sqlite_autoindex_trips_1 (trip_id=?) (~1 rows)
0|0|0|COMPOUND SUBQUERIES 1 AND 4 USING TEMP B-TREE (UNION)
vital
  • Please show the database schema and the output of [EXPLAIN QUERY PLAN](http://www.sqlite.org/eqp.html) for this query. – CL. Jul 25 '13 at 07:36
  • Sure, I just edited my post, I hope it will help. – vital Jul 25 '13 at 07:58

1 Answer

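Columns that are used for lookups should be indexed. However, within a single (sub)query SQLite uses at most one index per table, so columns that are filtered together on the same table belong in one multi-column index rather than in several single-column ones.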

For this particular query, the following additional indexes would help:

CREATE INDEX some_index ON stop_times(
    stop_times_stop_id,
    stop_times_arrival_time);
CREATE INDEX some_other_index ON calendar_dates(
    calendar_dates_service_id,
    calendar_dates_exception_type,
    calendar_dates_date);
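
One way to check that these indexes are actually picked up is to compare the EXPLAIN QUERY PLAN output before and after creating them. The following is only a minimal sketch using Python's sqlite3 module on an empty in-memory database with two tables copied from the question's schema. The helper name `query_plan` and the simplified single-table test queries are assumptions made for illustration, not the original three-part query, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

# Empty in-memory database with two tables from the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stop_times (
    stop_times_trip_id TEXT,
    stop_times_stop_id TEXT,
    stop_times_stop_sequence INTEGER,
    stop_times_arrival_time TEXT,
    stop_times_pickup_type INTEGER
);
CREATE TABLE calendar_dates (
    calendar_dates_service_id TEXT,
    calendar_dates_date TEXT,
    calendar_dates_exception_type INTEGER
);
""")

# Simplified, hypothetical lookups that use the same filter columns
# as the subqueries in the question.
STOP_TIMES_QUERY = """
SELECT stop_times_arrival_time FROM stop_times
WHERE stop_times_stop_id = '3377699721872252'
  AND stop_times_arrival_time >= '07:40'
"""
CALENDAR_DATES_QUERY = """
SELECT calendar_dates_service_id FROM calendar_dates
WHERE calendar_dates_service_id = 'some_service'
  AND calendar_dates_exception_type = 2
  AND calendar_dates_date = '20130725'
"""

def query_plan(connection, sql):
    """Hypothetical helper: return the detail text of each plan row."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

# Plans without the additional indexes (full table scans).
before_stop_times = query_plan(conn, STOP_TIMES_QUERY)
before_calendar_dates = query_plan(conn, CALENDAR_DATES_QUERY)

# The two indexes suggested above.
conn.executescript("""
CREATE INDEX some_index ON stop_times(
    stop_times_stop_id,
    stop_times_arrival_time);
CREATE INDEX some_other_index ON calendar_dates(
    calendar_dates_service_id,
    calendar_dates_exception_type,
    calendar_dates_date);
""")

# Plans with the indexes in place.
after_stop_times = query_plan(conn, STOP_TIMES_QUERY)
after_calendar_dates = query_plan(conn, CALENDAR_DATES_QUERY)

for label, plan in [
    ("stop_times before", before_stop_times),
    ("stop_times after", after_stop_times),
    ("calendar_dates before", before_calendar_dates),
    ("calendar_dates after", after_calendar_dates),
]:
    print(label, plan)
```

If the "after" plans mention `some_index` and `some_other_index` in a SEARCH step instead of a SCAN, the lookups are using the indexes. The same check can be run on the device by prefixing the real query with EXPLAIN QUERY PLAN.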
CL.