3

I am using the below table:

1   0051ML66220600132482    06:00:00        06:00:00        1538    100 0   1
2   0051ML66220600132482    06:00:00        06:00:00        1540    200 0   0
3   0051ML66220600132482    06:00:00        06:00:00        1541    300 0   0
4   0051ML66220600132482    06:01:00        06:01:00        1542    400 0   0
5   0051ML66220600132482    06:01:00        06:01:00        1543    500 0   0
6   0051ML66220600132482    06:02:00        06:02:00        1544    600 0   0
7   0051ML66220600132482    06:03:00        06:03:00        1546    700 0   0

Table structure us like below:

> ------------------------------------------------------------------
> --  TABLE stop_times
> ------------------------------------------------------------------
> 
> CREATE TABLE stop_times ( id int(12),
>                           trip_id varchar(100),
>                           arrival_time varchar(8),
>                           arrival_time_seconds int(100),
>                           departure_time varchar(8),
>                           departure_time_seconds int(100),
>                           stop_id varchar(100),
>                           stop_sequence varchar(100),
>                           pickup_type varchar(2),
>                           drop_off_type varchar(2) );

I am attempting to get DISTINCT trip_id is it matches both the destination and arrival stop_id

I have tried the below SQL with no luck:

select DISTINCT trip_id from stop_times where stop_id=1538 AND stop_id =1540;

Where is should produce: 0051ML66220600132482

I have also tried a INNER JOIN SQL like below:

SELECT 
       t.trip_id,
       start_s.stop_name as departure_stop,
       end_s.stop_name as arrival_stop
FROM
trips t 
        INNER JOIN stop_times start_st ON t.trip_id = start_st.trip_id
        INNER JOIN stops start_s ON start_st.stop_id = start_s.stop_id
        INNER JOIN stop_times end_st ON t.trip_id = end_st.trip_id
        INNER JOIN stops end_s ON end_st.stop_id = end_s.stop_id
WHERE 
   start_s.stop_id = 1538 
  AND end_s.stop_id = 1540;

But it is sooo slow, and takes about 8-15 seconds for this simple query.

Explain Added:

enter image description here

What is the fastest/Best way to make this query?

Yonkee
  • 1,781
  • 3
  • 31
  • 56

2 Answers2

3

So in other words, you are looking for a query that will identify all the trips that pass through a pair of stops, the origin (starting point) and destination (ending point).

Try this query:

SELECT destination.trip_id
    FROM stop_times AS origin
    INNER JOIN stop_times AS destination
        ON destination.trip_id = origin.trip_id
        AND destination.stop_id = 1540
    WHERE origin.stop_id = 1538
        AND origin.stop_sequence < destination.stop_sequence;

Or, for a prettier view (and to match the second query in your question):

SELECT destination.trip_id, origin_stop.name, destination_stop.name
    FROM stop_times AS origin
    INNER JOIN stop_times AS destination
        ON destination.trip_id = origin.trip_id
        AND destination.stop_id = 1540
    INNER JOIN stops AS origin_stop
        ON origin_stop.id = origin.stop_id
    INNER JOIN stops AS destination_stop
        ON destination_stop.id = destination.stop_id
    WHERE origin.stop_id = 1538
        AND origin.stop_sequence < destination.stop_sequence;

For good performance, create an index first on stop_id and trip_id:

CREATE INDEX stop_times_stop_id_trip_id_index ON stop_times(stop_id, trip_id);

(Note that EternalHour's query identifies all the trips that pass through either stop, not only trips that pass through one first and then the other.)

  • You right, Ethernals query doesn't work correctly. I need a trip that passes through both stops. Your works great, but it is slow. Even with the index – Yonkee May 02 '15 at 00:57
1

Seems like this is the query you need. I took out DISTINCT and replaced it with GROUP BY and also replaced the WHERE with IN. Your query indicates that stop_id should be of type INT, not varchar since you had not added quotes to it, the fiddle reflects that.

The IN basically does an OR instead of an AND. Nothing will be returned with AND since both stop_id's don't exist on the same row.

SELECT trip_id 
FROM stop_times 
WHERE stop_id IN(1538,1540)
GROUP BY trip_id

Here is a SQLFiddle

EternalHour
  • 8,308
  • 6
  • 38
  • 57