0

I have the below table and I would like to find out which customers made a trip where his/her start_location is the end_location of another customer who made the trip <= 5 minutes before him/her.

For instance, this is what I have:

DT                    Customer_Name       Start_location   End_location    Trip_fare
2019-11-01 08:17:42   Jane                  A                 B            $10
2019-11-01 08:18:02   Mary                  C                 A            $7
2019-11-01 08:18:04   Tom                   B                 D            $12
2019-11-01 08:20:11   Harry                 E                 C            $20
2019-11-01 08:21:22   Alex                  D                 A            $5
2019-11-01 08:24:30   Sally                 C                 B            $8

This is what I want:

DT                    Customer_Name    Start_location   End_location   
2019-11-01 08:17:42   Jane              A                 B
2019-11-01 08:18:04   Tom               B                 D  (cause Tom's start_location = B = Jane's end_location and the time difference between the 2 trips is within 5 minutes)
2019-11-01 08:21:22   Alex              D                 A
2019-11-01 08:20:11   Harry             E                 C 
2019-11-01 08:24:30   Sally             C                 B

Here, Mary has been removed from the list as her start_location = 'C', which is not the end_location of Jane who made a trip <= 5 minutes before her.

My apologies for this 'messy' looking question. Do let me know if you need further clarifications!

Thank you so much for your help!

wolφi
  • 8,091
  • 2
  • 35
  • 64
hAPPyqirL
  • 101
  • 1
  • 6

2 Answers2

1

As your query relates to customers from the same table, you'll need a self join. That is, you join the table with itself.

SELECT ... FROM mytable JOIN mytable ...

To distinguish one "instance" of the table from the other instance, you'll need alias names:

SELECT ... FROM mytable t1 JOIN mytable t2 ...

And you need join conditions, that is how your two customers are related. In your example this is quite straightforward:

SELECT tcust.name  AS name, 
       tother.name AS other_name
  FROM mytable tcust 
  JOIN mytable tother
    ON tcust.start_loc = tother.end_loc
   AND tcust.dt       >= tother.dt - INTERVAL '5' MINUTE;

However, this query gets a slightly different result. Can you find out why?

CREATE TABLE mytable (
  dt DATE, name VARCHAR2(30 CHAR), start_loc VARCHAR2(5 CHAR), 
  end_loc VARCHAR2(5 CHAR), fare NUMBER);

INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:17:42', 'Jane',  'A', 'B', 10);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:18:02', 'Mary',  'C', 'A', 7);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:18:04', 'Tom',   'B', 'D', 12);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:20:11', 'Harry', 'E', 'C', 20);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:21:22', 'Alex',  'D', 'A',  5);
INSERT INTO mytable VALUES (TIMESTAMP '2019-11-01 08:24:30', 'Sally', 'C', 'B',  8);

Result:

NAME   OTHER_NAME
Tom    Jane 
Jane   Mary 
Alex   Tom  
Mary   Harry
Sally  Harry
Jane   Alex

The substraction of 5 Minutes is explained in this question.

wolφi
  • 8,091
  • 2
  • 35
  • 64
  • I think we can get closer to the desired result by a `CONNECT BY` clause, starting with customers who do not have a predecessor and connecting by target and time. But I am too tired to figure out the fine details right now :/ – Erich Kitzmueller May 26 '20 at 06:43
1

I have the below table and I would like to find out which customers made a trip where his/her start_location is the end_location of another customer who made the trip <= 5 minutes before him/her.

Your description of the problem suggests not exists:

select t.*
from t
where not exists (select 1
                  from t t2
                  where t2.end_loc = t.start_loc and
                        t2.dt < t.dt and
                        t2.dt >= t.dt - interval '5' minute
                 );

However, this removes Tom, Alex, and Sally. From how you describe the question, I think this is correct.

Here is a db<>fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786