0

I am writing a SQL with Oracle Client 12 driver. I have two tables simplified as appended, and I want to get a table with following logic. The "B.TIME_B <= A0.TIME_A" seems created massive joining and made the query very slow. Please help to find best solution.

WITH A0 AS (
  SELECT *
  FROM A
  WHERE A.EVENT = 'a0'
)

SELECT * FROM (
SELECT
  ROW_NUMBER() OVER (PARTITION BY A0.TIME_A0 ORDER BY B.TIME_B DESC) RN, 
  A0.*,
  B.*
FROM 
   A0,B
WHERE
   B.TIME_B <= A0.TIME_A) B0

WHERE B0.RN <= 3   
  1. Find TIME_A, where EVENT_A = 'a0', as TIME_A0,

  2. Find TIME_B = TIME_A0, as EVENT_B0,

  3. And then get the row and previous 2 rows of table B, where EVENT_B0 found. N in this example is 3, and M is 2, but in real case both number are over 3000, so efficiency will be appreciated.

TableA

TIME_A  EVENT_A
1       a1
2       a1
3       a1
4       a0
5       a2
6       a2
7       a3
8       a0

Table B

TIME_B  EVENT_B
1       b1
2       b2
3       b3
4       b4
5       b5
6       b5
7       b6
8       b7

JOIN A_B

TIME_A  EVENT_A TIME_B  EVENT_B
4       a0      2       b2
4       a0      3       b3
4       a0      4       b4
8       a0      6       b5
8       a0      7       b6
8       a0      8       b7
ju.
  • 1,016
  • 1
  • 13
  • 34
  • 1
    Hi ju. a sample output would be very helpful. – Francisco Sitja Jun 21 '19 at 19:06
  • 1
    So your expected output in this case is the records from Table B with event_b in b2,b3,b4 and b5,b6,b7? – Error_2646 Jun 21 '19 at 19:12
  • @FranciscoSitja added example of output. – ju. Jun 21 '19 at 19:14
  • @Error_2646 yes, correct. – ju. Jun 21 '19 at 19:16
  • Hi Ju. you mentioned "M is 2" above. Is M the number of distinct TIME_A events selected (in this case 4 and 8) by using event='a0'? If it is, then we are talking about something that looks like a cross-join or cartesian product by its requirements, it would explain why performance is degraded. – Francisco Sitja Jun 21 '19 at 20:08
  • Side question: by chance is there any other column in the real tables we could use to "join" or match tables A to B? Idea is we would try to filter out as many rows as possible from Table B that would not match table A anyway, not solely on the TIME_A and TIME_B columns. – Francisco Sitja Jun 21 '19 at 20:17
  • @FranciscoSitja Yes, that is the struggle I am having now, the only thing that I can think of to limit the whole Table B search backward match is Row counts limit, but I could not find any way to implement that. The problem is the time range in Table B is not necessarily match the row counts. Although I could use a large size of time window, B.TIME_B <= A0.TIME_A AND B.TIME_B > A0.TIME_A - 10 for example, it might still have rare chance that might miss some matches. – ju. Jun 21 '19 at 20:41
  • Hi @FranciscoSitja, to answer the first question, M is the row counts. – ju. Jun 21 '19 at 20:43

2 Answers2

1

Query 1:

If you are not going to have overlapping ranges then you can use:

SELECT *
FROM   (
  SELECT TIME_B,
         EVENT_B,
         MAX( TIME_A  ) OVER ( ORDER BY TIME_B ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING )
           AS TIME_A,
         MAX( EVENT_A ) OVER ( ORDER BY TIME_B ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING )
           AS EVENT_A
  FROM   tableB B
         LEFT OUTER JOIN tableA A
         ON ( B.TIME_B = A.TIME_A AND A.EVENT_A = 'a0' )
)
WHERE  TIME_A IS NOT NULL;

Which only uses a single join and then finds the valid rows with analytic functions.

Output:

TIME_B | EVENT_B | TIME_A | EVENT_A
-----: | :------ | -----: | :------
     2 | b2      |      4 | a0     
     3 | b3      |      4 | a0     
     4 | b4      |      4 | a0     
     6 | b5      |      8 | a0     
     7 | b6      |      8 | a0     
     8 | b7      |      8 | a0     

db<>fiddle here


Query 2:

If you could have overlapping ranges then you could use a hierarchical query to generate the rows:

SELECT TIME_B,
       EVENT_B,
       CONNECT_BY_ROOT( TIME_A ) AS TIME_A,
       CONNECT_BY_ROOT( EVENT_A ) AS EVENT_A
FROM   (
  SELECT A.*,
         B.*,
         ROW_NUMBER() OVER ( ORDER BY TIME_B ) AS rn
  FROM   tableB B
         LEFT OUTER JOIN tableA A
         ON ( B.TIME_B = A.TIME_A AND A.EVENT_A = 'a0' )
)
WHERE LEVEL <= 2
START WITH EVENT_A IS NOT NULL
CONNECT BY PRIOR rn -2 <= rn AND rn < PRIOR rn
ORDER BY time_a, time_b

Output:

TIME_B | EVENT_B | TIME_A | EVENT_A
-----: | :------ | -----: | :------
     2 | b2      |      4 | a0     
     3 | b3      |      4 | a0     
     4 | b4      |      4 | a0     
     6 | b5      |      8 | a0     
     7 | b6      |      8 | a0     
     8 | b7      |      8 | a0     
     8 | b7      |     10 | a0     
     9 | b8      |     10 | a0     
    10 | b9      |     10 | a0     

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Your answer is very insightful, but could you elaborate the logic of each methods? Actually, the example that I gave is over-simplified, and I am trying to figure out how to use Max() since my Event_B is partitioned by another column ID. The previous row search need to be within same IDs. – ju. Jun 24 '19 at 17:55
  • @ju. If you want to partition by another column then just add that clause `MAX( TIME_A ) OVER ( PARTITION BY other_column ORDER BY TIME_B ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING )`. Do that for each `MAX` or `ROW_NUMBER` analytic function. – MT0 Jun 24 '19 at 20:32
0

This can be achieved using a simple join. No need to use any functions.

Try the following code, if TIME_A and TIME_B are continuous:

WITH tableA  ( TIME_A, EVENT_A ) AS
  (SELECT  1, 'a1' FROM DUAL UNION ALL
  SELECT  2, 'a1' FROM DUAL UNION ALL
  SELECT  3, 'a1' FROM DUAL UNION ALL
  SELECT  4, 'a0' FROM DUAL UNION ALL
  SELECT  5, 'a2' FROM DUAL UNION ALL
  SELECT  6, 'a2' FROM DUAL UNION ALL
  SELECT  7, 'a3' FROM DUAL UNION ALL
  SELECT  8, 'a0' FROM DUAL),
 tableB ( TIME_B, EVENT_B ) AS
  (SELECT  1, 'b1' FROM DUAL UNION ALL
  SELECT  2, 'b2' FROM DUAL UNION ALL
  SELECT  3, 'b3' FROM DUAL UNION ALL
  SELECT  4, 'b4' FROM DUAL UNION ALL
  SELECT  5, 'b5' FROM DUAL UNION ALL
  SELECT  6, 'b5' FROM DUAL UNION ALL
  SELECT  7, 'b6' FROM DUAL UNION ALL
  SELECT  8, 'b7' FROM DUAL)
 SELECT
    TIME_A,
    EVENT_A,
    TIME_B,
    EVENT_B
FROM
    TABLEA A
    JOIN TABLEB B ON ( EVENT_A = 'a0'
                       AND TIME_B BETWEEN TIME_A - 2 AND TIME_A )
ORDER BY
    TIME_A,
    TIME_B

Try the following code, if TIME_A and TIME_B are not continuous:

WITH tableA  ( TIME_A, EVENT_A ) AS
  (SELECT  1, 'a1' FROM DUAL UNION ALL
  SELECT  2, 'a1' FROM DUAL UNION ALL
  SELECT  3, 'a1' FROM DUAL UNION ALL
  SELECT  4, 'a0' FROM DUAL UNION ALL
  SELECT  5, 'a2' FROM DUAL UNION ALL
  SELECT  6, 'a2' FROM DUAL UNION ALL
  SELECT  7, 'a3' FROM DUAL UNION ALL
  SELECT  8, 'a0' FROM DUAL),
 tableB ( TIME_B, EVENT_B ) AS
  (SELECT  1, 'b1' FROM DUAL UNION ALL
  SELECT  2, 'b2' FROM DUAL UNION ALL
  SELECT  3, 'b3' FROM DUAL UNION ALL
  SELECT  4, 'b4' FROM DUAL UNION ALL
  SELECT  5, 'b5' FROM DUAL UNION ALL
  SELECT  6, 'b5' FROM DUAL UNION ALL
  SELECT  7, 'b6' FROM DUAL UNION ALL
  SELECT  8, 'b7' FROM DUAL)

 SELECT
    TIME_A,
    EVENT_A,
    TIME_B,
    EVENT_B FROM
 (SELECT
    TIME_A,
    EVENT_A,
    TIME_B,
    EVENT_B,
    ROW_NUMBER() OVER (PARTITION BY TIME_A ORDER BY TIME_B DESC NULLS LAST) AS RN
FROM
    TABLEA A
    JOIN TABLEB B ON ( EVENT_A = 'a0'
                       AND TIME_B <= TIME_A ))
WHERE RN <= 3
ORDER BY
    TIME_A,
    TIME_B

DB Fiddle demo

Cheers!!

Popeye
  • 35,427
  • 4
  • 10
  • 31