0

I am having trouble getting this to work. I am querying the S table to get a list of items and want to join the P table to pull in data for the same date as S + the preceding date, whenever that might have been using the LEAD function.

The issue is the P table is huge and takes forever to query unless a specific date, or small range is given. Since I am joining S & P by a date, the LEAD doesn't seem to work on pulling the prior date's data.

Is there another option? The preceding P.TDATE is often just the prior business date, but can sometimes be over a year ago.

SELECT S.ID,
       S.EDATE,
       S.PDATE,
       S.FACTOR,
       S.PTYPE,
       P.TDATE,
       P.PRICE,
       P.PTYPE,
       LEAD(P.TDATE) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC) AS PRIOR_DATE,
       LEAD(P.PRICE) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC) AS PRIOR_PRICE,
       LEAD(P.PTYPE) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC) AS PRIOR_PTYPE
FROM S
LEFT JOIN P ON P.ID = S.ID
AND P.TDATE = S.EDATE
WHERE S.PERIOD = 'D'
  AND S.PTYPE IN ('A',
                  'G',
                  'Q',
                  'Y')
ORDER BY S.EDATE ;

I'm getting all the right data, except the prior_date, prior_price and prior_ptype are all blank when they shouldn't be.

There are no NULL values in the date fields.

Is there another efficient way of querying the P table based on the S.EDATE rather than having P.TDATE = S.EDATE in the left join? I believe this condition is the cause for nothing showing up on the "prior" columns due to the LEAD function having no other data to look at.

Thanks,

acvbasql
  • 109
  • 1
  • 2
  • 15
  • Is this a Typo: `ORDER BY S.EDATE,` that comma ? – Jorge Campos Oct 13 '15 at 16:54
  • Yes it was. I removed it. – acvbasql Oct 13 '15 at 16:56
  • since you are ordering by date in desc order NULL values will be first. if you include the NULL LAST rule in your function I think you will get your desired result.. `LEAD(P.TDATE) OVER (PARTITION BY P.ID ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_DATE,...` – knuckles Oct 13 '15 at 17:14
  • Btw, `LEAD... ORDER BY ... DESC` is the same as `LAG... ORDER BY ... ASC` – dnoeth Oct 13 '15 at 17:32
  • There are no null date values in any tables. So, ordering by P.TDATE DESC should give the previous date row. – acvbasql Oct 13 '15 at 17:37

4 Answers4

0

can you adjust the following query to reproduce the case

WITH a
     AS (SELECT 1 id, TO_DATE ('01-01-2011', 'mm-dd-yyyy') dat, 0 val
           FROM DUAL
         UNION ALL
         SELECT 1 id, TO_DATE ('01-02-2011', 'mm-dd-yyyy') dat, 1 val
           FROM DUAL
         UNION ALL
         SELECT 2 id, TO_DATE ('01-03-2011', 'mm-dd-yyyy') dat, 2 val
           FROM DUAL)
SELECT id, dat, val, LEAD (val) OVER (PARTITION BY id ORDER BY dat DESC) pre_val
  FROM a
clq
  • 180
  • 9
0

http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions004.htm#SQLRF06174

ASC | DESC Specify the ordering sequence (ascending or descending). ASC is the default.

NULLS FIRST | NULLS LAST Specify whether returned rows containing nulls should appear first or last in the ordering sequence.

NULLS LAST is the default for ascending order, and NULLS FIRST is the default for descending order.

Analytic functions always operate on rows in the order specified in the order_by_clause of the function. However, the order_by_clause of the function does not guarantee the order of the result. Use the order_by_clause of the query to guarantee the final result ordering.

    SELECT S.ID,
           S.EDATE,
           S.PDATE,
           S.FACTOR,
           S.PTYPE,
           P.TDATE,
           P.PRICE,
           P.PTYPE,
           LEAD(P.TDATE) OVER (PARTITION BY P.ID
                               ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_DATE,
           LEAD(P.PRICE) OVER (PARTITION BY P.ID
                               ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_PRICE,
           LEAD(P.PTYPE) OVER (PARTITION BY P.ID
                               ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_PTYPE
    FROM S
    LEFT JOIN P ON P.ID = S.ID
    AND P.TDATE = S.EDATE
    WHERE S.PERIOD = 'D'
      AND S.PTYPE IN ('A',
                      'G',
                      'Q',
                      'Y')
    ORDER BY S.EDATE ;
knuckles
  • 101
  • 2
  • There are no null date values in any tables. I think I'm not getting any results for the LEAD (Prior) columns is due to the join on an exact date, so the lead function has no prior date to look at. I was hoping for an efficient way of finding the prior row since the P table takes a long time to query on large date ranges. – acvbasql Oct 13 '15 at 17:39
0

If you get NULLs due to the LEFT JOIN you might try to add the IGNORE NULLS option to LEAD:

       LEAD(P.TDATE IGNORE NULLS) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_DATE,

which is the same as

       LAG(P.TDATE IGNORE NULLS) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE) AS PRIOR_DATE,

or move the calculation to a Derived Table/Inline View:

SELECT S.ID,
       S.EDATE,
       S.PDATE,
       S.FACTOR,
       S.PTYPE,
       P.TDATE,
       P.PRICE,
       P.PTYPE,
       P.PRIOR_DATE,
       P.PRIOR_PRICE,
       P.PRIOR_PTYPE
FROM S
LEFT JOIN
 ( SELECT ...,
       LEAD(P.TDATE) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_DATE,
       LEAD(P.PRICE) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_PRICE,
       LEAD(P.PTYPE) OVER (PARTITION BY P.ID
                           ORDER BY P.TDATE DESC NULLS LAST) AS PRIOR_PTYPE

   FROM P 
 ) P ON P.ID = S.ID
AND P.TDATE = S.EDATE
WHERE S.PERIOD = 'D'
  AND S.PTYPE IN ('A',
                  'G',
                  'Q',
                  'Y')
ORDER BY S.EDATE ;
dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • I believe this would eventually work, but it is taking 30+ minutes to run. The P table takes quite a while to pull in all the data since it's pulling everything for each ID and then limiting it to the S.EDATE – acvbasql Oct 13 '15 at 18:29
0

Think of this in narrative form: you have a row in S. You want to find two rows in P:

  1. The P row which matches S ID & EDATE directly.
  2. The row which immediately precedes the first P row you found.

You'll actually have to find that second P row based on the first P row - it can't just magically "be there" for you to walk backward to with the correct incarnation of LEAD().

Looking at it that way, you are trying to find the #2P row with the maximum TDATE that is less than the TDATE of the #1P row. dnoeth's answer is one approach to this - but as you saw, it's very slow (he's actually building out the full history of all P rows matching any ID in S - if not for every ID in P - then using that constructed history to "pluck" the #2P row after finding the #1P row).

This should be a better approach, especially if P is indexed on both ID and TDATE:

SELECT T.*  , P3.PRICE PRIOR_PRICE, P3.PTYPE PRIOR_PTYPE
FROM 
    (
    SELECT S.ID,
           S.EDATE,
           S.PDATE,
           S.FACTOR,
           S.PTYPE S_PTYPE,
           P.TDATE,
           P.PRICE,
           P.PTYPE P_PTYPE,
           (SELECT MAX(P2.TDATE)  FROM P P2 WHERE P2.ID = P.ID AND P2.TDATE < P.TDATE) PRIOR_TDATE
    FROM S
    LEFT JOIN P ON P.ID = S.ID
    AND P.TDATE = S.EDATE
    WHERE S.PERIOD = 'D'
      AND S.PTYPE IN ('A','G','Q','Y')
    ) T
LEFT JOIN P P3 ON P3.ID = T.ID AND P3.TDATE = T.PRIOR_TDATE
ORDER BY T.EDATE ;
KevinKirkpatrick
  • 1,436
  • 1
  • 10
  • 15