2

I'm trying to write some Oracle 11g SQL, but I'm running into a bit of a chicken-and-egg problem. I'm looking for spreadsheet-like behavior. I've found a solution that does use Oracle's MODEL clause, but the performance isn't great. So I'm wondering if a "non-MODEL" solution is even technically feasible.

Here's a toy example that demonstrates what I'm trying to do. Given this table:

CREATE TABLE t (id NUMBER PRIMARY KEY, n NUMBER);
INSERT INTO t (id, n) VALUES (2, 0);
INSERT INTO t (id, n) VALUES (3, 1);
INSERT INTO t (id, n) VALUES (5, 1);
INSERT INTO t (id, n) VALUES (7, 2);
INSERT INTO t (id, n) VALUES (11, 3);
INSERT INTO t (id, n) VALUES (13, 5);
INSERT INTO t (id, n) VALUES (17, 8);
INSERT INTO t (id, n) VALUES (19, 13);

I want to compute two additional, derived columns, call them X and Y.

Here are the rules for how X and Y are to be computed:

X: For the very first row, as defined by the minimum value of ID, set X to N. For all subsequent rows, the value of X should be one less than the value of the previous Y, as sorted by ID.

Y: Twice N plus X.

These next few steps show how I'd fill out my desired view if I were to do this by hand. First, the first few rows of the given data:

 ID    N    X    Y
---  ---  ---  ---
  2    0
  3    1
  5    1
  7    2
....

Since we're in the first row, X should be set to N, or 0. Y should be 2 * N + X, or 0.

 ID    N    X    Y
---  ---  ---  ---
  2    0    0    0
  3    1
  5    1
  7    2
....

Now, since we're not in the first row any longer, X should always be one less than the previous row's Y from here on out. Here in the second row, that means X = (previous Y) - 1 = 0 - 1 = -1. And the second row's Y will be 2 * N + X, or 2 * (1) + (-1) = 1.

 ID    N    X    Y
---  ---  ---  ---
  2    0    0    0
  3    1   -1    1
  5    1
  7    2
....

If you continue with the math, here's the desired outcome:

 ID    N    X    Y
---  ---  ---  ---
  2    0    0    0
  3    1   -1    1
  5    1    0    2
  7    2    1    5
 11    3    4   10
 13    5    9   19
 17    8   18   34
 19   13   33   59

Given the rules for how X and Y are computed, is it possible to get this outcome without having to resort to the MODEL clause?

I'm not looking for a mathematical simplification based on this particular example; this is just a toy example I came up with that demonstrates the kind of interdependence I'm facing in my actual problem.

P.S.: Here's a MODEL example I was able to cobble together that does generate this output; maybe there are modifications possible to improve performance?

SQL> WITH u AS (
  2      SELECT ROW_NUMBER() OVER (ORDER BY t.id) r
  3      ,      t.id
  4      ,      t.n
  5      FROM   t
  6  )
  7  SELECT r
  8  ,      id
  9  ,      n
 10  ,      x
 11  ,      y
 12  FROM   u
 13  MODEL
 14      DIMENSION BY (r)
 15      MEASURES (id
 16      ,         n
 17      ,         CAST(NULL AS NUMBER) x
 18      ,         CAST(NULL AS NUMBER) y) RULES AUTOMATIC ORDER
 19    ( x[1] = n[cv()]
 20    , y[r] = 2 * n[cv()] + x[cv()]
 21    , x[r > 1] ORDER BY r = y[cv() - 1] - 1
 22      )
 23  ;

         R         ID          N          X          Y
---------- ---------- ---------- ---------- ----------
         1          2          0          0          0
         2          3          1         -1          1
         3          5          1          0          2
         4          7          2          1          5
         5         11          3          4         10
         6         13          5          9         19
         7         17          8         18         34
         8         19         13         33         59

8 rows selected.

SQL>

Thanks.

Tebbe
  • 1,372
  • 9
  • 12

1 Answers1

2

You could use recursive subquery factoring (also known as a recursive CTE):

with tmp as (
  select t.*,
    row_number() over (order by t.id) as rn
  from t
),
r (id, n, x, y, rn) as (
  select id, n, 0, 0, rn
  from tmp
  where rn = 1
  union all
  select tmp.id, tmp.n, r.y - 1, (tmp.n * 2) + r.y - 1, tmp.rn
  from r
  join tmp on tmp.rn = r.rn + 1
)
select id, n, x, y
from r
order by rn;

        ID          N          X          Y
---------- ---------- ---------- ----------
         2          0          0          0 
         3          1         -1          1 
         5          1          0          2 
         7          2          1          5 
        11          3          4         10 
        13          5          9         19 
        17          8         18         34 
        19         13         33         59 

SQL Fiddle.

It's basically walking through your manual steps. The anchor member is your first manual step, setting x and y both to zero for the first row. The recursive member then does the calculation you specified. (You can't refer to the new-calculated x value when calculating that row's y, so you have to repeat that as (tmp.n * 2) + r.y - 1). The rn is just to keep the rows orders by ID while making it easier to find the next row - so you can look for rn + 1 instead of find the next highest ID value directly.

There's no significant performance difference with your sample data, but with a thousand rows added the model clause takes about 5 seconds and the recursive CTE takes about 1 second; with another thousand rows model takes ~20 seconds and the CTE takes ~3 seconds; with another thousand rows model took ~40 seconds and CTE took ~6 seconds; and with another thousand rows (so 4,008 in total) model took ~75 seconds and CTE took ~10 seconds. (I got bored waiting for the model version with more rows than that; killed it after a five minutes with 10,000). I can't really say how this will perform with your real data, but on that basis, it's probably worth trying.

Alex Poole
  • 183,384
  • 11
  • 179
  • 318
  • Hadn't heard of "recursive CTE" before. Worked like a champ, ran in 40% of the time! Thanks, Alex Poole! – Tebbe Feb 12 '15 at 21:17