3

I am searching the most efficient way to make a relatively complicated query in a relatively large table.

The concept is that:

  • I have a table that holds records of phases that can run parallel to each other
  • The amount of records exceeds the 5 millions (and increases)
  • The time period starts about 5 years ago
  • Due to performance reasons, this select could be applied on the last 3 months period of time with 300.000 records (only if it is not physically possible to do it for the whole table)
  • Oracle version: 11g

The data sample seems as following

Table Phases (ID, START_TS, END_TS, PRIO)

1  10:00:00 10:20:10 10
2  10:05:00 10:10:00 11
3  10:05:20 10:15:00 9
4  10:16:00 10:25:00 8
5  10:24:00 10:45:15 1
6  10:26:00 10:30:00 10
7  10:27:00 10:35:00 15
8  10:34:00 10:50:00 5
9  10:50:00 10:55:00 20
10 10:55:00 11:00:00 15

Above you can see how the information is currently stored (of course there are several other columns with irrelevant information).

There are two requirements (or problems to be solved)

  1. If we sum the duration of all the phases, the result is MUCH more than an hour that the above data represent. (There could be holes between the phases, so taking the first start_ts and the last end_ts would not be sufficient).
  2. The data should be displayed in a form that it would be visible which phases run parallel with which and which phase had the highest priority at each time, as shown in the expected view below

Here it is easy to distinct the highest priority phase at each time (HIGHEST_PRIO), and adding their duration would result the actual total duration.

View V_Parallel_Phases (ID, START_TS, END_TS, PRIO, HIGHEST_PRIO) 
-> Optional Columns: Part_of_ID / Runs_Parallel

1  10:00:00 10:05:20 10 True  (--> Part_1 / False)
1  10:05:20 10:15:00 10 False (--> Part_2 / True)
2  10:05:00 10:10:00 11 False (--> Part_1 / True)
3  10:05:20 10:15:00  9 True  (--> Part_1 / True)
1  10:15:00 10:16:00 10 True  (--> Part_3 / True)
1  10:16:00 10:20:10 10 False (--> Part_4 / True)
4  10:16:00 10:24:00  8 True  (--> Part_1 / True)
4  10:24:00 10:25:00  8 False (--> Part_2 / True)
5  10:24:00 10:45:15  1 True  (--> Part_1 / True)
6  10:26:00 10:30:00 10 False (--> Part_1 / True)
7  10:27:00 10:35:00 15 False (--> Part_1 / True)
8  10:34:00 10:45:15  5 False (--> Part_1 / True)
8  10:45:15 10:50:00  5 True  (--> Part_2 / True)
9  10:50:00 10:55:00 20 True  (--> Part_2 / False)
10 10:55:00 11:00:00 15 True  (--> Part_2 / False)

Unfortunately I am not aware of an efficient way to make this query. The current solution was to make the above calculations programmatically in the tool that generates a large report but it was a total failure. From the 30 seconds that were needed before this calculations, now it needs over 10 minutes without taking event into consideration the priorities of the phases..

Then I thought of translating this code into sql in either: a) a view b) a materialized view c) a table that I would fill with a procedure once in a while (depending on the required duration).

PS: I am aware that oracle has some analytical functions that can handle complicated queries but I am not aware of which could actually help me in the current problem.

Thank you in advance!

Stef
  • 143
  • 1
  • 10
  • Hey there.. If you are still working on this, contact me off-line ( wjs@berkeley.edu) as this is a bit complex for this forum. I need to understand more about your data and what you mean by phases, etc. – FlyingGuy Mar 21 '15 at 04:36
  • I can write a query that does this, but I have no idea whether it is going to be fast or not. – mzedeler Mar 22 '15 at 20:21

1 Answers1

0

This is an incomplete answer, but I need to know if this approach is viable before going on. I believe it is possible to do completely in SQL, but I am not sure how the performance will be.

First find out all points in time where there is a transition:

CREATE VIEW Events AS
    SELECT START_TS AS TS
      FROM Phases
    UNION
    SELECT END_TS AS TS
      FROM Phases
;

Then create (start, end) tuples from those points in time:

CREATE VIEW Segments AS
     SELECT START.TS AS START_TS,
            MIN(END.TS) AS END_TS
      FROM Events AS START
      JOIN Events AS END
     WHERE START.TS < END.TS
;

From here on, doing the rest should be fairly straight forward. Here is a query that lists the segments and all the phases that are active in the given segment:

  SELECT *
    FROM Segments
    JOIN Phases
   WHERE Segments.START_TS BETWEEN Phases.START_TS AND Phases.END_TS
     AND Segments.END_TS BETWEEN Phases.START_TS AND Phases.END_TS
ORDER BY Segments.START_TS
;

The rest can be done with subselects and some aggregates.

| START_TS |   END_TS | ID | START_TS |  END_TS  | PRIO |
|----------|----------|----|----------|----------|------|
| 10:00:00 | 10:05:00 |  1 | 10:00:00 | 10:20:10 |   10 |
| 10:05:00 | 10:05:20 |  1 | 10:00:00 | 10:20:10 |   10 |
| 10:05:00 | 10:05:20 |  2 | 10:05:00 | 10:10:00 |   11 |
| 10:05:20 | 10:10:00 |  1 | 10:00:00 | 10:20:10 |   10 |
| 10:05:20 | 10:10:00 |  2 | 10:05:00 | 10:10:00 |   11 |
| 10:05:20 | 10:10:00 |  3 | 10:05:20 | 10:15:00 |    9 |
| 10:10:00 | 10:15:00 |  1 | 10:00:00 | 10:20:10 |   10 |
| 10:10:00 | 10:15:00 |  3 | 10:05:20 | 10:15:00 |    9 |
| 10:15:00 | 10:16:00 |  1 | 10:00:00 | 10:20:10 |   10 |
| 10:16:00 | 10:20:10 |  1 | 10:00:00 | 10:20:10 |   10 |
| 10:16:00 | 10:20:10 |  4 | 10:16:00 | 10:25:00 |    8 |
| 10:20:10 | 10:24:00 |  4 | 10:16:00 | 10:25:00 |    8 |
| 10:24:00 | 10:25:00 |  4 | 10:16:00 | 10:25:00 |    8 |
| 10:24:00 | 10:25:00 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:25:00 | 10:26:00 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:26:00 | 10:27:00 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:26:00 | 10:27:00 |  6 | 10:26:00 | 10:30:00 |   10 |
| 10:27:00 | 10:30:00 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:27:00 | 10:30:00 |  6 | 10:26:00 | 10:30:00 |   10 |
| 10:27:00 | 10:30:00 |  7 | 10:27:00 | 10:35:00 |   15 |
| 10:30:00 | 10:34:00 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:30:00 | 10:34:00 |  7 | 10:27:00 | 10:35:00 |   15 |
| 10:34:00 | 10:35:00 |  8 | 10:34:00 | 10:50:00 |    5 |
| 10:34:00 | 10:35:00 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:34:00 | 10:35:00 |  7 | 10:27:00 | 10:35:00 |   15 |
| 10:35:00 | 10:45:15 |  5 | 10:24:00 | 10:45:15 |    1 |
| 10:35:00 | 10:45:15 |  8 | 10:34:00 | 10:50:00 |    5 |
| 10:45:15 | 10:50:00 |  8 | 10:34:00 | 10:50:00 |    5 |
| 10:50:00 | 10:55:00 |  9 | 10:50:00 | 10:55:00 |   20 |
| 10:55:00 | 11:00:00 | 10 | 10:55:00 | 11:00:00 |   15 |

There is a SQL fiddle demonstrating the whole thing here:

http://sqlfiddle.com/#!9/d801b/2

mzedeler
  • 4,177
  • 4
  • 28
  • 41
  • I think the `BETWEEN` stuff can be simplified. `Segments.START_TS >= Phase.START_TS AND Segments.END_TS <= Phase.END_TS` should be sufficient. – mzedeler Mar 22 '15 at 20:58