
Suppose I have a dataset like this (in an Oracle 11g database environment):

CHANGE_DATE             VALUE
------------------ ----------
03-NOV-13 06.56.01    3027.97
03-NOV-13 06.57.01    3030.59
03-NOV-13 06.58.01    3032.33
03-NOV-13 06.59.01    3047.41
03-NOV-13 07.00.02    3045.82
03-NOV-13 07.01.01    3046.63
03-NOV-13 07.02.01    3020.29
03-NOV-13 07.03.02    3019.38
03-NOV-13 07.04.01    3020.76
03-NOV-13 07.05.01    3008.53

What I am interested in is a select statement which only displays values on a sufficiently large change, e.g. 0.1%. For the above dataset the desired output would be

03-NOV-13 06.56.01  3027.97
03-NOV-13 06.58.01  3032.33
03-NOV-13 06.59.01  3047.41
03-NOV-13 07.02.01  3020.29
03-NOV-13 07.05.01  3008.53

EDIT: to explain the aim: the first row provides the initial reference value, and every subsequent row is compared against it. If the change with respect to the reference value does not exceed x%, skip the row and continue. If the value does exceed the threshold, select the row and keep its value as the new reference to compare the following rows against.
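To make the rule unambiguous, here is a procedural sketch of it (not Oracle SQL; just Python with the sample data hard-coded, and `threshold_filter` is a name I made up for illustration). The rule is inherently sequential, since each row is compared against the last *kept* value rather than a fixed offset:

```python
rows = [
    ("03-NOV-13 06.56.01", 3027.97),
    ("03-NOV-13 06.57.01", 3030.59),
    ("03-NOV-13 06.58.01", 3032.33),
    ("03-NOV-13 06.59.01", 3047.41),
    ("03-NOV-13 07.00.02", 3045.82),
    ("03-NOV-13 07.01.01", 3046.63),
    ("03-NOV-13 07.02.01", 3020.29),
    ("03-NOV-13 07.03.02", 3019.38),
    ("03-NOV-13 07.04.01", 3020.76),
    ("03-NOV-13 07.05.01", 3008.53),
]

def threshold_filter(rows, pct=0.001):
    # The first row is always kept and provides the initial reference.
    kept = [rows[0]]
    ref = rows[0][1]
    for date, value in rows[1:]:
        # Keep a row only when it deviates from the last KEPT value
        # by more than the threshold; the kept value becomes the new
        # reference for all following rows.
        if abs(value - ref) / ref > pct:
            kept.append((date, value))
            ref = value
    return kept

for date, value in threshold_filter(rows):
    print(date, value)
```

Running this over the sample data yields exactly the five rows of the desired output (3027.97, 3032.33, 3047.41, 3020.29, 3008.53).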

I know how I can achieve something like this in case I am only flipping between integer values, along the lines of what has been discussed here: Select rows where column value has changed

I tried to implement something along those lines using:

with t as (
select to_date('03-NOV-13 06.56.01','dd-mon-rr hh24.mi.ss') change_date, 3027.97 value from dual union all
select to_date('03-NOV-13 06.57.01','dd-mon-rr hh24.mi.ss'),             3030.59 from dual union all
select to_date('03-NOV-13 06.58.01','dd-mon-rr hh24.mi.ss'),             3032.33 from dual union all
select to_date('03-NOV-13 06.59.01','dd-mon-rr hh24.mi.ss'),             3047.41 from dual union all
select to_date('03-NOV-13 07.00.02','dd-mon-rr hh24.mi.ss'),             3045.82 from dual union all
select to_date('03-NOV-13 07.01.01','dd-mon-rr hh24.mi.ss'),             3046.63 from dual union all
select to_date('03-NOV-13 07.02.01','dd-mon-rr hh24.mi.ss'),             3020.29 from dual union all
select to_date('03-NOV-13 07.03.02','dd-mon-rr hh24.mi.ss'),             3019.38 from dual union all
select to_date('03-NOV-13 07.04.01','dd-mon-rr hh24.mi.ss'),             3020.76 from dual union all
select to_date('03-NOV-13 07.05.01','dd-mon-rr hh24.mi.ss'),             3008.53 from dual )
, x as (
  select value, round(value, -1) round_value, change_date,
         row_number() over (order by change_date) as rn
    from t )
select x.value, x.change_date
  from x
  join x y on x.rn = y.rn + 1
          and x.round_value <> y.round_value;

which gives

3047.41 03-NOV-13
3020.29 03-NOV-13
3008.53 03-NOV-13

which is not too far off the mark, but the comparison is always made against the previous value, not against the first non-suppressed value. It also just rounds the values instead of looking for a percentage change.

I also tried to play with lag, like this:

with t as (
select to_date('03-NOV-13 06.56.01','dd-mon-rr hh24.mi.ss') change_date, 3027.97 value from dual union all
select to_date('03-NOV-13 06.57.01','dd-mon-rr hh24.mi.ss'),             3030.59 from dual union all
select to_date('03-NOV-13 06.58.01','dd-mon-rr hh24.mi.ss'),             3032.33 from dual union all
select to_date('03-NOV-13 06.59.01','dd-mon-rr hh24.mi.ss'),             3047.41 from dual union all
select to_date('03-NOV-13 07.00.02','dd-mon-rr hh24.mi.ss'),             3045.82 from dual union all
select to_date('03-NOV-13 07.01.01','dd-mon-rr hh24.mi.ss'),             3046.63 from dual union all
select to_date('03-NOV-13 07.02.01','dd-mon-rr hh24.mi.ss'),             3020.29 from dual union all
select to_date('03-NOV-13 07.03.02','dd-mon-rr hh24.mi.ss'),             3019.38 from dual union all
select to_date('03-NOV-13 07.04.01','dd-mon-rr hh24.mi.ss'),             3020.76 from dual union all
select to_date('03-NOV-13 07.05.01','dd-mon-rr hh24.mi.ss'),             3008.53 from dual )
select value, change_date,
       case when abs( lag(value,1,0) over (order by change_date) - value ) / value > 0.001
            then value
            else lag(value,1,0) over (order by change_date)
       end start_of_group
  from t;

which results in

 VALUE CHANGE_DA START_OF_GROUP
---------- --------- --------------
3027.97 03-NOV-13        3027.97
3030.59 03-NOV-13        3027.97
3032.33 03-NOV-13        3030.59
3047.41 03-NOV-13        3047.41
3045.82 03-NOV-13        3047.41
3046.63 03-NOV-13        3045.82
3020.29 03-NOV-13        3020.29
3019.38 03-NOV-13        3020.29
3020.76 03-NOV-13        3019.38
3008.53 03-NOV-13        3008.53

which also seems to be a step in the right direction, but has the same problem: the comparison is made against the 'value' column, not against the 'start_of_group' column.

I would appreciate any hints on how to achieve this. Please let me know if the question is sufficiently clear or if I should add any information.

P.S. first time poster, hope I managed to post the question in a meaningful way

  • Hi, and welcome to Stack Overflow! Your question is perfectly fine. Please clarify only one point: what do you mean by a "sufficiently large change"? More than 0.1% above and/or below the average? The previous value? A rolling average of the last N records? A rolling average of the records in the last hour? – Danilo Piazzalunga Nov 08 '13 at 10:30
  • @DaniloPiazzalunga, I tried to clarify below the desired output. – Erik Nov 08 '13 at 10:52

1 Answer


Tricky problem, but I think the following solution works as expected:

with data as (
  select to_date('03-11-13 06.56.01','dd-mm-rr hh24.mi.ss') change_date, 3027.97 value from dual union all
  select to_date('03-11-13 06.57.01','dd-mm-rr hh24.mi.ss'),             3030.59 from dual union all
  select to_date('03-11-13 06.58.01','dd-mm-rr hh24.mi.ss'),             3032.33 from dual union all
  select to_date('03-11-13 06.59.01','dd-mm-rr hh24.mi.ss'),             3047.41 from dual union all
  select to_date('03-11-13 07.00.02','dd-mm-rr hh24.mi.ss'),             3045.82 from dual union all
  select to_date('03-11-13 07.01.01','dd-mm-rr hh24.mi.ss'),             3046.63 from dual union all
  select to_date('03-11-13 07.02.01','dd-mm-rr hh24.mi.ss'),             3020.29 from dual union all
  select to_date('03-11-13 07.03.02','dd-mm-rr hh24.mi.ss'),             3019.38 from dual union all
  select to_date('03-11-13 07.04.01','dd-mm-rr hh24.mi.ss'),             3020.76 from dual union all
  select to_date('03-11-13 07.05.01','dd-mm-rr hh24.mi.ss'),             3008.53 from dual )
SELECT
    change_date, value
  FROM data
WHERE change_date IN (
  SELECT
      MIN(change_date)
    FROM (
      SELECT
          t.*,
          (SELECT MAX(change_date)
             FROM data
           WHERE TRUNC(change_date) = TRUNC(t.change_date)
             AND change_date < t.change_date
             AND ABS(value - t.value) / value > 0.001) AS prev
        FROM data t
    )
  GROUP BY prev
)
ORDER BY 1
;

First, for each row we find the latest earlier change_date whose value differs from the current row's value by more than 0.1% (the prev column). Then we group the rows by prev and select the minimal date within each group, and, finally, we select the corresponding values for those dates.
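To see why the grouping step produces the group starts, the query's set logic can be emulated procedurally (not Oracle SQL, just a Python sketch over the same sample data; note that these date strings happen to sort chronologically, and that the unqualified `value` in the correlated subquery means the difference is divided by the *earlier* row's value):

```python
rows = [
    ("03-NOV-13 06.56.01", 3027.97),
    ("03-NOV-13 06.57.01", 3030.59),
    ("03-NOV-13 06.58.01", 3032.33),
    ("03-NOV-13 06.59.01", 3047.41),
    ("03-NOV-13 07.00.02", 3045.82),
    ("03-NOV-13 07.01.01", 3046.63),
    ("03-NOV-13 07.02.01", 3020.29),
    ("03-NOV-13 07.03.02", 3019.38),
    ("03-NOV-13 07.04.01", 3020.76),
    ("03-NOV-13 07.05.01", 3008.53),
]

# Step 1: for each row, prev = latest earlier date whose value differs
# from this row's value by more than 0.1% (None when no earlier row
# qualifies, mirroring the NULL the correlated subquery returns).
prev = {}
for i, (d, v) in enumerate(rows):
    qualifying = [rows[j][0] for j in range(i)
                  if abs(rows[j][1] - v) / rows[j][1] > 0.001]
    prev[d] = max(qualifying) if qualifying else None

# Step 2: rows sharing the same prev form one "stable stretch";
# the earliest date in each group is where a new reference started.
starts = {}
for d, p in prev.items():
    key = p if p is not None else "NULL"   # NULLs form one group, as in GROUP BY
    starts[key] = min(starts.get(key, d), d)

values = dict(rows)
for d in sorted(starts.values()):
    print(d, values[d])
```

On this sample the emulation selects the same five rows as the query's output below (3027.97, 3032.33, 3047.41, 3020.29, 3008.53).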

Output:

CHANGE_DATE           VALUE
---------------- ----------
13/11/03 06:56      3027.97 
13/11/03 06:58      3032.33 
13/11/03 06:59      3047.41 
13/11/03 07:02      3020.29 
13/11/03 07:05      3008.53
Przemyslaw Kruglej
  • Thanks @Przemyslaw, I am accepting your answer since it does exactly what I was looking for. I see that the performance goes down as dataset size goes up but that I guess is simply due to the nested min()/max() operations and cannot really be helped. – Erik Nov 08 '13 at 17:47
  • @user2967091 Unfortunately I was unable to come up with a faster solution. One thing that I could propose is to create two indexes: the first one, a function-based index on `TRUNC(change_date)`, to speed up the innermost subquery, and a second one (a normal index) on `change_date`, to speed up the outermost query. – Przemyslaw Kruglej Nov 08 '13 at 17:52