1

How do I go about replacing the following self join using analytics:

SELECT 
t1.col1 col1,
t1.col2 col2,
SUM((extract(hour FROM (t1.times_stamp - t2.times_stamp)) * 3600 + extract(minute FROM ( t1.times_stamp - t2.times_stamp)) * 60 + extract(second FROM ( t1.times_stamp - t2.times_stamp)) ) ) div,
COUNT(*) tot_count
FROM tab1 t1,
tab1 t2
WHERE t2.col1      = t1.col1
AND t2.col2  = t1.col2
AND t2.col3        = t1.sequence_num
AND t2.times_stamp     < t1.times_stamp
AND t2.col4         = 3
AND t1.col4         = 4
AND t2.col5 NOT IN(103,123)
AND t1.col5     != 549
GROUP BY t1.col1, t1.col2
tshepang
  • 12,111
  • 21
  • 91
  • 136
edwards
  • 33
  • 4

2 Answers2

1

I'm pretty sure you won't be able to replace the self-join with analytics because you are using inter-rows operations (t1.time_stamp - t2.time_stamp). Analytics can only access the values of the current row and the value of aggregate functions over a subset of rows (windowing clause).

See this article from Tom Kyte and this paper for further analysis of the limitations of analytics.

Vincent Malgrat
  • 66,725
  • 9
  • 119
  • 171
0

It almost looks like you could eliminate the self join on t2 and replace

t1.time_stamp - t2.time_stamp

with something like

t1.time_stamp - lag(t1.time_stamp) over (partition by col1, col2 order by time_stamp)

The different filters on t1 and t2 on col4 and col5 are what prevents you from doing this.
Analytic functions are applied after the where / group by on the main query, so you'd need to have a single filter on t1 in order to use lag/lead to specify following or preceding rows in a sequence.

Also, you'd need to push the sum/group by to an outer query to aggregate after the analytic function:

select col1, col2, sum(timestamp_diff) from (
  select col1, col2, timestamp - lag(timestamp) over(.....) as timestamp_diff
  where ....
) group by col1, col2
wrschneider
  • 17,913
  • 16
  • 96
  • 176