1

I have posted a similar question some days ago, but unfortunately I misunderstood our customer's statement of the problem I have to solve (most precisely I forgot about a part of it). That's why this post may look as duplicated, but it is not.

This is my original post: Oracle sql: filtering repeated rows that only differ by a tiny amount of time

So lets go again:

I have an Oracle table with event alarms fired by parquimeters. Alarms have a state of Open/Close and when an alarm is Open (PKN_EVENTSTATUS='Open') and Close (PKN_EVENTSTATUS='Close') between a tiny amount of time -RECEIVEDDATE- (let's say for example Open RECEIVEDDATE = x | Close RECEIVEDDATE = x + 30 seconds and -of course- PKN_EVENTNAME is the same) this both event rows (Open/Close) are considered as a "fake" alarm (a "failure alarm" that was fired by the parquimeter by "mistake") so both should be deleted.

I would need to create an Oracle SQL query that will select all that "fake" alarms so I can delete them. Again, alarms that has a tiny difference in time (RECEIVEDDATE) between "Open" and "Close" states.

I started creating a query that might work, but it's extremely slow, so I can not even test it because it takes too long. I'm quite sure it can be optimized, but cannot find how right now so I hope anyone can help me.

My current very slow query:

select t1.ID, t1.PKN_EVENTNAME, t1.PKN_EVENTSTATUS, t1.RECEIVEDDATE
from PARQUIMETERS_ALARMS t1
where
  exists
     (select 'x'
      from   PARQUIMETERS_ALARMS t2
      where  t1.PKN_EVENTNAME = t2.PKN_EVENTNAME
             and ((t1.PKN_EVENTSTATUS = 'Open' and t2.PKN_EVENTSTATUS = 'Close'
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 30) -- < 30 sec
             or (t1.PKN_EVENTSTATUS = 'Close' and t2.PKN_EVENTSTATUS = 'Open'
             and abs(t2.RECEIVEDDATE - t1.RECEIVEDDATE) * 24 * 60 * 60 < 30))) -- < 30 sec
GMB
  • 216,147
  • 25
  • 84
  • 135
Diego Perez
  • 2,188
  • 2
  • 30
  • 58
  • how many records do you have in the table ? do you have indexes ? can you provide the explain plan ?? – Roberto Hernandez Jul 10 '20 at 13:33
  • Oracle 12+ would allow you even better variant than pivot - MATCH_RECOGNIZE – Sayan Malakshinov Jul 10 '20 at 13:45
  • The alarms table currently has 2327544 records @Roberto Hernandez. It currently doesn't have an index, but I could propose to the team to create one if necessary. – Diego Perez Jul 10 '20 at 15:19
  • @DiegoPerez , I would like to see the execution plan – Roberto Hernandez Jul 11 '20 at 09:10
  • Hello @Roberto Hernandez, this is going to be run from a .Net C# Web Api. As this Oracle query has to be translated to LinkQ to do so -and this is not a very easy task- we haven't decided if we are going to do it that way yet, or simply return all records (with a little filter) and then process the results in a loop, or with a linq subquery. – Diego Perez Jul 13 '20 at 12:50

2 Answers2

0

It might be faster to use two exists conditions rather than one:

select t1.id, t1.pkn_eventname, t1.pkn_eventstatus, t1.receiveddate
from parquimeters_alarms t1
where
    (
        t1.pkn_eventstatus = 'Open'
        and exists (
            select 1
            from parquimeters_alarms t2
            where 
                t2.pkn_eventname = t1.pkn_eventname 
                and t2.pkn_eventstatus = 'Close'
                and t2.receiveddate < t1.receiveddate + 30 / 60 / 60 / 24
        )
    )
    or (
        t1.pkn_eventstatus = 'Close'
        and exists (
            select 1
            from parquimeters_alarms t2
            where 
                t2.pkn_eventname = t1.pkn_eventname 
                and t2.pkn_eventstatus = 'Open'
                and t1.receiveddate < t2.receiveddate + 30 / 60 / 60 / 24
        )
    )
    

This query might take advantage of an index on (pkn_eventname, pkn_eventstatus, receiveddate).

You could also consider union all, which avoids the need for an or condition:

select t1.id, t1.pkn_eventname, t1.pkn_eventstatus, t1.receiveddate
from parquimeters_alarms t1
where
    t1.pkn_eventstatus = 'Open'
    and exists (
        select 1
        from parquimeters_alarms t2
        where 
            t2.pkn_eventname = t1.pkn_eventname 
            and t2.pkn_eventstatus = 'Close'
            and t2.receiveddate < t1.receiveddate + 30 / 60 / 60 / 24
    )
union all
select t1.id, t1.pkn_eventname, t1.pkn_eventstatus, t1.receiveddate
from parquimeters_alarms t1
where 
    t1.pkn_eventstatus = 'Close'
    and exists (
        select 1
        from parquimeters_alarms t2
        where 
            t2.pkn_eventname = t1.pkn_eventname 
            and t2.pkn_eventstatus = 'Open'
            and t1.receiveddate < t2.receiveddate + 30 / 60 / 60 / 24
    )
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you so much for your help and time @GMB, I appreciate it a lot. My boss (and partner at work) won't be available until monday, so I won't be able to test it depply until that, but I did some quick tests and not only works but it is also fast enough so I think I'll mark it as correct next monday (depending on other possible answers). – Diego Perez Jul 10 '20 at 15:34
0

You can use pivot() to make it easier - it requires just 1 scan without extra request to the same table:

select 
  level, 
  'EVENT'||trunc(dbms_random.value(1,5)), 
  case when dbms_random.value>0.5 then 'CLOSE' else 'OPEN' end case,
  date'2020-01-01' + numtodsinterval( trunc(level*dbms_random.value(1,60)) ,'second')
from dual
connect by level<=50
)
,v_pivot as (
   select 
       ID
     , PKN_EVENTNAME
     , OPEN
     , CLOSE
     , lag(open)over(partition by PKN_EVENTNAME order by coalesce(open,close)) last_open
   from (
      select *
      from PARQUIMETERS_ALARMS v
      pivot (
         max(RECEIVEDDATE)
         for (PKN_EVENTSTATUS) in ('OPEN' as open,'CLOSE' as close)
      )
   )
)
select
 *
from v_pivot
where close is null or
abs(close - last_open) * 24 * 60 * 60 > 30;

Full example on randomly generated data (since it's generated, it includes unclosed and unopened events):

with PARQUIMETERS_ALARMS(ID, PKN_EVENTNAME, PKN_EVENTSTATUS, RECEIVEDDATE) as (
select 
  level, 
  'EVENT'||trunc(dbms_random.value(1,5)), 
  case when dbms_random.value>0.5 then 'CLOSE' else 'OPEN' end case,
  date'2020-01-01' + numtodsinterval( trunc(level*dbms_random.value(1,60)) ,'second')
from dual
connect by level<=50
)
,v_pivot as (
   select 
       ID
     , PKN_EVENTNAME
     , OPEN
     , CLOSE
     , lag(open)over(partition by PKN_EVENTNAME order by coalesce(open,close)) last_open
   from (
      select *
      from PARQUIMETERS_ALARMS v
      pivot (
         max(RECEIVEDDATE)
         for (PKN_EVENTSTATUS) in ('OPEN' as open,'CLOSE' as close)
      )
   )
)
select
 *
from v_pivot
where close is null or
abs(close - last_open) * 24 * 60 * 60 > 30
/
Sayan Malakshinov
  • 8,492
  • 1
  • 19
  • 27