2

I have an Oracle table with event alarms, and due to a weird and unknown -to me- condition alarms get repeated sometimes so I was asked to create a service that will remove repeated alarms from that Oracle table.

The condition to consider an alarm (a row in the table) as repeated is that there is another one with the same exact PKN_EVENTNAME and RECEIVEDDATE only differs from the previous one by a tiny amount of time (let's say 10 seconds for example -up or down-).

What I'm trying to do first is to create an Oracle sql statement that will group all alarms by PKN_EVENTNAME separating in each group the ones that are repeated (for later deletion).

I think I'm on the way, but I'm stuck. What can I try next?

My SQL so far:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id and                                              -- Not the same row
             trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)                 -- Same date
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10)  -- < 10 sec

enter image description here

Edit

With @Tejash corrections I see different results in Visual Studio Oracle SQL browser, but I'm not being able to understand them. I don't see clear if results are already records to be deleted (repeated alarms) or what.

halfer
  • 19,824
  • 17
  • 99
  • 186
Diego Perez
  • 2,188
  • 2
  • 30
  • 58

3 Answers3

0

You are missing t1.PKN_EVENTNAME = t2.PKN_EVENTNAME in your exists condition and there is one irrelevant condition in your exists clause.

Your query should look like this:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id   -- Not the same row                                         
             --trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)   -- this is not needed
             and t1.PKN_EVENTNAME = t2.PKN_EVENTNAME -- added this
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10) -- < 5 sec
Popeye
  • 35,427
  • 4
  • 10
  • 31
  • Thank you for your quick response and help @Tejash. With your corrections I see different results in Visual Studio Oracle SQL browser, but I'm not being able to understand them. I don't see clear if results are already records to be deleted (repeated alarms) or what. – Diego Perez Jul 07 '20 at 07:57
  • Result is repeated records. You can use `t1.id > t2.id` instead of `t1.id <> t2.id` in exists clause which will give you records to be deleted (It will select all the duplicates except one) – Popeye Jul 07 '20 at 08:22
  • Thank you so much @Tejash, you are very kind. I'll let you know as soon as I can test it deeply, and then I'll most likely mark your answer as correct (depending on others). – Diego Perez Jul 07 '20 at 09:20
  • 1
    I have to say sorry @Tejash because I misunderstood our customer's statement of the problem I have to solve a little bit, so not to mix things I'll post a new question. Hope you can help me with that too :) – Diego Perez Jul 10 '20 at 12:21
0

You can certainly write this using an exists. It is likely to be more efficient, however, to use an analytic function. Something like this

with alarms as (
  select pa.*,
         lag(pa.RECEIVEDDATE) over (partition by pa.pkn_eventName
                                        order by pa.recievedDate) prior_receivedDate
    from PARQUIMETERS_ALARMS pa
)
select *
  from alarms
 where receivedDate - prior_receivedDate <= interval '10' second;

Note that I'm factoring out the alarms subquery here so you can easily run that separately and see the data set with the additional prior_receivedDate data in it before you apply the filter condition looking for repeated rows. That can often be useful in debugging/ visualizing the data. But you're free to write the query with an inline view if that's easier for you.

Justin Cave
  • 227,342
  • 24
  • 367
  • 384
  • Thank you very much @Justin Cave, you are very kind and helpful. I'll test your approach as soon as possible. – Diego Perez Jul 07 '20 at 09:22
  • I have to say sorry @Justin Cave because I misunderstood our customer's statement of the problem I have to solve a little bit, so not to mix things I'll post a new question. Hope you can help me with that too :) – Diego Perez Jul 10 '20 at 12:20
0

You can utilize range clause of analytic functions:

with dups as (
  select t1.*
       , row_number() over (
           partition by PKN_EVENTNAME, RECEIVEDDATE
           order by id
         ) as dup
  from PARQUIMETERS_ALARMS t1
), nodups as (
  select * from dups where dup = 1
), t as (
  select nodups.ID, nodups.PKN_EVENTNAME, nodups.RECEIVEDDATE
       , count(*) over (
           partition by nodups.PKN_EVENTNAME
           order by nodups.RECEIVEDDATE
           range between interval '10' second preceding and current row
         ) as cnt
  from nodups
)
select * from t where cnt = 1

(UPDATED: CTEs dups and nodups were added after OP shown in comment there are duplicate tuples (PKN_EVENTNAME, RECEIVEDDATE).)

Explanation: after cleaning data passing through nodups CTE, the where condition filters only rows for which there is only single row in last 10 s (which is obviously the current row).

Tomáš Záluský
  • 10,735
  • 2
  • 36
  • 64
  • Thank you very much @Tomáš Záluský, you are really very kind. I'll test your approach as soon as possible, most likely tomorrow morning and let you know. – Diego Perez Jul 07 '20 at 09:24
  • Hello @Tomáš Záluský. I'm reviewing your query and, if I'm not wrong, the results of it are not repeated alarms, right? If so, this is great and thanks again, but what if I would prefer to list repeated ones so I can just delete them? Is it enough just changing "where cnt = 1" by "where cnt <> 1"? – Diego Perez Jul 08 '20 at 08:36
  • Yes, you would change condition to `<>` as you proposed. Also note the query assumes there are no duplicate `RECEIVEDDATE` values within same `PKN_EVENTNAME`. If there are such instances, query must be modified to keep only one of multiple duplicates. – Tomáš Záluský Jul 08 '20 at 10:51
  • Thanks again @Tomáš Záluský. I have checked the table and to my surprise there are records with same exact RECEIVEDDATE and PKN_EVENTNAME. This is not correct -of course-, we are having a structural database problem, but it's a big databse and I cannot change it right now so giving this, how would I modify the query to take this into consideration too and keep only one alarm from every "repeated" group? – Diego Perez Jul 08 '20 at 12:05
  • @DiegoPerez to eliminate duplicates, data must be filtered before the test for 10s durability is applied. See updated query in my answer. – Tomáš Záluský Jul 09 '20 at 07:47
  • You are very kind @Tomáš Záluský. I have to say sorry because I misunderstood our customer's statement of the problem I have to solve a little bit, so not to mix things I will mark your answer as correct (in fact it is correct according to my statement) and I'll post a new question. Hope you can help me with that too :) – Diego Perez Jul 10 '20 at 12:19