0

I try to delete rows with null values at the mail column, if and only if exists a row with a non-null value for the same person_id.

DELETE
FROM
  EMP_MAIL EM
WHERE
  MAIL IS NULL
AND EXISTS
  (
    SELECT 
      EMP_ID
    FROM
      EMP_MAIL EM2
    WHERE
      MAIL IS NOT NULL
    AND EM.EMP_ID    = EM2.EMP_ID
  ) ;

but it takes quite a while. (i have an non-unique index on EMP_MAIL(EMP_ID))

is there a better way to preform this operation ?

Nati
  • 1,034
  • 5
  • 19
  • 46
  • Define "quite a while". How many rows are in the table? How many rows are you going to delete? How big is the table (in MB/ GB/ etc.)? What is the query plan? – Justin Cave Jan 05 '14 at 12:36
  • 3M rows in the table. 0 rows to be deleted. 30 sec. 2 Full table access. – Nati Jan 05 '14 at 12:36
  • Query is good. 30 secs for 3M rows is not bad. But EMP_ID is duplicating in table .. why not make `EMP_ID` as PK and if `MAIL` column must store multiple values, them have a separate table called MAIL having emp_id,mail columns and then you can join them as needed. – Rahul Jan 05 '14 at 12:44
  • can u explain your logic rules more? – Hamidreza Jan 05 '14 at 13:05
  • @Hamidreza - what do you mean ? I do have a different table for employees, but for design reasons, I keep a row for every employee, and in case there's no mail, then I put a null value. – Nati Jan 05 '14 at 13:26
  • As from the table name, it seems to store mail ids for different employees and it also seems illogical if some of the rows have null mail id. Why don't you delete all the rows that have null mail ids? It seems useless to me to keep those records unless there is some strict business logic. – San Jan 06 '14 at 04:08

2 Answers2

0

it depends on your input data, i hope this query can help you or give you some clues to solve the problem:

DELETE
from 
EMP_MAIL t1 
where 
t1.MAIL is null and 
not Exists
(select t2.EMP_ID from (select EMP_ID from EMP_MAIL where mail is not null
minus
select EMP_ID from EMP_MAIL where mail is null)t2 where t1.EMP_ID =t2.EMP_ID);
Hamidreza
  • 3,038
  • 1
  • 18
  • 15
  • I guess this will lead to three full table scans(need to check explain plan), I would vote OP's query over it. – San Jan 06 '14 at 04:10
0

Well, I've tested a bit and now I'm going to present the result.

This is my test script (have a look: no indices at all):

create table emp_mail (
  emp_id        number,
  mail          clob
);

declare
  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;
begin
  for i in 1..300 loop
    insert into emp_mail
    select round(dbms_random.value*amount_of_users),case when dbms_random.value < nulls_amount_percent/100 then null else 'some mail' end
    from dual
    connect by level <= 10000;

    commit;
  end loop;
end;
/
delete from emp_mail em
where mail is null
  and exists (select null
              from emp_mail em2
              where mail is not null
                and em.emp_id = em2.emp_id);

drop table emp_mail;

It creates the table (I create with CLOB to have the worst case), then it fills the table with 3M of rows, by the constants

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;

you can play around the rows which are going to be removed, then it performs your "delete" statement and drops the table.

Let's talk about numbers.

Warning: when I talk about seconds I mean only delete statement, because row generator anonymous block takes nearly 2 minutes.

With these values

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;

nearly the whole table is deleted and I can reproduce your mystic 27.5 seconds.

Then I take

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 1;

to decrease the amount of removed rows and it gives me only 1.8 seconds.

Let's take a case when no rows are deleted:

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 0;

Guess time? 1.1 second, and as far as you told it is your case.

Well, I can't give you any advice what to do because the thing you are talking is probably a bit more global then one query. Perhaps, you database version is too old, or it is about hardware or anything else.

If it is about this case

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;

when really many rows are removed I can advice to split the deleting into e.g. 10000-rows-parts (and firstly select count(*) to know how many times you should run the restricted deleting) and make a commit after every, sometimes it is much faster then to delete big amounts of rows. But as far as you told that you have 0 rows deleted it shouldn't be your problem.

smnbbrv
  • 23,502
  • 9
  • 78
  • 109