Difference b/w 'HAVING NOT EXISTS' and 'MINUS'

Question

HAVING acts like a where clause and EXISTS checks for the rows that exist for the given row or not. So, when we use HAVING NOT EXISTS it should have the same functionality as MINUS which eliminates the common rows from first table.

But in an example when I substituted HAVING NOT EXISTS for MINUS the result sets are not same. the query is

(
   select empno,ename,job,mgr,hiredate,sal,comm,deptno
 ,
        count(*) as cnt
     from  v
    group by empno,ename,job,mgr,hiredate,sal,comm,deptno
    minus
  ( select empno,ename,job,mgr,hiredate,sal,comm,deptno
 ,
          count(*) as cnt
     from emp
    group by empno,ename,job,mgr,hiredate,sal,comm,deptno)
   )
     union all
   (
   select empno,ename,job,mgr,hiredate,sal,comm,deptno
 ,
            count(*) as cnt
     from emp
   group by empno,ename,job,mgr,hiredate,sal,comm,deptno
   minus
  ( select empno,ename,job,mgr,hiredate,sal,comm,deptno
 ,
            count(*) as cnt
     from v
    group by empno,ename,job,mgr,hiredate,sal,comm,deptno)
  )

V is a view, which is formed like

create or replace view v
as
select * from emp where deptno != 10
union all
select * from emp where ename = 'WARD';

The query is an example to determine whether the tables have same data or not

this is my emp table:

  EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
  ---------- ---------- --------- ---------- --------- ---------- ---------- ----------
  7369 SMITH      CLERK           7902 17-DEC-80        800                    20
  7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
  7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
  7566 JONES      MANAGER         7839 02-APR-81       2975                    20
  7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
  7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
  7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
  7788 SCOTT      ANALYST         7566 09-DEC-82       3000                    20
  7839 KING       PRESIDENT            17-NOV-81       5000                    10
  7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
  7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
  7900 JAMES      CLERK           7698 03-DEC-81        950                    30
  7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
  7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

This is the view V

EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH      CLERK           7902 17-DEC-80        800                    20
7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
7566 JONES      MANAGER         7839 02-APR-81       2975                    20
7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
7788 SCOTT      ANALYST         7566 09-DEC-82       3000                    20
7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30
7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
7900 JAMES      CLERK           7698 03-DEC-81        950                    30
7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30

When minus is used the query results in

EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO        CNT
------ ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30          2
7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30          1
7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10          1
7839 KING       PRESIDENT            17-NOV-81       5000                    10          1
7934 MILLER     CLERK           7782 23-JAN-82       1300                    10          1

When HAVING NOT EXISTS is used no rows are resulted from the query

Is there any wrong in my understanding of the terms. Please explain why the result sets are different with a short example.Thanks

Can you post the example so we can see the query you are talking about? They can have identical results if used in the same way but they can also vary a lot if used differently. Don't forget that `MINUS` acts on the entire result set and not just a single column. — Wolph, Dec 24 '13 at 09:45
@user3020454 What results do you expect? In other word what is the task of these queries? Are you aware thar resultset of `V` view gives duplicate records? If not, consider using `UNION` instead of `UNION ALL`. — Yaroslav Shabalin, Dec 24 '13 at 11:15

Alex Poole · Accepted Answer · 2013-12-26T15:07:42.660

This is always going to produce no rows:

select empno,ename,job,mgr,hiredate,sal,comm,deptno,count(*) as cnt
from  v
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
having not exists (
  select empno,ename,job,mgr,hiredate,sal,comm,deptno, count(*) as cnt
  from emp
  group by empno,ename,job,mgr,hiredate,sal,comm,deptno
);

There is no correlation between the main select and the data returned by the query in the having clause. Unless emp is empty there will always be something in that clause, so the having not exists has to be false, so all rows in the main query are discarded.

It's nothing to do with your data really; a simplified version does the same:

select * from dual
where not exists (select * from dual);

It's logically impossible for there to be rows in dual, and no rows in dual, at the same time. (OK, if there are no rows in dual you're in big trouble, so maybe this wasn't a great example).

In your case v and emp are related; if emp is empty then v must be too.

The main select in this part finds 11 rows from v. The select inside the having clause finds 14 rows from emp, but it wouldn't matter if it found 1 or 100 rows, as there is no relation between the two sets of rows. The having clause just acts as a filter; having not exists (<query that finds anything>) filters out everything. If you break it down, exists (select ...) is true because there are rows in emp; not exists (select ...) is just the negation of that, so it's false; so you're effectively saying 'give me the rows where true = false'.

The second part of the union finds 14 rows from emp, but the same 'true = false' filter applies, so all rows are still discarded, even though the inner select only finds 11 rows from v. You are not making any connection between the data in v and emp.

But any subquery that always produces rows would have the same effect; because there is no correlation you could be querying a different table which has nothing to do with emp at all, and as long is that subquery returned one or more rows your having clause would filter out everything you expect to see. This behaves the same, for example:

select empno,ename,job,mgr,hiredate,sal,comm,deptno,count(*) as cnt
from  v
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
having not exists (
  select * from dual
);

For the second part of the union you can get some of the data back by adding the correlation:

select empno,ename,job,mgr,hiredate,sal,comm,deptno,count(*) as cnt
from  v
group by empno,ename,job,mgr,hiredate,sal,comm,deptno
having not exists (
  select empno,ename,job,mgr,hiredate,sal,comm,deptno, count(*) as cnt
  from emp
  where emp.empno = v.empno
  and emp.ename = v.ename
  and emp.job = v.job
  and emp.mgr = v.mgr
  and emp.hiredate = v.hiredate
  and emp.sal = v.sal
  and emp.deptno = v.deptno
  group by empno,ename,job,mgr,hiredate,sal,comm,deptno
);

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO        CNT
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10          1 
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10          1 
      7839 KING       PRESIDENT            17-NOV-81       5000                    10          1

Just using empno might work if it's a primary key or unique, unless you'd really be looking for discrepancies between other fields. But correlating the counts between v and emp is tricky, and possibly not worth the effort. Looking at the count at all seems a bit odd; they only differ for WARD because of the contrived way you've built the view. Using minus here is simpler and possibly quicker.

can you elaborate `no correlation between the main select and the data returned by the query in the having clause`. there are few rows which are common to both the queries. so, the query should return the remaining rows. Please, explain where I'm getting it wrong — ethan, Dec 24 '13 at 11:37
@user3020454 - I've added a bit more explanation, but I'm not sure I'm making it very clear still. The `having` clause doesn't know what's in the main `select`, so it doesn't know whether anything is common or not. They are independent as it's written now. — Alex Poole, Dec 24 '13 at 11:50
Thanks for an elaborate answer. Sorry to add that still I'm not clear. `having` acts as a `where` clause on aggregate functions. so it should filter out the rows from the `main` clause with the query we give in it. so when I use `not exists` it should return me the `main` query rows which doesn't have rows from the `subquery`.This is what I understood. Can you point out where I'm going wrong. @Alex — ethan, Dec 26 '13 at 04:03
@user3020454 - you're assuming that each row in the main `select` is checked to see if a matching row exists in the `having` subquery. That is not how you have written it. Each row is checked to see if *any* row exists in the subquery. The subquery, when run independently, returns rows. So every row in the main `select` is filtered out. To do what you describe, there has to be some link (correlation) between the main `select`'s rows and the subquery. That's what I showed in the answer. But comparing the counts as well adds another layer of complication. — Alex Poole, Dec 26 '13 at 14:19

Difference b/w 'HAVING NOT EXISTS' and 'MINUS'

1 Answers1