Restricting results to only rows where one value appears only once

Question

I have a query that is more complex than the example here, but which needs to only return the rows where a certain field doesn't appear more than once in the data set.

ACTIVITY_SK      STUDY_ACTIVITY_SK
100              200
101              201
102              200
100              203

In this example I don't want any records with an ACTIVITY_SK of 100 being returned because ACTIVITY_SK appears twice in the data set.

The data is a mapping table, and is used in many joins, but multiple records like this imply data quality issues and so I need to simply remove them from the results, rather than cause a bad join elsewhere.

SELECT 
   A.ACTIVITY_SK,
   A.STATUS,
   B.STUDY_ACTIVITY_SK,
   B.NAME,
   B.PROJECT
 FROM
   ACTIVITY A,
   PROJECT B
 WHERE 
   A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK

I had tried something like this:

SELECT 
   A.ACTIVITY_SK,
   A.STATUS,
   B.STUDY_ACTIVITY_SK,
   B.NAME,
   B.PROJECT
 FROM
   ACTIVITY A,
   PROJECT B
 WHERE 
   A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
 WHERE A.ACTIVITY_SK NOT IN
 (

  SELECT 
     A.ACTIVITY_SK,
     COUNT(*)
    FROM
      ACTIVITY A,
      PROJECT B
    WHERE 
    A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
    GROUP BY A.ACTIVITY_SK
    HAVING COUNT(*) > 1

 )

But there must be a less expensive way of doing this...

score 5 · Accepted Answer · answered Feb 01 '12 at 21:07

5

Something like this could be a bit "cheaper" to run:

SELECT
   A.ACTIVITY_SK,
   A.STATUS,
   B.STUDY_ACTIVITY_SK,
   B.NAME,
   B.PROJECT
PROJECT B INNER JOIN
   (SELECT 
       ACTIVITY_SK,
       MIN(STATUS) STATUS,
    FROM
      ACTIVITY
    GROUP BY ACTIVITY_SK
    HAVING COUNT(ACTIVITY_SK) = 1 ) A
ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK

answered Feb 01 '12 at 21:07

Mithrandir

24,869
6
50
66

Hi, thanks...didn't think I could do a count of something I am grouping by? Looks like I can't avoid effectively running the same query twice, the join between a and b needs to take place in both queries as it is this join which creates the duplicates. – user1183688 Feb 01 '12 at 21:18
You can. Generally i don't use COUNT(*), i try always to COUNT(PK_COLUMN), since the PK has never a null value and the optimizer can use an index on that column. – Mithrandir Feb 01 '12 at 21:22

score 1 · Answer 2 · answered Feb 02 '12 at 05:24

Another alternative:

select * from (
  SELECT 
     A.ACTIVITY_SK,
     A.STATUS,
     B.STUDY_ACTIVITY_SK,
     B.NAME,
     B.PROJECT,
     count(distinct a.pk) over (partition by a.activity_sk) AS c
   FROM
     ACTIVITY A,
     PROJECT B
   WHERE 
     A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
) where c = 1;

(where a.pk refers to a unique identifier from the ACTIVITY table)

Restricting results to only rows where one value appears only once

2 Answers2