I have a query that is more complex than the example here, but which needs to only return the rows where a certain field doesn't appear more than once in the data set.
ACTIVITY_SK STUDY_ACTIVITY_SK
100 200
101 201
102 200
100 203
In this example I don't want any records with an ACTIVITY_SK
of 100 being returned because ACTIVITY_SK
appears twice in the data set.
The data is a mapping table, and is used in many joins, but multiple records like this imply data quality issues and so I need to simply remove them from the results, rather than cause a bad join elsewhere.
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
I had tried something like this:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
WHERE A.ACTIVITY_SK NOT IN
(
SELECT
A.ACTIVITY_SK,
COUNT(*)
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
GROUP BY A.ACTIVITY_SK
HAVING COUNT(*) > 1
)
But there must be a less expensive way of doing this...