Using GROUP BY for an non-uniqueness-warning column

Question

I have a query (a view actually) the normally should define a unique mapping from key to value2.

select key, value1, value2
from tbl -- joins are left out here
where (1=1) -- left out here
order by key
;

and usually the result is something like

key       value1  value2
--------------------------
Harry     fish    blue
Sally     dog     green
Willy     dog     red

so I can map key to value2 uniquely.

But as it happens the underlying data may contain junk some day:

key       value1  value2
--------------------------
Harry     fish    blue
Harry     fish    black     -- <<< breaks uniqueness
Sally     dog     green
Willy     dog     red

I would like to add HAVING and/or GROUP BY expressions to identify the nun-unique rows easily. Because I have to do something with value2 in this case I have to aggregate it somehow -- my idea is max.

Therefore an example result would be

key       value1  value2   count_value2
----------------------------------------
Harry     fish    blue     2
Sally     dog     green    1
Willy     dog     red      1

But as always, I am completely baffled by GROUP BY and its consorts. Where do I but it? Where and where the aggregations?

how do we know that Harry's black fish is junk while Harry's blue fish is ok ? Which database and version is used ? — t-clausen.dk, Jun 21 '12 at 11:16

Stefan Steinegger · Accepted Answer · 2012-06-21T11:46:06.737

1

something like this?

select 
  key, 
  min(value1) as "a value1", 
  min(value2) as "a value2", 
  max(value2) as "another value2", 
  count(distinct value2) "number of value2"
FROM tbl
group by key
having count(distinct value2) > 1

edited Jun 21 '12 at 11:46

answered Jun 21 '12 at 10:23

Stefan Steinegger

63,782
15
129
193

Right! I had to modify it a bit, because I forgot to mention a `distinct` complication, but with your hints I managed it. Basically I now have `select key, v1, min(v2), max(v2), count(*) from ( select distinct key, v1, v2 from tbl ) group by key, v1;`. The having filter I do not want to apply here, because I want to *see* a bad line. – towi Jun 21 '12 at 11:48

Nahuel Fouilleul · Answer 2 · 2012-06-21T10:30:21.160

1

Closer to solution:

select tbl.*,count(distinct value2) over (partition by key) from tbl

Or with a group by

select key,count(distinct value2) from tbl group by key having count(distinct value2)>1

edited Jun 21 '12 at 10:30

answered Jun 21 '12 at 10:23

Nahuel Fouilleul

18,726
2
31
36

I agree mostly with this answer. – Oleg Danu Jun 21 '12 at 11:31
whoo, I can't tell. I went for `group by`, see the other answer, because it was closer to what I had in mind. but you are probably right and I believe you. – towi Jun 21 '12 at 11:45

score 0 · Answer 3 · answered Jun 21 '12 at 11:50

0

Please try this and let me know this is what you are looking for

select C.key,C.value1,C.value2,A.count_value2 from
(select key,value1,COUNT(*) count_value2
from tbl
group by key,value1)A
join
(select B.key,B.value1,B.value2 from
(select *,ROW_NUMBER() over(partition by T.key,T.value1 order by T.key )  rn 
from tbl T 
group by T.key,T.value1,T.value2)B
where B.rn=1)C
on C.key=A.key
and C.value1=A.value1

answered Jun 21 '12 at 11:50

Joe G Joseph

23,518
5
56
58

*gulp* partitions... joins... subselects... I think I got a simpler solution, see other answer. but since you are the second person mentioning `partition` to me I think I will look into that. – towi Jun 21 '12 at 11:56

Using GROUP BY for an non-uniqueness-warning column

3 Answers3