Count distinct multiple columns in redshift

Question

I am trying to count rows which have a distinct combination of 2 columns in Amazon redshift. The query I am using is -

select count(distinct col1, col2)
from schemaname.tablename
where some filters

It is throwing me this error -

Amazon Invalid operation: function count(character varying, bigint) does not exist`

I tried casting bigint to char but it didn't work.

There are many ways to achieve this. It would be great if could provide you whole use case. So we could try to find the best way for you — S-Man, Sep 24 '18 at 07:06
In Postgres you could use `select count (distinct (col1, col2))` (note the parentheses around the two columns)- maybe Redshift allows that as well. — , Sep 24 '18 at 09:30
@a_horse_with_no_name, tried it. It doesn't work even with parentheses around the column combination — Skandy, Feb 03 '20 at 23:55

score 36 · Accepted Answer · answered Sep 24 '18 at 05:45

36

you can use sub-query and count

select count(*) from (
  select distinct col1, col2 
 from schemaname.tablename
  where some filter
) as t

answered Sep 24 '18 at 05:45

Zaynul Abadin Tuhin

31,407
5
33
63

I believe that this is also much better from a performance perspective in RedShift. – Gordon Linoff Sep 24 '18 at 11:20

Mariusz Sakowski · Answer 2 · 2022-09-24T12:43:06.853

A little late to the party but anyway: you can also try to concatenate columns using || operator. It might be inefficient so I wouldn't use it in prod code, but for ad-hoc analysis should be fine.

select count(distinct col1 || '_' || col2)
from schemaname.tablename
where some filters

Note separator choice might matter, i.e. both 'foo' || '_' || 'bar_baz' and 'foo_bar' || '_' || 'baz' yield 'foo_bar_baz' and are thus equal. In some cases this might be concern, in some it's so insignificant you can skip separator completely.

score 2 · Answer 3 · edited Feb 04 '20 at 05:44

2

You can use

select col1,col2,count(*) from schemaname.tablename
where -- your filter
group by col1,col2

edited Feb 04 '20 at 05:44

Zaynul Abadin Tuhin

31,407
5
33
63

answered Sep 24 '18 at 09:12

Deepak

21
1
1

score 1 · Answer 4 · answered Sep 24 '18 at 11:21

If you are just trying to do count(distinct) then Zaynul's answer is correct. If you want other aggregations as well, here is another method:

select . . ., 
       sum(case when seqnum = 1 then 1 else 0 end) as col1_col2_unique_count
from (select t.*,
             row_number() over (partition by col1, col2 order by col1) as seqnum
      from schemaname.tablename t
      where some filters
     ) c

score 0 · Answer 5 · edited Jun 20 '23 at 15:00

0

If any of the columns have NULL, then concatenation might be insufficient, and the result might be wrong.

I would suggest a combination of concatenation with replacing NULLs with the empty string, i.e.

SELECT count(DISTINCT NVL(col1, '') || NVL (col2, '') || NVL (col3, ''))

edited Jun 20 '23 at 15:00

tremendows

4,262
3
34
51

answered Jun 19 '23 at 16:48

Diana Valento

1

Count distinct multiple columns in redshift

5 Answers5