22

I am trying to count rows which have a distinct combination of 2 columns in Amazon redshift. The query I am using is -

select count(distinct col1, col2)
from schemaname.tablename
where some filters

It is throwing me this error -

Amazon Invalid operation: function count(character varying, bigint) does not exist`

I tried casting bigint to char but it didn't work.

AswinRajaram
  • 1,519
  • 7
  • 18
Janusz01
  • 507
  • 2
  • 4
  • 12
  • 1
    please share your sample data and expected output – Zaynul Abadin Tuhin Sep 24 '18 at 05:44
  • There are many ways to achieve this. It would be great if could provide you whole use case. So we could try to find the best way for you – S-Man Sep 24 '18 at 07:06
  • 1
    In Postgres you could use `select count (distinct (col1, col2))` (note the parentheses around the two columns)- maybe Redshift allows that as well. –  Sep 24 '18 at 09:30
  • @a_horse_with_no_name, tried it. It doesn't work even with parentheses around the column combination – Skandy Feb 03 '20 at 23:55

5 Answers5

36

you can use sub-query and count

select count(*) from (
  select distinct col1, col2 
 from schemaname.tablename
  where some filter
) as t
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
15

A little late to the party but anyway: you can also try to concatenate columns using || operator. It might be inefficient so I wouldn't use it in prod code, but for ad-hoc analysis should be fine.

select count(distinct col1 || '_' || col2)
from schemaname.tablename
where some filters

Note separator choice might matter, i.e. both 'foo' || '_' || 'bar_baz' and 'foo_bar' || '_' || 'baz' yield 'foo_bar_baz' and are thus equal. In some cases this might be concern, in some it's so insignificant you can skip separator completely.

Mariusz Sakowski
  • 3,232
  • 14
  • 21
2

You can use

select col1,col2,count(*) from schemaname.tablename
where -- your filter
group by col1,col2
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
Deepak
  • 21
  • 1
  • 1
1

If you are just trying to do count(distinct) then Zaynul's answer is correct. If you want other aggregations as well, here is another method:

select . . ., 
       sum(case when seqnum = 1 then 1 else 0 end) as col1_col2_unique_count
from (select t.*,
             row_number() over (partition by col1, col2 order by col1) as seqnum
      from schemaname.tablename t
      where some filters
     ) c
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

If any of the columns have NULL, then concatenation might be insufficient, and the result might be wrong.

I would suggest a combination of concatenation with replacing NULLs with the empty string, i.e.

SELECT count(DISTINCT NVL(col1, '') || NVL (col2, '') || NVL (col3, ''))
tremendows
  • 4,262
  • 3
  • 34
  • 51