Count distinct values with OVER(PARTITION BY id)

Question

Is it possible to count distinct values in conjunction with window functions like OVER(PARTITION BY id)? Currently my query is as follows:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
COUNT(DISTINCT congestion.week_nb) OVER(
    PARTITION BY congestion.id_element
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

However, when I try to execute the query I get the following error:

"COUNT(DISTINCT": "DISTINCT is not implemented for window functions"

score 8 · Answer 1 · answered Feb 21 '15 at 21:39

No, as the error message states, DISTINCT is not implemented with windows functions. Aplying info from this link into your case you could use something like:

WITH uniques AS (
 SELECT congestion.id_element, COUNT(DISTINCT congestion.week_nb) AS unique_references
 FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
 GROUP BY congestion.id_element
)

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
uniques.unique_references AS week_count
FROM congestion
JOIN uniques USING (id_element)
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

Depending on the situation you could also put a subquery straight into SELECT-list:

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
(SELECT COUNT(DISTINCT dist_con.week_nb)
    FROM congestion AS dist_con
    WHERE dist_con.date >= '2014.01.01'
    AND dist_con.date <= '2014.12.31'
    AND dist_con.id_element = congestion.id_element) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

score 2 · Answer 2 · edited Nov 15 '22 at 20:30

I find that the easiest way is to use a subquery/CTE and conditional aggregation:

SELECT
  c.date,
  c.week_nb,
  c.id_congestion,
  c.id_element,
  ROW_NUMBER() OVER (PARTITION BY c.id_element ORDER BY c.date),
  (
    CASE WHEN seqnum = 1 THEN
      1
    ELSE
      0
    END) AS week_count
FROM (
  SELECT
    c.*,
    ROW_NUMBER() OVER (PARTITION BY c.congestion.id_element, c.week_nb ORDER BY c.date) AS seqnum
  FROM
    congestion c) c
WHERE
  c.date >= '2014.01.01'
  AND c.date <= '2014.12.31'
ORDER BY
  id_element,
  date

score 2 · Answer 3 · answered Oct 20 '20 at 21:43

Since this is the first result that pops up from Google, I'll add this reproducible example, similar to Gordon's answer:

Let's first start with creating a sample table:

WITH test as 
(
SELECT * 
FROM (VALUES
(1, 'A'),
(1, 'A'),
(2, 'B'),
(2, 'B'),
(2, 'D'),
(3, 'C'),
(3, 'C'),
(3, 'C'),
(3, 'E'),
(3, 'F')) AS t (id_element, week_nb)
)

select * from test

This yields:

id_element week_nb
1   A
1   A
2   B
2   B
2   D
3   C
3   C
3   C
3   E
3   F

Then, doing something like:

select 
  id_element,
  week_nb,
  sum(first_row_in_sequence) over (partition by id_element) as distinct_week_nb_count
from 
(
select 
  id_element,
  week_nb,
  case when row_number() over (partition by id_element, week_nb) = 1 then 1 else 0 end as first_row_in_sequence
from test
) as sub

yields

id_element week_nb distinct_week_nb_count
1   A   1
1   A   1
2   B   2
2   B   2
2   D   2
3   C   3
3   C   3
3   C   3
3   E   3
3   F   3

Appreciate the explanation, this worked well for me – Garrett Simpson Jun 24 '21 at 08:20 — Garrett Simpson, Jun 24 '21 at 08:20

score 1 · Answer 4 · answered Oct 30 '19 at 09:02

Make partitioned set smaller, up to the point there is no duplicates over counted field :

SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
   congestion.id_element,
ROW_NUMBER() OVER(
    PARTITION BY congestion.id_element
    ORDER BY congestion.date),
COUNT(congestion.week_nb) -- remove distinct 
OVER(
    PARTITION BY congestion.id_element,
                 -- add new fields which will restart counter in case duplication
                 congestion.id_congestion
) AS week_count
FROM congestion
WHERE congestion.date >= '2014.01.01'
AND congestion.date <= '2014.12.31'
ORDER BY id_element, date

I'm not sure if this answer always applies generally, but after some thought it was perfect for my use case. — Silveri, May 12 '20 at 15:42

NiFiNiTe · Answer 5 · 2023-03-04T17:40:34.283

If you are counting distinct numbers, you can use other aggregate functions to acheive the same effect, like so.

select
    initial.id,
    initial.val,
    joined.id,
    array_length(uniq(sort(array_agg(joined.some_number) over (partition by initial.id))), 1) as distinct_count
from
    (values (1,'a'), (2,'b'), (3,'c')) initial(id, val)
        left join (values (1, 1),
                          (1, 1),
                          (1, 3),
                          (2, 2),
                          (2, 2),
                          (3, 3),
                          (3, 3),
                          (3, 3),
                          (3, 4)) joined(id, some_number) on joined.id = initial.id
;

id  val id  distinct_count
1   a   1   2
1   a   1   2
1   a   1   2
2   b   2   1
2   b   2   1
3   c   3   2
3   c   3   2
3   c   3   2
3   c   3   2

If you get an error message saying uniq doesn't exist, it's in the intarray extension. Try running CREATE EXTENSION intarray first and then try again. If you are still having trouble, search the Postgresql docs for info on the intarray extension. "[The] module is considered 'trusted', that is, it can be installed by non-superusers who have CREATE privilege on the current database."

Thank you for posting this late answer. This seems to be the straight-forward way to circumvent PostgreSQL's shortcoming right now. It is weird enough, that they still don't support `DISTINCT` in window functions. Great that you thought of this this workaround. — Thorsten Kettner, Feb 16 '22 at 06:19
I don't see any function called `uniq` in Postgres? Are you sure this is a thing? — Andrew, Mar 02 '23 at 01:02
@Andrew, see the updated answer. You may need to create the extension by running `CREATE EXTENSION intarray` first before you can use it. — NiFiNiTe, Mar 04 '23 at 17:41

score 0 · Answer 6 · answered Jul 18 '23 at 03:36

Yes, "DISTINCT is not implemented for window functions".

Another solution is to do DENSE_RANK() OVER(PARTITION BY ORDER BY) and MAX() OVER(PARTITION BY) instead.

e.g: for the purpose COUNT(DISTINCT week_nb) OVER(PARTITION BY id_element) week_count:

SELECT  s.date, 
        s.week_nb, 
        s.id_congestion, 
        s.id_element,
        MAX(s.week_dense_rank) OVER(PARTITION BY s.id_element) week_count
FROM
( 
   SELECT date,
          week_nb,
          id_congestion,
          id_element,
          DENSE_RANK() OVER(PARTITION BY id_element ORDER BY week_nb) week_dense_rank
   FROM   congestion
   WHERE  date >= '2014.01.01'
   AND    date <= '2014.12.31' 
) s

Count distinct values with OVER(PARTITION BY id)

6 Answers6