3

I have been using the new percentile_cont in Postgres to calculate percentiles for a table ever since it was launched. However, we are now changing the table to include cardinality for each row, and I'm unsure as to how to implement percentile_cont to take this into account.

Let's say the table looked like this before:

+--------+--------------+
| name   | age          |
+--------+--------------+
|  Joe   | 10           |
+--------+--------------+
|  Bob   | 11           |
+--------+--------------+
|  Lisa  | 12           |
+--------+--------------+

Calculating the 85th percentile for age in the set would simply be done using: percentile_cont(0.85) WITHIN group (ORDER BY age asc) 85

Now, we have a cardinality for each name (the number of people with that specific name). It looks something like this:

+--------------+--------+
| name   | age | count  |
+--------+-----+--------+
|  Joe   | 10  |   2    |
+--------+-----+--------+
|  Bob   | 11  |   1    |
+--------+-----+--------+
|  Lisa  | 12  |   1    |
+--------+-----+--------+

Is there any way to use percentile_cont or any other built in function in Postgres to calculate the percentile taking the count/cardinality into account?

Fredrik
  • 301
  • 3
  • 10

1 Answers1

1

The most obvious solution is to duplicate rows according to count.

The first thing I thought of was a recursive query:

with recursive data (name, age, count) as (
    select * 
    from a_table
union all
    select name, age, count- 1
    from data
    where count > 1
    )
select name, age 
from data
order by name, age;

In Postgres, you can use generate_series() to get the same in an easier and faster way.

select name, age
from a_table
cross join generate_series(1, count)
order by name, age;

Test it in db<>fiddle.

klin
  • 112,967
  • 15
  • 204
  • 232