How to compute a binomial distribution in postgreSQL

Question

I have a function that computes a binomial distribution naively as follows

CREATE OR REPLACE FUNCTION random_binomial(n int, p float)
RETURNS int AS $$
DECLARE
    -- Loop variable
    i int;
    -- Result of the function
    result float = 0;
BEGIN
    IF n <= 0 OR p <= 0.0 OR p >= 1.0 THEN
        RETURN NULL;
    END IF;
    FOR i IN 1..n LOOP
        IF random() < p THEN
            result = result + 1;
        END IF;
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql STRICT;

While doing simulations I need to call the function thousands of times and this makes the process quite slow. Is there a smarter way to compute a binomial distribution in postgreSQL?

score 0 · Answer 1 · answered May 21 '20 at 09:24

0

PostgreSQL allows to write C functions that can easily be integrated in your application (assuming of course you know to code in C): it should be faster than PL/PGSQL.

See detailed documentation:https://www.postgresql.org/docs/current/xfunc-c.html

answered May 21 '20 at 09:24

pifor

7,419
2
8
16

Thanks for your answer ! But this requires to include a library in my application such as gsl https://www.gnu.org/software/gsl/ and I was trying to avoid this additional dependency. Any suggestion? – Esteban Zimanyi May 21 '20 at 10:00
No because actually databases like PostgreSQL are not designed to do "numeric computing" that requires a lot of CPU usage. PostgreSQL allows use to use Java, Perl or Python as procedural extensions but I don't know if this could be really faster than PL/PGSQL. – pifor May 21 '20 at 10:04

score 0 · Answer 2 · answered Nov 14 '22 at 02:39

Try this approach:

Create Bins for with lower and upper bounds
Count from the table's desired column and put each count into the bin
While doing so, make sure to keep all the data of bins (lower and upper bound) when joining (It is ok for the count of some value of the bins to be NULL but not the other way around.

-- Create Bins with upper and lower bound
WITH bins AS (
      SELECT generate_series(2200, 3050, 50) AS lower,
             generate_series(2250, 3100, 50) AS upper),
     -- Take the counts from desired table
     temp_table AS (
      SELECT question_count 
        FROM table_name) 
-- Select columns for result
SELECT lower, upper, count(question_count) 
  FROM bins  -- Created above
       -- Join to temp_table (created above), 
       -- keeping all rows from the bins table in the join
       LEFT JOIN temp_table 
       -- Compare question_count to lower and upper
         ON question_count >= lower 
        AND question_count < upper
 -- Group by lower and upper to count values in each bin
 GROUP BY lower, upper
 -- Order by lower to put bins in order
 ORDER BY lower;

How to compute a binomial distribution in postgreSQL

2 Answers2