-2

I am browsing for an equivalent SQL implementation of HYPGEOM.DIST in Microsoft Excel. I have explored the option of seeding a factorial table with distributions from 1 to N (1...N), but am seeking additional options.

HYPGEOM.DIST Equation

Given:

x = sample_s
n = number_sample
M = population_s
N = number_pop

Reference Excel functionality: https://support.office.com/en-us/article/hypgeom-dist-function-6dbd547f-1d12-4b1f-8ae5-b0d9e3d22fbf

Rob
  • 53
  • 1
  • 9
  • Please provide sample data and desired results -- as *text tables* in the question. You should also explain what "hypergeometric distribution" means. – Gordon Linoff Apr 22 '20 at 13:37
  • Here is tidbit which could help you https://stackoverflow.com/questions/60709096/numerically-calculate-combinations-of-factorials-and-polynomials/60710370#60710370 – Severin Pappadeux Apr 23 '20 at 14:18

1 Answers1

1

You can precalculate factorials for any range you need using whatever technique you are most comfortable with, and store them in a separate table.

The rest, I presume, should be trivial.

EDIT: On second thought, I'm not sure how you plan to go above the limits you have already encountered. 170! ~= 7.25741562E+307, and the maximum value that can be stored as float is 1.79E+308, according to the documentation. It seems SQL Server is not really suitable for your task; you would have to look elsewhere for systems that handle (and store) very large numbers.

It might be possible to resort to calculation in logarithms rather than actual values - this is especially tempting, considering that the original form of Stirling's formula is actually logarithmic. However that would require rewriting all these formulae into their logarithmic forms. This involves a fair bit of math, and my personal knowledge is insufficient even to tell whether it's actually achievable. But it doesn't mean it should stop you from trying :)

Roger Wolf
  • 7,307
  • 2
  • 24
  • 33
  • Thanks Roger! Great suggestion; this was actually my hunch as well, but my concern would be seeding all that data would be a pain (each deployment in our case) and at what point do you stop (1 million - what if the user wants to use a population size of 2 million?)? – Rob Apr 22 '20 at 19:35
  • 1
    Yes, Roger, using logarithm is quite common way to calculate such expressions, there is even special function for that (log of Euler Gamma function), see https://stackoverflow.com/questions/60709096/numerically-calculate-combinations-of-factorials-and-polynomials/60710370#60710370 for details – Severin Pappadeux Apr 23 '20 at 14:20
  • @RogerWolf ends up that we are calculating outside of SQL! Thanks for the help. Your answer is the most applicable. – Rob Apr 29 '20 at 19:02