1

I have data in a MySQL database that looks something like this:

name |score
----------
alice|60
mary |55
...

A name can appear many times in the list, but can also appear just once. What I would like is to order the list based on the lower bound of a 95% confidence interval for the name. I tried the following:

SELECT name, count(*) as count_n, stddev_samp(score) as stdv, avg(score) as mean 
FROM `my.table`
GROUP BY name
ORDER BY avg(score)-1.96*std(score)/sqrt(count(*)) desc

This produces an output that is ok. Ideally though, I would like to vary the value 1.96, since this should depend on the value of count_n for that name. In fact, it should be a value based on the t-distribution for count_n-1 degrees of freedom. Are there MySQL functions that can do this for me?

I have seen the following answer which looks good but doesn't vary the value as I wold like.

Community
  • 1
  • 1
Geoff
  • 925
  • 4
  • 14
  • 36

1 Answers1

0

I solved my problem by creating a sepearate table 'tdistribution' with the following structure:

dof | tvalue
------------
1   | -12.706
2   | -4.3026

It contains the degree of freedom and the asscociated t value. Then this table can be joined with the original styled query.

SELECT table2.name, 
round(table2.mean-abs(tdistribution.tvalue*table2.stdv/sqrt(table2.nn)),2) AS LCB,
round(table2.mean+abs(tdistribution.tvalue*table2.stdv/sqrt(table2.nn)),2) AS UCB
FROM
    (SELECT table1.name, count(table1.name) AS nn, avg(table1.score) AS mean, stddev_samp(table1.score) AS stdv
    FROM
        (SELECT name, score FROM my.table) AS table1
    GROUP BY name
    ) AS table2
LEFT JOIN tdistribution
ON table2.nn-1=tdistribution.dof
WHERE nn>1
ORDER BY LCB DESC

It seems to work!

Geoff
  • 925
  • 4
  • 14
  • 36