0

I would like to build a stored function to compute quantils within groups. I have a very nice SQL from the net to compute median: http://geekyisawesome.blogspot.ch/2012/04/finding-median-value-using-mysql-sql.html. This SQL allows to compute the median of one variable of the table. I would like to compute median by grouping. To do this first I have to prepare a table with increasing counter within groups. Even this for me was a challenge so I got help from stackoverflow: MySQL - Counter within group. Putting all together I'm able to compute median within groups:

-- (01) Create table fake table:

CREATE TABLE test01.tb (
g CHAR(1)
, x  INTEGER
);

INSERT INTO test01.tb (g, x)
VALUES
('a',10)
, ('a',2)
, ('a',0)
, ('b',10)
, ('b',10)
, ('b',10)
, ('b',1)
, ('b',1)
, ('b',1)
;

-- (02) Create enumerated table

CREATE TABLE tb2 AS       
SELECT a.g, a.x
       , @counter := if (g = @prev_g, @counter + 1, 1) counter
       , @prev_g := g
       , (SELECT COUNT(*)
         FROM tb b
         WHERE a.g = b.g
         )  AS max_counter
FROM tb a, (SELECT @counter := 0, @prev_g := NULL) INIT
ORDER BY a.g, a.x
;

-- (03) Compute median within group

SELECT g, AVG(x)
FROM tb2
WHERE counter IN (FLOOR((max_counter+1)*0.5),CEIL((max_counter+1)*0.5))
GROUP BY g
;

My idea is to build a function QUANTILE which allows to compute quantils like this:

SELECT g, QUANTILE(x,0.5) median
FROM tb
GROUP BY g
;

I never built stored functions/procedures and looking in the net the tables in the stored functions are hard coded. So, I'm wonder if it is possible to build such a generic function. Thanks for advices.

Community
  • 1
  • 1
giordano
  • 2,954
  • 7
  • 35
  • 57
  • *Even this for me was a challenge*. Then most likely you should let it go. Theoretically you can create your own aggregate function for MySQL but you should do that in C or C++ http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html to be able to call it as nice and easy as you stated in desired outcome in your question – peterm Jan 10 '14 at 05:22
  • Thanks for the hint. Now I have 3 ways to compute the median: 1) The easiest for me would be to compute it in R and import the median back to MySQL-DB. I try to avoid it due to problems of performance. 2) Stored function: the second easiest but I have first to get into writing stored functions. 3) The most difficult is writing my own C-function since I can't programming C. The latter is the most elegant but first I will try to solve it with stored function and later a C-function. I will inform as soon as possible. – giordano Jan 12 '14 at 12:50

0 Answers0