My Hive table currently looks like this:
Numbers
0
0
-0.12745098
-0.218905473
0.026011561
0.235294118
-0.028
-0.052356021
0.052753355
0.008032129
0.012768817
0.115384615
0.040816327
The column type is DOUBLE_TYPE. I would like to calculate the median. I expect the answer to be 0.008032129, since this is the 7th of the 13 observations when the numbers are sorted in ascending order.
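To double-check the value I expect, here is a small Python sketch run outside Hive. It assumes the 13 values listed above are the entire column; the variable names are only for illustration.

```python
from statistics import median

# The 13 values of the Numbers column, copied from the table above.
numbers = [
    0, 0, -0.12745098, -0.218905473, 0.026011561, 0.235294118, -0.028,
    -0.052356021, 0.052753355, 0.008032129, 0.012768817, 0.115384615,
    0.040816327,
]

ordered = sorted(numbers)

# With an odd count (13), the median is the middle element:
# the 7th value, i.e. index 6 when counting from 0.
expected = ordered[len(ordered) // 2]

print(expected)         # 0.008032129
print(median(numbers))  # 0.008032129
```

Both the manual pick and `statistics.median` agree on 0.008032129, so that is the value I am comparing Hive's result against.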
When I run this query (as suggested in "How to calculate median in Hive"):
select percentile_approx(Numbers, 0.5) AS Numbers
from tryout1
The answer I get is 0.0040160642570281121. This is unexpected, and it is not even one of the numbers in my list! Does anyone know why Hive returns this number, and what I should change to get the exact median? If you know an entirely different way to calculate the median in Hive, I am also very interested!