I'm trying to use the percentile function in Spark SQL.
Data:
col1
----
198
15.8
198
198
198
198
198
198
198
198
198
If I use the code below, the percentile value I get is not what I expect.
select percentile(col1, .05) from tblname
output: 106.9
If I pass 2 as a third argument, the value is still not what I expect.
select percentile(col1, .05, 2) from tblname
output: 24.91000000000001
But if I use the code below, I get the expected result (though I don't know how or why).
select percentile(col1, .05, 100) from tblname
output: 15.8
Can anyone help me understand how the last argument changes the result? Is there any documentation? I checked the docstring in the Spark source code (I don't know Scala), but it didn't help, and I found nothing on the official website either. This is the docstring:
percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be positive integral
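
For what it's worth, the three outputs above can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's actual code. It assumes that the exact percentile is computed by linear interpolation at position (n - 1) * percentage in the sorted data, and that frequency means "count every row this many times". The function name exact_percentile is hypothetical and only used for illustration.

```python
import math


def exact_percentile(values, percentage, frequency=1):
    """Sketch of an exact percentile with linear interpolation.

    Assumption: `frequency` repeats every value that many times before
    the percentile is taken. This mirrors the outputs in the question,
    it is not copied from Spark.
    """
    # Expand each value `frequency` times, then sort ascending.
    expanded = sorted(v for v in values for _ in range(frequency))
    # Fractional index of the requested percentile in the sorted data.
    position = (len(expanded) - 1) * percentage
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return expanded[lower]
    # Interpolate linearly between the two neighbouring values.
    return ((upper - position) * expanded[lower]
            + (position - lower) * expanded[upper])


# Same 11 rows as in the question: one 15.8 and ten 198s.
col1 = [198, 15.8] + [198] * 9

print(round(exact_percentile(col1, 0.05), 2))       # about 106.9
print(round(exact_percentile(col1, 0.05, 2), 2))    # about 24.91
print(round(exact_percentile(col1, 0.05, 100), 2))  # about 15.8
```

Under these assumptions, with 11 rows the 5th percentile falls at position 0.5, halfway between 15.8 and 198, which gives 106.9. With frequency 2 there are 22 values and the position is 1.05, just past the second 15.8, which gives 24.91. With frequency 100 there are 1100 values and position 54.95 lies entirely inside the run of one hundred 15.8s, so the result is 15.8.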