I have a DataFrame with a Salary column. I need to find the median of this column using Spark SQL and Scala.
The Spark version is 1.6.0 and the Scala version is 2.10.5.
I registered the DataFrame as a table named employee and ran the query below.
import org.apache.spark.mllib.random.RandomRDDs
sqlContext.sql("SELECT percentile_approx(salary, 0.5) FROM employee").show()
The DataFrame is created from a CSV file (a header row plus data rows). The number of data rows is odd, so I expect the median to be one of the actual salary values, but the query above returns a decimal value.
The data looks like this (from the CSV):
salary; name; job; gender
1000; AA; private; M
2000; BB; public; M
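
To make clear what result I expect, here is a minimal plain-Scala sketch of an exact median (no Spark involved). The object name MedianSketch, the helper exactMedian, and the sample values in the usage note are made up for illustration; they are not part of Spark or of my real data.

```scala
// Hypothetical helper, not part of Spark: exact median of a non-empty sequence.
object MedianSketch {
  def exactMedian(values: Seq[Double]): Double = {
    require(values.nonEmpty, "median of an empty sequence is undefined")
    val sorted = values.sorted
    val mid = sorted.length / 2
    // Odd count: the middle element. Even count: mean of the two middle elements.
    if (sorted.length % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2.0
  }
}
```

For example, for three illustrative salaries 1000, 3000 and 2000, MedianSketch.exactMedian(Seq(1000.0, 3000.0, 2000.0)) gives the middle value 2000.0, which is the kind of answer I am hoping to get from the query.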
Please help me to find the correct solution for this. Thanks in advance.