
I have a DataFrame with a Salary column, and I need to find the median of this column using Spark SQL and Scala. I am using Spark 1.6.0 and Scala 2.10.5.

I have registered the DataFrame as a table and run the query below:

import org.apache.spark.mllib.random.RandomRDDs

// percentile_approx is a Hive UDAF in Spark 1.6, so sqlContext here must be a HiveContext
sqlContext.sql("SELECT percentile_approx(salary, 0.5) FROM employee").show()

The DataFrame is created from a CSV file containing a header row followed by data rows, and the number of data rows is odd. However, the query above returns a decimal value.
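To make the expected result concrete: with an odd number of rows, the exact median is the middle value after sorting, so it should equal one of the salaries. The sketch below is plain Scala, not Spark code. `exactMedian` is a hypothetical helper, and salaries are assumed to be whole numbers. Note also that `percentile_approx` is an approximation and returns a DOUBLE, so its result prints with a decimal part even when it happens to equal a whole salary.

```scala
// Exact median of a collection of salaries (plain Scala, no Spark).
// Assumption: salaries are whole numbers (Long). The result is a Double
// because an even-sized input averages the two middle values.
def exactMedian(salaries: Seq[Long]): Double = {
  require(salaries.nonEmpty, "median of an empty collection is undefined")
  val sorted = salaries.sorted.toIndexedSeq // indexed for constant-time lookup
  val n = sorted.size
  if (n % 2 == 1) sorted(n / 2).toDouble           // odd count: the middle element
  else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0   // even count: mean of the two middle elements
}

// Usage with hypothetical values (odd count):
println(exactMedian(Seq(3000L, 1000L, 2000L))) // prints 2000.0
```

For an odd count the result is always one of the input values, which is the behaviour the question expects from the query.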

The data looks like this (from the CSV):

salary; name;    job;    gender
1000;    AA;    private;  M
2000;    BB;    public;   M
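Because the fields in this sample are padded with spaces after each `;`, a value such as `1000` only parses as a number once the whitespace is trimmed. The plain-Scala sketch below, which does not use Spark, reads lines shaped like the sample. `parseSalaries` is a hypothetical helper. It assumes the first non-blank line is the header and that one header field is named `salary`.

```scala
// Sketch: extract the salary column from semicolon-delimited lines shaped like
// the sample (header first, fields padded with spaces, possibly blank lines).
// Assumption: the header contains a field named "salary".
def parseSalaries(lines: Seq[String]): Seq[Long] = {
  val nonBlank = lines.map(_.trim).filter(_.nonEmpty) // drop blank lines
  val header = nonBlank.head.split(";").map(_.trim)   // trimmed column names
  val idx = header.indexOf("salary")
  require(idx >= 0, "no salary column in header")
  // Remaining lines are data rows: pick the salary field and trim the padding.
  nonBlank.tail.map(line => line.split(";")(idx).trim.toLong)
}

val sample = Seq(
  "salary; name;    job;    gender",
  "1000;    AA;    private;  M",
  "2000;    BB;    public;   M"
)
println(parseSalaries(sample)) // prints List(1000, 2000)
```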

Please help me find the correct solution. Thanks in advance.

tourist
codelover
