
I am using the following function to get percentiles from two columns, "Apple" and "Oranges". However, I am getting the result back as a list.

df.approxQuantile(['Apple', 'Oranges'],[0.1, 0.25, 0.5, 0.75, 0.9, 0.95],0.1)

I want to get the result back as columns. Any suggestions?

Desired output:

+----------+-------+---------+
|Percentile|  Apple|  Oranges|
+----------+-------+---------+
|        10|     50|      502|
|        25|     12|      431|
|        50|   1.15|     5065|
|        75|   3224|     1275|
|        90|   2234|      100|
+----------+-------+---------+
Alper t. Turker
Sun
  • 1
    Can you provide a [mcve] with some sample input data? Read more on [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault May 11 '18 at 13:49

1 Answer

Since the API is designed to return a list of lists (one list of quantiles per column), there is not much you can do here beyond converting the result:

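percentiles = [0.1, 0.25, 0.5, 0.75, 0.9, 0.95]
columns = ["Apple", "Oranges"]

# approxQuantile returns one list of quantiles per column; zip transposes
# them into one row per percentile.
spark.createDataFrame(
    zip(percentiles, *df.approxQuantile(columns, percentiles, 0.1)),
    ["Percentile"] + columns
)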
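
To see what the zip step does without a Spark session, here is a minimal plain-Python sketch. The `quantiles` values are made up and only stand in for what `approxQuantile` would return for two columns (one inner list per column); they are not real results.

```python
# Hypothetical stand-in for df.approxQuantile(columns, percentiles, 0.1):
# one inner list per column, one value per requested percentile.
# The numbers below are invented for illustration.
percentiles = [0.1, 0.5, 0.9]
columns = ["Apple", "Oranges"]
quantiles = [
    [1.0, 5.0, 9.0],     # Apple
    [10.0, 50.0, 90.0],  # Oranges
]

# Unpacking with * passes each column's list to zip as a separate argument,
# so zip pairs the i-th percentile with the i-th quantile of every column.
rows = list(zip(percentiles, *quantiles))
header = ["Percentile"] + columns

print(header)
for row in rows:
    print(row)
```

Each tuple in `rows` is one row of the desired table, which is why the result can be passed to `spark.createDataFrame` together with the list of column names.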
Alper t. Turker