PySpark dataframe approxQuantile returns result as List

Question

I am using the following function to get the percentiles of two columns, "Apple" and "Oranges". However, I am getting the result back as a list.

df.approxQuantile(['Apple', 'Oranges'], [0.1, 0.25, 0.5, 0.75, 0.9, 0.95], 0.1)

I want to get the result back as columns. Any suggestions?

Desired output:

+----------+-------+---------+
|Percentile|  Apple|  Oranges|
+----------+-------+---------+
|        10|     50|      502|
|        25|     12|      431|
|        50|   1.15|     5065|
|        75|   3224|     1275|
|        90|   2234|      100|
+----------+-------+---------+

Can you provide a [mcve] with some sample input data? Read more on [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). — pault, May 11 '18 at 13:49

score 3 · Accepted Answer · answered May 11 '18 at 19:16

Given how the API is designed (with a list of columns, approxQuantile returns a list of lists, one inner list per column), there is not much you can do here beyond converting the result yourself:

percentiles = [0.1, 0.25, 0.5, 0.75, 0.9, 0.95]
columns = ["Apple", "Oranges"]

# zip transposes the per-column quantile lists into one row per percentile
spark.createDataFrame(
    zip(percentiles, *df.approxQuantile(columns, percentiles, 0.1)),
    ["Percentile"] + columns
)
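
To see why the zip call gives one row per percentile, here is a minimal sketch in plain Python that runs without Spark. The `quantiles` list is a made-up stand-in for what `df.approxQuantile(columns, percentiles, 0.1)` would return; the numbers are invented for illustration and are not from the question's data.

```python
# Hypothetical stand-in for df.approxQuantile(columns, percentiles, 0.1):
# one inner list per column, one value per requested percentile.
percentiles = [0.1, 0.5, 0.9]
columns = ["Apple", "Oranges"]
quantiles = [
    [1.0, 5.0, 9.0],     # made-up values for "Apple"
    [10.0, 50.0, 90.0],  # made-up values for "Oranges"
]

# zip(percentiles, *quantiles) pairs each percentile with the matching
# value from every column, i.e. it turns per-column lists into rows.
rows = list(zip(percentiles, *quantiles))
header = ["Percentile"] + columns

for row in rows:
    print(dict(zip(header, row)))
```

Passing `rows` and `header` to `spark.createDataFrame` is then the same call as in the answer above, just split into two steps.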
