
I am having trouble handling nulls while calculating percentiles. Below is the sample data.

[Image: sample data]

The code I am using now: percentile(column_1, array(0, 0.25, 0.50, 0.75, 1)) as column_1_p

This takes the null values into account as well when calculating the percentiles. I need to exclude them and use only the valid values, but I couldn't find a function that does this.

Data: values range from 0 to 1000. I cannot replace the nulls with zeros, because the data already contains genuine zeros.

Any help here is highly appreciated.

Thanks in advance.

kumar
One option would be to create a temp table containing only the non-null values and use that to calculate the percentile, or to write a Hive UDAF. – dassum Nov 13 '19 at 04:31
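
A minimal sketch of the first option in that comment, assuming HiveQL and using a common table expression in place of a physical temp table. The table name my_table is hypothetical (the question does not give one). It also assumes column_1 is an integral type, which percentile() expects; for a floating-point column, percentile_approx() would be the one to try instead.

```sql
-- my_table is a hypothetical table name; column_1 is the column from the question.
WITH non_null_values AS (
  SELECT column_1
  FROM my_table
  WHERE column_1 IS NOT NULL   -- drops nulls but keeps genuine zeros
)
SELECT percentile(column_1, array(0, 0.25, 0.50, 0.75, 1)) AS column_1_p
FROM non_null_values;
```

Filtering in a separate step like this only removes rows where column_1 is null. If percentiles for several columns are computed in the same query, each column would probably need its own filtered subquery, since a single WHERE clause would also discard rows that are valid for the other columns.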
