
I am using PySpark on Databricks. I have already built a percentile table from the training sample, and now I want to use that table to get percentiles for the testing dataset. For example, I have a column "val1", and I created the percentile table by applying the percentile rank function to "val1" and keeping one row at every 0.01 percentile step. Something like below:

|val1|Percentile|
|----|----------|
|-1000|0|
|-800|0.01|
|-750|0.02|
|-650|0.03|
|....|...|
|1500|0.97|
|1600|0.98|
|1750|0.99|
|2000|1|

Now I want to use this table to get the percentile of "val1" in the testing dataset. If a value falls between two rows of the table, it should be linearly interpolated between the two boundaries. For example, if a row in the testing dataset has 1550, the percentile should be 0.975, because 1500 maps to 0.97 and 1600 maps to 0.98 in the table above.
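To make the rule concrete, here is a minimal sketch of the interpolation I have in mind, in plain Python rather than PySpark. The names `PERCENTILE_TABLE` and `interpolate_percentile` are made up for illustration, the table holds only the rows shown above (the elided middle rows are left out), and clamping values outside the table range to 0 and 1 is my assumption, not something I have settled on.

```python
from bisect import bisect_right

# Hypothetical excerpt of the percentile table as (val1, percentile) pairs,
# sorted by val1. The elided middle rows are omitted, so results between
# -650 and 1500 are not meaningful in this sketch.
PERCENTILE_TABLE = [
    (-1000, 0.00), (-800, 0.01), (-750, 0.02), (-650, 0.03),
    (1500, 0.97), (1600, 0.98), (1750, 0.99), (2000, 1.00),
]


def interpolate_percentile(value, table):
    """Linearly interpolate the percentile of `value` from a sorted table."""
    vals = [v for v, _ in table]
    pcts = [p for _, p in table]

    # Assumption: values outside the table range are clamped to its ends.
    if value <= vals[0]:
        return pcts[0]
    if value >= vals[-1]:
        return pcts[-1]

    # hi is the first row whose val1 is strictly greater than value,
    # lo is the row just before it, so vals[lo] <= value < vals[hi].
    hi = bisect_right(vals, value)
    lo = hi - 1
    v0, v1 = vals[lo], vals[hi]
    p0, p1 = pcts[lo], pcts[hi]

    # Standard linear interpolation between the two boundary rows.
    return p0 + (p1 - p0) * (value - v0) / (v1 - v0)


# 1550 lies halfway between 1500 (0.97) and 1600 (0.98),
# so this prints approximately 0.975.
print(interpolate_percentile(1550, PERCENTILE_TABLE))
```

As far as I can tell, `numpy.interp(x, xp, fp)` computes the same thing on arrays, including the clamping at both ends. What I am missing is how to apply this to a column of a Spark DataFrame.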

Is it possible to achieve this? Thanks.

ASD

0 Answers