
I am looking for a Scala implementation of Python's sklearn.preprocessing.QuantileTransformer class. There doesn't seem to be a single class in Scala that implements the entire functionality.

The Python implementation has 3 major parts:

1) Compute quantiles for the given data and percentile array using numpy.percentile(). If a quantile lies between two input data points, linear interpolation is used. The closest I can find in Scala is in Breeze, which has a percentile() function. (Observation: Spark's DataFrame.stat.approxQuantile() does not perform the linear interpolation and thus can't be used here.)

2) Use numpy.interp() to map the input values onto a given output range. E.g. if the input data range is 1-100, it can be mapped to any given range, say 0-1. Again, linear interpolation is used when an input value falls between two quantiles. The closest I can find in Scala is the breeze.interpolation package.

3) Calculate the inverse CDF using the normal distribution's ppf() (scipy.stats.norm.ppf(); numpy itself has no ppf()). I believe I can use the NormalDistribution class for this, as suggested in the answer below, or the StandardScaler class.
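For reference, steps 1 and 2 can be sketched in plain Scala without any library. This is only a rough illustration, assuming numpy's default linear interpolation for percentiles and numpy.interp's clamping at the ends; the object and method names (QuantileSketch, percentile, interp) are made up for this example and are not part of Breeze or sklearn.

```scala
object QuantileSketch {

  /** Percentile with linear interpolation between the two closest ranks,
    * assumed to match numpy.percentile's default behaviour. p is in [0, 100].
    */
  def percentile(data: Seq[Double], p: Double): Double = {
    require(data.nonEmpty, "data must not be empty")
    require(p >= 0 && p <= 100, "p must be in [0, 100]")
    val sorted = data.sorted.toIndexedSeq
    val pos = p / 100.0 * (sorted.length - 1) // fractional index into sorted data
    val lo = math.floor(pos).toInt
    val hi = math.ceil(pos).toInt
    sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
  }

  /** Piecewise-linear interpolation in the style of numpy.interp.
    * xp must be increasing; x outside [xp.head, xp.last] is clamped
    * to fp.head or fp.last.
    */
  def interp(x: Double, xp: IndexedSeq[Double], fp: IndexedSeq[Double]): Double = {
    require(xp.nonEmpty && xp.length == fp.length, "xp and fp must match in length")
    if (x <= xp.head) fp.head
    else if (x >= xp.last) fp.last
    else {
      val i = xp.indexWhere(_ > x) // first knot strictly above x, so i >= 1
      val t = (x - xp(i - 1)) / (xp(i) - xp(i - 1))
      fp(i - 1) + t * (fp(i) - fp(i - 1))
    }
  }
}
```

With this sketch, `QuantileSketch.percentile(Seq(1.0, 2.0, 3.0, 4.0), 50)` interpolates halfway between 2.0 and 3.0, and `QuantileSketch.interp(50.5, Vector(1.0, 100.0), Vector(0.0, 1.0))` maps the midpoint of the 1-100 range onto the midpoint of 0-1.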

Is there anything better that would keep the code short and simple?

John Subas

1 Answer


The Apache Commons Math library has a NormalDistribution class, which has an inverseCumulativeProbability method that calculates the specified quantile value. That should suit your purposes.
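A minimal sketch of how that could look from Scala, assuming the commons-math3 artifact is on the classpath. The wrapper object InverseCdfSketch and its ppf method are hypothetical names chosen to mirror the Python side; only NormalDistribution and inverseCumulativeProbability come from the library.

```scala
import org.apache.commons.math3.distribution.NormalDistribution

object InverseCdfSketch {
  // The no-argument constructor gives the standard normal (mean 0, sd 1).
  private val standardNormal = new NormalDistribution()

  /** Maps a probability in [0, 1] to the matching standard-normal quantile. */
  def ppf(p: Double): Double = standardNormal.inverseCumulativeProbability(p)
}
```

For example, `InverseCdfSketch.ppf(0.5)` should be approximately 0.0 and `InverseCdfSketch.ppf(0.975)` approximately 1.96. Probabilities of exactly 0 or 1 map to infinite values, so you may want to clamp the inputs slightly inside (0, 1) before calling it.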

Mike Allen