
I'm working on a predictive model whose predictions don't always follow a normal distribution. I want to transform the predicted values so that their distribution looks like a bell curve. In other words, I'm looking for a transformation function that maps my distribution onto a bell curve (not necessarily normalized).

For example, here is what my distribution looks like:

[Image: the current, somewhat skewed distribution]

Notice that the distribution is somewhat skewed and not shaped like a proper bell curve.

Here is something close to what I want that distribution to look like:

[Image: the target distribution, closer to a bell curve]

NOTE: This is not a perfect distribution either, just closer.

NOTE: I'm not trying to normalize the values, just fit the distribution. Notice that the goal distribution is not normalized.

I thought I could use something from scipy.stats.norm or numpy, but I can't find exactly what I want.

Kendall Weihe
  • To fit your data to a normal distribution see this [stackoverflow answer](http://stackoverflow.com/questions/20011122/fitting-a-normal-distribution-to-1d-data). – qbzenker Apr 20 '17 at 01:15
  • I don't think *fitting* is the correct term here (it suggests you want to determine the parameters, mu and sigma, of the bell curve from data). The question itself sounds more like you want to *transform* the distribution. – MB-F Apr 20 '17 at 10:38

1 Answer

One tool you might consider is the Box-Cox transformation. The implementation in scipy is scipy.stats.boxcox.
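For reference, the one-parameter Box-Cox transform is usually written as below. The symbols $x$, $y$ and $\lambda$ are chosen here to match the variables `x`, `y` and `lam` in this answer's example. Note that the transform is only defined for strictly positive data, and that when no $\lambda$ is passed, `scipy.stats.boxcox` picks the value that maximizes the log-likelihood.

$$
y =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln x, & \lambda = 0
\end{cases}
\qquad x > 0
$$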

Here's an example:

```python
import numpy as np
from scipy.stats import boxcox, gamma
import matplotlib.pyplot as plt


# Generate a random sample that is not from a normal distribution.
np.random.seed(1234)
x = gamma.rvs(1.5, size=250)

# Transform the data.
y, lam = boxcox(x)

# Plot the histograms.
plt.subplot(2, 1, 1)
plt.hist(x, bins=21, rwidth=0.9)
plt.title('Histogram of Original Data')
plt.subplot(2, 1, 2)
plt.hist(y, bins=21, rwidth=0.9)
plt.title('Histogram After Box-Cox Transformation\n($\\lambda$ = %.4g)' % lam)
plt.tight_layout()
plt.show()
```

[Figure: histograms of the original data (top) and of the Box-Cox-transformed data (bottom)]
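Box-Cox only accepts strictly positive values, and model predictions may well include zeros or negatives. In that case a possible alternative is the Yeo-Johnson transformation, `scipy.stats.yeojohnson`, which accepts any real input. The sketch below is only an illustration under stated assumptions: `preds` is a hypothetical stand-in for the model predictions, and the final rescaling step (back to the original mean and standard deviation) is one way to read "not necessarily normalized", not something stated in the question.

```python
import numpy as np
from scipy.stats import skew, yeojohnson

rng = np.random.default_rng(1234)

# Hypothetical stand-in for model predictions: right-skewed, with some
# negative values (which boxcox would reject).
preds = rng.gamma(1.5, size=250) - 0.5

# Like boxcox, yeojohnson chooses lambda by maximum likelihood when
# no lambda is given, and returns the transformed data and that lambda.
transformed, lam = yeojohnson(preds)

# Assumption: keep the location and spread of the original predictions
# instead of leaving the result on the transformed scale.
target_mean, target_std = preds.mean(), preds.std()
z = (transformed - transformed.mean()) / transformed.std()
reshaped = z * target_std + target_mean

print(f"lambda = {lam:.4g}")
print(f"skewness before: {skew(preds):.3f}, after: {skew(reshaped):.3f}")
```

The rescaling is linear, so it changes only the mean and spread of the transformed values and leaves the bell-like shape intact. Keep in mind that both transformations are monotonic: they reshape the distribution but preserve the ordering of the predictions.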

Warren Weckesser