
I need to generate a skewed distribution over a given range (e.g., low = 1, high = 10) with size = 100. stats.skewnorm can produce a skewed distribution, but it does not take parameters for a lower and upper bound. Can someone please help me with this?

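import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skewnorm

def skewed_data(low, high, number):
    skew_parameter = 5
    # loc and scale are fixed, so low and high are never used and
    # the samples are not confined to [low, high]
    data = skewnorm.rvs(skew_parameter, loc=1, scale=1, size=number)
    data = np.round(data, 0).astype(int)
    return data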

result_data = skewed_data(1, 10, 100)

bins = np.linspace(0, 10, 10)
plt.hist(result_data, bins, alpha=1, label='histogram')
plt.legend(loc='upper right')
plt.show()

The histogram of the result data shows that all the values fall between 1 and 5, whereas I would like them to be distributed between 1 and 10 and right-skewed.


S.D
  • What does your output currently look like, and what does it need to look like? Please provide an example of both. – Taylor Cochran Oct 06 '21 at 18:15
  • @TaylorCochran: the resulting data is between 0 and 3. I would like it to be distributed between 1 and 10 and right-skewed. – S.D Oct 06 '21 at 18:24
  • @Taylor: I have edited the question for clarity. I have now changed loc to 1, and the result is between 1 and 5, whereas I want it to be distributed between 1 and 10 (or, for example, 10 to 100), since the start and end are specified as arguments. – S.D Oct 06 '21 at 18:33
  • Where is `distribution` defined? – Taylor Cochran Oct 06 '21 at 18:37
  • @TaylorCochran: skew_parameter defines the skewness, but I am not sure how to define the distribution. – S.D Oct 06 '21 at 18:45
  • So what I'm getting at here is that you have a number of variables that are not defined in your example, which makes the issue harder to reproduce. For example, `distribution` is passed into `np.round()` but never defined, and neither is `response`. – Taylor Cochran Oct 06 '21 at 18:52
  • @TaylorCochran: Oh, I am sorry for the confusion, the "distribution" in np.round() is incorrect. It is actually "data", so I am rounding the "data" generated in the earlier step. I have now corrected it in the question. – S.D Oct 06 '21 at 18:55

1 Answer

There are lots of skewed distributions, an infinite number in fact. If you want one with guarantees on the min and max, a simple and easy-to-generate possibility is the triangular distribution. Triangles are parameterized by specifying the min, max, and mode (most likely outcome). Any value for mode other than the mid-point between min and max will yield skewness. If you want results that are right-skewed between 1 and 10, use values like 1, 10, and 3 as the min, max, and mode, respectively.
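As a quick check on the 1, 10, 3 example, the standard closed-form mean and median of a triangular distribution with min a, max b, and mode c (the median formula below applies when c is at or below the midpoint) can be compared. A right skew shows up as mode < median < mean:

```latex
\text{mean} = \frac{a + b + c}{3} = \frac{1 + 10 + 3}{3} = \frac{14}{3} \approx 4.67
\qquad
\text{median} = b - \sqrt{\frac{(b - a)(b - c)}{2}} = 10 - \sqrt{\frac{9 \cdot 7}{2}} \approx 4.39
```

So with mode 3, median about 4.39, and mean about 4.67, the long tail sits on the right, toward 10.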

Triangular distributions are available through the random module or numpy.
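A minimal sketch of the triangular approach, using NumPy's Generator.triangular for an array and the standard library's random.triangular for a single value. The function name skewed_data_triangular and its seed argument are hypothetical, chosen to mirror the question's skewed_data. Note that the two APIs order their arguments differently: NumPy takes (left, mode, right), while random.triangular takes (low, high, mode).

```python
import random

import numpy as np


def skewed_data_triangular(low, high, mode, number, seed=None):
    """Hypothetical helper: draw `number` samples from a triangular
    distribution on [low, high] with the given mode."""
    rng = np.random.default_rng(seed)
    # NumPy's argument order is (left, mode, right)
    return rng.triangular(low, mode, high, size=number)


# min = 1, max = 10, mode = 3 gives a right skew, as in the answer
data = skewed_data_triangular(1, 10, 3, 100, seed=0)
print(data.min(), data.max())

# Standard-library equivalent for one value; note the (low, high, mode) order
x = random.triangular(1, 10, 3)
print(x)
```

Rounding with np.round(data).astype(int), as in the question, keeps the values within 1..10, although the end values 1 and 10 each receive only a half-width rounding interval, so they appear less often than their neighbors would suggest.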

If the triangular distribution is too piecewise linear and pointy for you, another alternative might be the beta distribution. Betas are defined on the range [0,1], so you would need to scale and shift to get other ranges.

The beta distribution is available through numpy/scipy.
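A minimal sketch of the scale-and-shift step, using scipy.stats.beta, whose loc and scale arguments map the beta's native [0, 1] support onto [loc, loc + scale], the same loc/scale convention the question used with skewnorm. The helper name skewed_data_beta and the shape values a=2, b=5 are illustrative assumptions; any a < b produces a right skew.

```python
import numpy as np
from scipy.stats import beta


def skewed_data_beta(low, high, number, a=2.0, b=5.0, seed=None):
    """Hypothetical helper: draw `number` right-skewed samples on [low, high].

    a=2, b=5 are illustrative shape values (assumed, not from the answer);
    any a < b skews the samples toward `low` with a tail toward `high`.
    """
    # loc shifts and scale stretches [0, 1] onto [low, high]
    return beta.rvs(a, b, loc=low, scale=high - low, size=number,
                    random_state=seed)


data = skewed_data_beta(1, 10, 100, seed=0)
ints = np.round(data).astype(int)  # integers in 1..10, as in the question
print(ints.min(), ints.max())
```

Increasing the gap between a and b makes the skew stronger; setting a = b gives a symmetric shape.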

pjs