I have a dataset with an arbitrary number of elements (say, 10000) which follows a lognormal distribution over a certain range of values (say, between 1 and 500). I successfully fit a distribution to it with the powerlaw module in Python. I then need to generate values from this distribution such that they stay within the given bounds (between 1 and 500, with reasonable tolerance) and match the size of the input dataset. I have tried the generators included in the powerlaw module itself, and while they work, the generated values far exceed the maximum I can accept: my maximum is around 500, yet the synthetic dataset routinely hits 6600.
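For reference, this is roughly how I fit and sample (if I am reading the powerlaw API right; the stand-in data below are just a placeholder for my real dataset):

import numpy as np
import powerlaw

# stand-in for my real data: ~10000 lognormal-ish values clipped to [1, 500]
rng = np.random.default_rng(0)
data = np.clip(rng.lognormal(mean=3.0, sigma=1.0, size=10000), 1, 500)

fit = powerlaw.Fit(data, xmin=1)
mu, sigma = fit.lognormal.mu, fit.lognormal.sigma

# draw a synthetic sample the same size as the input dataset
synthetic = fit.lognormal.generate_random(len(data))
print(synthetic.max())  # routinely lands far above my 500 cap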
I have attempted to generate bounded lognormal values by drawing from a truncated normal and passing the result through the exponential function:
import numpy as np
from scipy.stats import truncnorm

def generate_lognormal(xmin, xmax, mu, sigma, n):
    # idea: draw from a standard normal truncated to the bounds, then exponentiate
    minBound = (np.log(xmin) - mu) / sigma  # z-score of the desired lower bound
    maxBound = (np.log(xmax) - mu) / sigma  # z-score of the desired upper bound
    rand = truncnorm.rvs(minBound, maxBound, loc=0, scale=1, size=n)  # truncated standard normal draws
    return np.exp(mu + sigma * rand)  # map back to the lognormal scale
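I call it like this (the mu and sigma here are placeholder values; mine come from the powerlaw fit above):

sample = generate_lognormal(xmin=1, xmax=500, mu=3.0, sigma=1.0, n=10000)
print(sample.min(), sample.max())  # bounded by construction, yet the max often stalls well below 500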
But here I face a different problem: most of the time the generated data only reach around half of the desired range, with the largest values ending up at 200-300 and zero cases closer to 500. In fact, both scenarios (overshoot and undershoot) can happen with this code. Is there any way of generating values from lognormal (and power-law) distributions within bounds that stays stable between iterations?