Hi, I am looking for a way to sample a subset of data from a larger pool.
For example, say I have 4000 entries, each with a value X uniformly distributed between 0 and 1.5:
import matplotlib.pyplot as plt
import random
randomlist = []
for i in range(0, 4000):
    n = random.uniform(0, 1.5)
    randomlist.append(n)
plt.hist(randomlist)
plt.show()
I would like to take 400 samples in such a way that the resulting histogram of value X is roughly sigmoidal between 0 and 1.5.
For instance, I can plot what the desired sigmoid would look like:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 1.5, 400)
z = 1/(1 + np.exp(10*(-x+1.0)))
plt.plot(x, z)
plt.xlabel("x")
plt.ylabel("Sigmoid(X)")
plt.show()
However, I do not know how to implement this in practice to achieve my sampling.
I may be overcomplicating this, so I am happy to hear suggestions for another way to achieve it. I have tried randomly sampling Y samples from buckets, but I would prefer a more generalizable method.
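To make the goal concrete, here is a minimal sketch of one direction that might work: weight each entry by the sigmoid evaluated at its X value, then draw 400 entries without replacement using those weights. The helper names `sigmoid_weight` and `sample_sigmoidal` are made up for illustration, and the sketch assumes the pool is roughly uniform in X, as in the example above. For a non-uniform pool, the weights would presumably need to be divided by an estimate of the pool's own density.

```python
import numpy as np


def sigmoid_weight(x, steepness=10.0, midpoint=1.0):
    """Target shape from the plot above: 1 / (1 + exp(10 * (-x + 1.0)))."""
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))


def sample_sigmoidal(values, n_samples, rng):
    """Return indices of n_samples entries, drawn without replacement.

    Each entry is picked with probability proportional to the sigmoid
    of its value (assumes the pool itself is roughly uniform in X).
    """
    values = np.asarray(values, dtype=float)
    weights = sigmoid_weight(values)
    probs = weights / weights.sum()  # normalise so the weights sum to 1
    return rng.choice(values.size, size=n_samples, replace=False, p=probs)


rng = np.random.default_rng(0)
pool = rng.uniform(0, 1.5, size=4000)  # stand-in for the 4000 entries
idx = sample_sigmoidal(pool, 400, rng)
subset = pool[idx]
```

Calling `plt.hist(subset)` on the result should give a histogram that rises roughly like the sigmoid. Because the draw is without replacement, the match is only approximate: the high-weight region of the pool gets depleted as the sample size approaches the number of entries available there. Returning indices rather than values means the same selection can be applied to any other columns attached to the entries.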