Hi, I am looking for a way to sample a subset of data from a larger pool of data.

For example, say I have 4000 entries, each with a value X uniformly distributed between 0 and 1.5.

import matplotlib.pyplot as plt
import random

# Pool of 4000 values drawn uniformly from [0, 1.5]
randomlist = [random.uniform(0, 1.5) for _ in range(4000)]

plt.hist(randomlist)
plt.show()

I would like to take 400 samples in such a way that the resulting histogram of value X is roughly sigmoidal between 0 and 1.5.

For instance, I can generate what the desired sigmoid would look like:

import matplotlib.pyplot as plt
import numpy as np

# Target shape: a sigmoid centred at x = 1.0 with steepness 10
x = np.linspace(0, 1.5, 400)
z = 1/(1 + np.exp(10*(-x+1.0)))

plt.plot(x, z)
plt.xlabel("x")
plt.ylabel("Sigmoid(X)")

plt.show()

However, I do not know how to implement this in practice to achieve my sampling.

I may be overcomplicating this, so I am happy to hear suggestions for another way of achieving it. I have tried randomly sampling Y samples from buckets, but I would prefer a more generalizable method.
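
To make the goal concrete, here is a minimal sketch of the kind of generalizable method I have in mind, assuming that "sigmoidal histogram" means the (non-cumulative) density of X in the subset should follow the sigmoid. Each entry is weighted by the target curve evaluated at its X, and 400 entries are drawn without replacement using those weights. The helper names `sigmoid_weight` and `weighted_subsample` are made up for illustration, and because the pool is finite the result only approximates the target shape.

```python
import numpy as np

def sigmoid_weight(x, steepness=10.0, midpoint=1.0):
    # Same curve as the target plot: 1 / (1 + exp(10 * (-x + 1.0)))
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))

def weighted_subsample(values, n_samples, weight_fn, seed=None):
    # Hypothetical helper: returns indices of a weighted draw without replacement.
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    weights = weight_fn(values)
    probs = weights / weights.sum()  # normalise weights into probabilities
    return rng.choice(values.size, size=n_samples, replace=False, p=probs)

# Pool of 4000 values drawn uniformly from [0, 1.5], as in the question
pool = np.random.default_rng(0).uniform(0.0, 1.5, size=4000)

idx = weighted_subsample(pool, 400, sigmoid_weight, seed=1)
subset = pool[idx]
```

Plotting `plt.hist(subset)` should then show few entries at low X and many above X = 1. Any other non-negative function of X could be passed as `weight_fn`, which is what I mean by generalizable.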

  • When you say 'sigmoidal histogram', are you implying a cumulative histogram? If so, this makes sense. If not, this doesn't make much sense. – Reinderien Mar 08 '23 at 02:20
