
Say I'm trying to create a 100-sample dataset that follows a certain line, maybe y = 2x + 2, and I want the values on my x-axis to range from 0 to 1000. To do this, I use the following.

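import numpy as np

X = np.random.random((100, 1)) * 1000  # 100 x-values in [0, 1000), shape (100, 1)
Y = (2*X) + 2                          # points lying exactly on the line
data = np.hstack((X, Y))               # shape (100, 2): columns are x and y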

The hstack gives me the array with corresponding x and y values. That part works. But if I want to inject noise into it in order to scatter the datapoints further away from that 2x+2 line...that's what I can't figure out.

Say for example, I want that Y array to have a standard deviation of 20. How would I inject that noise into the y values?

MP12389

2 Answers


Maybe I'm missing something, but have you tried adding numpy.random.normal(scale=20, size=Y.shape) to Y? (Since Y has shape (100, 1), passing size=100 would broadcast the sum to a 100x100 array.) You can even write

Y=numpy.random.normal(2*X+2,20)

and do it all at once (and without repeating the array size).
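For reference, here is a minimal sketch of both variants side by side, assuming X has shape (100, 1) as in the question. The seed and the use of np.random.default_rng (instead of the legacy numpy.random functions) are my own choices for illustration, not something the question requires.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility (assumption)

X = rng.random((100, 1)) * 1000   # 100 x-values in [0, 1000), shape (100, 1)
Y_clean = 2 * X + 2               # points lying exactly on the line

# Variant 1: add zero-mean Gaussian noise with the same shape as Y
Y_added = Y_clean + rng.normal(scale=20, size=Y_clean.shape)

# Variant 2: draw each y from a normal distribution centred on the line;
# the output takes the shape of the loc array, so no size is needed
Y_drawn = rng.normal(loc=Y_clean, scale=20)

data = np.hstack((X, Y_drawn))    # shape (100, 2)
```

Both variants sample from the same distribution; the second just folds the line into the mean of the normal.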

Davis Herring

To simulate noise use a normally distributed random number generator like np.random.randn.

Is this what you are trying to do:

X = np.linspace(0, 1000, 100)
Y = (2*X) + 2 + 20*np.random.randn(100)
data = np.hstack((X.reshape(100,1),Y.reshape(100,1)))
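If you want to confirm that the scatter about the line is what you asked for, something like the following sketch may help. It assumes the same linspace setup as above; the seed, the variable names, and the use of np.column_stack and np.polyfit are my additions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)

X = np.linspace(0, 1000, 100)
noise = 20 * rng.standard_normal(100)    # plays the same role as 20*np.random.randn(100)
Y = 2 * X + 2 + noise

data = np.column_stack((X, Y))           # shape (100, 2) without the reshape calls

residuals = Y - (2 * X + 2)              # vertical distance of each point from the line
resid_std = residuals.std(ddof=1)        # sample standard deviation, should be close to 20
slope, intercept = np.polyfit(X, Y, 1)   # least-squares line; slope should be near 2
```

Note that the standard deviation of Y itself will be much larger than 20, because Y also varies with X. It is the residuals about the line that have a standard deviation of roughly 20.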


Bill