
I am building a categorization model using a Gaussian process with noise - I don't understand why it fails with a ValueError

I have a data set in which about 10% of the rows are labeled with a target of 1 or 0. I am trying to predict the probability that each of the other 90% is 1.

I have used sklearn to split the labeled set into a training and a test set.
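
For reference, a minimal sketch of that split; the input names (features_labeled, targets_labeled) and the test fraction are assumptions. In scikit-learn versions old enough to still ship GaussianProcess, train_test_split lives in sklearn.cross_validation rather than sklearn.model_selection.

from sklearn.cross_validation import train_test_split

# features_labeled / targets_labeled are the ~10% of rows that carry a label
X, X_test, y, y_test = train_test_split(
    features_labeled, targets_labeled, test_size=0.25, random_state=0)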

X is feature_training, an np.array with X.shape (54, 9)

y is feature_target, an np.array with y.shape (54, 1)

Both are float, and the noise is calculated as:

dy = 0.5 + 1.0 * np.random.random(y.shape)  # per-sample noise scale in [0.5, 1.5), shape (54, 1)
noise = np.random.normal(0, dy)             # Gaussian noise with std dy for each sample
y = y + noise                               # noisy targets, still shape (54, 1)

y.shape
(54,1)

The nugget is of type numpy.ndarray and has shape (54, 1)

In the Gaussian process model I am using:

gp = GaussianProcess(corr='squared_exponential', theta0=1e-1,
                     thetaL=1e-3, thetaU=1,
                     nugget=(dy / y) ** 2,
                     random_start=100)

gp.fit(X, y) 

fails with: ValueError: nugget must be either a scalar or array of length n_samples

It seems like X, y, and nugget are all of type numpy.ndarray and of the correct shape. I think the nugget has length n_samples (54), so it should satisfy the length requirement.
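
For reference, printing the shape directly (a minimal check; np is assumed to be numpy) shows that the nugget is two-dimensional, even though its length along the first axis is 54:

nugget = (dy / y) ** 2
print(nugget.shape)   # (54, 1) -- a 2-D column, not a flat array of length 54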

Is there something obvious that I am missing?

FJB

1 Answer


Your y needs to be a vector of shape (n,), not an array of shape (n, 1). You can fix this with

y = y.reshape((len(y),))
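
A fuller sketch under the same assumption (the legacy sklearn.gaussian_process.GaussianProcess API, deprecated in scikit-learn 0.18 and removed in 0.20): flattening y before deriving dy keeps every downstream array one-dimensional, so the nugget comes out with shape (n_samples,), which the shape check accepts.

import numpy as np
from sklearn.gaussian_process import GaussianProcess

y = y.ravel()                               # (54,) instead of (54, 1)
dy = 0.5 + 1.0 * np.random.random(y.shape)  # per-sample noise scale, shape (54,)
noise = np.random.normal(0, dy)
y = y + noise                               # noisy targets, shape (54,)

gp = GaussianProcess(corr='squared_exponential', theta0=1e-1,
                     thetaL=1e-3, thetaU=1,
                     nugget=(dy / y) ** 2,   # shape (54,): scalar-or-(n_samples,) check passes
                     random_start=100)
gp.fit(X, y)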
maxymoo