I am new to scikit-learn and am trying to learn how to use Gaussian process regression.
I am trying to fit a dataset in which some of the input (x) values are repeated, for example:
array(x,y) = [[10, 10, 20, 20, 15, 17], [30, 40, 50, 60, 50, 40]]
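To make the repetition concrete, here is a minimal sketch that builds the example data as NumPy arrays and lists which x values occur more than once. The variable names (`values`, `counts`, `repeated`) are only illustrative and are not part of my actual script.

```python
import numpy as np

# Example data from above: x contains repeated values.
x = np.array([10, 10, 20, 20, 15, 17])
y = np.array([30, 40, 50, 60, 50, 40])

# np.unique with return_counts=True reports how often each x value occurs.
values, counts = np.unique(x, return_counts=True)

# Keep only the x values that appear more than once.
repeated = values[counts > 1]
print(repeated)
```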
When I follow the scikit-learn documentation for Gaussian process regression, I get the following error:
C:\Python27\lib\site-packages\sklearn\gaussian_process\gaussian_process.pyc in fit(self, X, y)
    298         if (np.min(np.sum(D, axis=1)) == 0.
    299                 and self.corr != correlation.pure_nugget):
--> 300             raise Exception("Multiple input features cannot have the same"
    301                             " target value.")
    302
Exception: Multiple input features cannot have the same target value.
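As far as I can tell from the traceback, the check fails whenever two rows of X are identical. The sketch below imitates that check with plain NumPy, assuming that `D` holds the componentwise absolute differences between every pair of input rows (that is my reading of the traceback, not something I have confirmed in the library source).

```python
import numpy as np

# Inputs from the example, as a column vector (n_samples, n_features).
X = np.array([[10.], [10.], [20.], [20.], [15.], [17.]])

# Index every pair of distinct rows (i < j).
i, j = np.triu_indices(X.shape[0], k=1)

# Assumed meaning of D: componentwise absolute differences per pair.
D = np.abs(X[i] - X[j])

# Same condition as in the traceback: a zero row sum means two identical inputs.
has_duplicates = np.min(np.sum(D, axis=1)) == 0.
print(has_duplicates)
```

With the example data this condition is true, because the pairs (10, 10) and (20, 20) have zero distance.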
This is my code:
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcess

# Import CSV file (the first row is a header)
dataset = np.loadtxt(open("data.csv", "rb"), delimiter=",", skiprows=1)

# Separate CSV file columns into X, y
X = np.atleast_2d(dataset[:, 0]).T
y = dataset[:, 1].ravel()

# Set values for the x-axis of the plot
# (named x_min/x_max to avoid shadowing the built-in min and max)
x_min = np.amin(dataset[:, 0])
x_max = np.amax(dataset[:, 0])
x = np.atleast_2d(np.linspace(x_min, x_max, 1000)).T

# Instantiate a Gaussian Process model
gp = GaussianProcess(corr='cubic', theta0=1e-2, thetaL=1e-4, thetaU=1e-1,
                     random_start=100)

# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)
Is it possible to have repeated values in the input dataset? If so, how should I handle them?
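For reference, one preprocessing step that might sidestep the error is to collapse each group of repeated x values into a single point whose y is the group mean. This is only a sketch under the assumption that averaging is acceptable for the data; it discards the spread of the repeated measurements, and it is not something the documentation recommends as far as I know.

```python
import numpy as np

# Example data with repeated x values.
x = np.array([10., 10., 20., 20., 15., 17.])
y = np.array([30., 40., 50., 60., 50., 40.])

# Group identical x values; `inverse` maps each sample to its unique x.
x_unique, inverse = np.unique(x, return_inverse=True)

# Average the y values within each group.
y_sum = np.bincount(inverse, weights=y)
y_count = np.bincount(inverse)
y_mean = y_sum / y_count

# Column vector of unique inputs, ready to be passed to a fit(X, y) call.
X_unique = x_unique.reshape(-1, 1)
print(X_unique.ravel(), y_mean)
```

After this step every row of `X_unique` is distinct, so the duplicate-input condition from the traceback should no longer be triggered.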