
I have started learning Python and am trying to implement Gaussian process regression using the scikit-learn library. I tried to follow the examples available here for my own data points. However, I get the following error when I run the line y_pred, std = model.predict(X_te, return_std=True): 'XA and XB must have the same number of columns (i.e. feature dimension.)'.

I don't know where I made my mistake. Please help, and thanks in advance.

A sample of the input and output data is given as follows:

X_tr= [10.8204  7.67418 7.83013 8.30996 8.1567  6.94831 14.8673 7.69338 7.67702 12.7542 11.847] 
y_tr= [1965.21  854.386 909.126 1094.06 1012.6  607.299 2294.55 866.316 822.948 2255.32 2124.67]
X_te= [7.62022  13.1943 7.76752 8.36949 7.86459 7.16032 12.7035 8.99822 6.32853 9.22345 11.4751]

X_tr, y_tr, and X_te are the training and test data points; they are reshaped values with a type of 'Array of float64'.

Here is a sample of my code:

import sklearn.gaussian_process as gp

kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))

model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)

# data reshape
X_tr = X_tr.values.reshape(1,-1)
y_tr = y_tr.values.reshape(1,-1)

model.fit(X_tr, y_tr)
params = model.kernel_.get_params()

X_te = X_te.values.reshape(1,-1)

y_pred, std = model.predict(X_te, return_std=True)
Ankita

1 Answer


This works. I changed your data from pandas to numpy arrays and fixed the reshaping issue your error resulted from: with reshape(1, -1) each array becomes a single row whose number of columns equals the number of data points, so scikit-learn treats every point as a feature. scikit-learn expects X to have shape (n_samples, n_features), so for a single feature use reshape(-1, 1); y can stay a 1-D array.

import numpy as np

X_tr= np.array([10.8204,  7.67418, 7.83013, 8.30996, 8.1567,  6.94831, 14.8673, 7.69338, 7.67702, 12.7542, 11.847])
y_tr= np.array([1965.21,  854.386, 909.126, 1094.06, 1012.6,  607.299, 2294.55, 866.316, 822.948, 2255.32, 2124.67])
X_te= np.array([7.62022, 13.1943, 7.76752, 8.36949, 7.86459, 7.16032, 12.7035, 8.99822, 6.32853, 9.22345, 11.4751])

import sklearn.gaussian_process as gp

kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))

model = gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, alpha=0.1, normalize_y=True)

# reshape X to (n_samples, n_features) = (11, 1); y stays a 1-D array
X_tr = X_tr.reshape(-1, 1)

model.fit(X_tr, y_tr)
params = model.kernel_.get_params()

X_te = X_te.reshape(-1,1)

y_pred, std = model.predict(X_te, return_std=True)
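
If you want to see the optimisation that fit already performed, you can, for example, print the optimised kernel, the corresponding log-marginal likelihood, and a few predictions:

print(model.kernel_)                          # kernel with optimised hyperparameters
print(model.log_marginal_likelihood_value_)   # log-marginal likelihood at those hyperparameters
print(y_pred[:3], std[:3])                    # first predictions and their standard deviations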
pythonic833
  • Thank you, it works, but it does not give the expected result. Could you please suggest how to do hyperparameter optimisation for this problem? It seems this code does not do optimisation. – Ankita Jul 05 '20 at 15:03
  • For hyperparameter tuning you can use `GridSearchCV` (see the sketch below); have a look at https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html or https://stackoverflow.com/questions/30102973/how-to-get-best-estimator-on-gridsearchcv-random-forest-classifier-scikit. But this is a different topic; if that interests you, ask another question. – pythonic833 Jul 05 '20 at 15:06
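
As a rough sketch of that suggestion (the alpha grid, fold count, and scoring below are illustrative assumptions, not values from the answer), wrapping the regressor in GridSearchCV could look like this:

import numpy as np
import sklearn.gaussian_process as gp
from sklearn.model_selection import GridSearchCV

# same training data as above, reshaped to (n_samples, 1)
X_tr = np.array([10.8204, 7.67418, 7.83013, 8.30996, 8.1567, 6.94831,
                 14.8673, 7.69338, 7.67702, 12.7542, 11.847]).reshape(-1, 1)
y_tr = np.array([1965.21, 854.386, 909.126, 1094.06, 1012.6, 607.299,
                 2294.55, 866.316, 822.948, 2255.32, 2124.67])

kernel = gp.kernels.ConstantKernel(1.0, (1e-1, 1e3)) * gp.kernels.RBF(10.0, (1e-3, 1e3))

# illustrative grid over the noise parameter alpha (assumed values)
param_grid = {"alpha": [1e-3, 1e-2, 1e-1, 1.0]}

search = GridSearchCV(
    gp.GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, normalize_y=True),
    param_grid,
    cv=3,                                # only 11 samples, so use few folds
    scoring="neg_mean_squared_error",
)
search.fit(X_tr, y_tr)
print(search.best_params_, search.best_score_)

Note that the kernel hyperparameters (the constant value and the RBF length scale) are already tuned inside fit by maximising the log-marginal likelihood; a grid search is only needed for parameters such as alpha that are not optimised that way.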