In the standard scikit-learn implementation of Gaussian Process Regression (GPR), the kernel hyperparameters are chosen by maximizing the log-marginal likelihood of the training set.
Is there an easy-to-use implementation of GPR (in Python) where the kernel hyperparameters are chosen based on a separate validation set? Cross-validation would also be a nice alternative for finding suitable hyperparameters (optimized to perform well on multiple train-val splits). (I would prefer a solution that builds on the scikit-learn GPR.)
In detail: a set of hyperparameters theta should be found that performs well under the following metric: compute the posterior GP from the training data (given the prior GP with hyperparameters theta), then evaluate the negative log likelihood of the validation data under that posterior. This negative log likelihood should be minimal for theta.
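For concreteness, here is a minimal sketch of how I imagine computing this validation negative log likelihood for a fixed theta on top of scikit-learn. The RBF + WhiteKernel choice and the name val_neg_log_likelihood are just placeholders, and theta is assumed to be in scikit-learn's log-transformed parameterization:

```python
from scipy.stats import multivariate_normal
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def val_neg_log_likelihood(theta, X_train, y_train, X_val, y_val):
    # Kernel with fixed hyperparameters theta (scikit-learn stores them in log-space);
    # optimizer=None keeps theta fixed instead of re-optimizing it on the training set.
    kernel = RBF() + WhiteKernel()
    kernel.theta = theta
    gpr = GaussianProcessRegressor(kernel=kernel, optimizer=None)
    gpr.fit(X_train, y_train)

    # Joint posterior (mean and covariance) over the validation inputs;
    # the WhiteKernel contributes the observation noise to the covariance.
    mean, cov = gpr.predict(X_val, return_cov=True)

    # Negative log density of the validation targets under that posterior,
    # i.e. -log P[ valData | trainData, theta ].
    return -multivariate_normal.logpdf(y_val, mean=mean, cov=cov, allow_singular=True)
```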
In other words, I want to find theta such that P[ valData | trainData, theta ] is maximal. A non-exact approximation that might be sufficient would be to find theta such that sum_i log(P[ valData_i | trainData, theta ]) is maximal, where P[ valData_i | trainData, theta ] is the Gaussian marginal posterior density of a single validation data point valData_i, given the training data and the prior GP with hyperparameters theta.

Edit: Since the exact P[ valData | trainData, theta ] has now been implemented (see my answer), the easier-to-implement approximation is no longer needed.
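For completeness, this is roughly how I imagine searching over theta, reusing the hypothetical val_neg_log_likelihood helper from the sketch above and scipy.optimize.minimize as a generic local optimizer (a multi-start over random initializations would guard against local minima; the toy data is only there to make the example self-contained):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data, only to make the sketch runnable end to end.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(60)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

# Minimize the validation negative log likelihood over theta (log-space),
# starting from the kernel's default hyperparameters.
kernel = RBF() + WhiteKernel()
result = minimize(
    val_neg_log_likelihood,
    x0=kernel.theta,
    args=(X_train, y_train, X_val, y_val),
    method="L-BFGS-B",
)
best_theta = result.x
```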