
I get an out-of-memory error when creating the following Gaussian process model, and I would like to know whether GPflow has a feature for loading the data in batches instead of reading it all at once.

Here is the code I tried:

import gpflow

data = (X, Y)  # roughly 1e6 data points
model = gpflow.models.VGP(
    data,
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
)

This raises an out-of-memory (OOM) error.

Your Common Sense

1 Answer


If you use the SVGP model instead of VGP, you can train the model on data loaded in mini-batches, as sketched below. This is demonstrated in the "GPs for big data" notebook: https://gpflow.github.io/GPflow/2.7.1/notebooks/advanced/gps_for_big_data.html
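A minimal sketch of what that looks like, assuming your X and Y are NumPy arrays of shape (N, D) and (N, 1); the number of inducing points, batch size, learning rate and step count are illustrative choices, and the random data here just stands in for your real arrays:

import numpy as np
import tensorflow as tf
import gpflow

# Stand-in data with the same shapes as in the question (assumption).
N, D, M = 1_000_000, 1, 100
X = np.random.rand(N, D)
Y = (np.random.rand(N, 1) > 0.5).astype(float)

# Inducing points: a small set of input locations, here just a subset of X.
Z = X[:M].copy()

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
    inducing_variable=Z,
    num_data=N,  # needed so the minibatch ELBO is scaled to the full dataset
)

# Stream the data in mini-batches instead of holding one big N x N computation.
batch_size = 256
dataset = (
    tf.data.Dataset.from_tensor_slices((X, Y))
    .repeat()
    .shuffle(buffer_size=10_000)
    .batch(batch_size)
)
data_iter = iter(dataset)

optimizer = tf.keras.optimizers.Adam(0.01)

@tf.function
def step(batch):
    # One stochastic optimisation step on a single mini-batch.
    with tf.GradientTape() as tape:
        loss = model.training_loss(batch)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for _ in range(10_000):  # number of steps is an arbitrary example value
    step(next(data_iter))

The memory cost per step is then driven by the batch size and the number of inducing points M, not by N, which is what makes this workable for ~1e6 points.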

If you're only just past the edge of "not enough memory", there might be ways of computing things piece by piece (though I can't give concrete advice on how to do that), but a VGP model for N data points will in the end still need to allocate O(N^2) memory for the N x N covariance matrix.

STJ