
I am trying to use the krige function in the gstat package of R to interpolate some spatial ocean depth data. I am finding that for more than about 1,000 points, the function starts taking an unreasonable amount of time to finish (hours to days, or it never finishes at all). Is this normal, or am I doing something wrong? I am particularly concerned because my eventual goal is to do spatio-temporal kriging of a very large dataset (>30,000 data points), and I am worried that it just won't be feasible given these run times.

I am running gstat-1.1-3 and R-3.3.2. Below is the code I am running:

library(sp); library(raster); library(gstat)
v.utm # SpatialPointsDataFrame with >30,000 points

# Remove points with identical positions
zd = zerodist(v.utm)
nzd = v.utm[-zd[,1],] # Layer with no identical positions

# Make a raster layer covering point layer
resolution=1e4
e = extent(as.matrix(v.utm@coords))+resolution
r = raster(e,resolution=resolution) 
proj4string(r) = proj4string(v.utm)

# r is a 181x157 raster

# Fit an exponential variogram model to the empirical variogram
depth.fit = fit.variogram(variogram(AVGDEPTH~1, nzd), model=vgm(6000,"Exp",1.5e5,1))

# Krige on random sample of 500 points - works fine
size=500
ss=nzd[sample.int(nrow(nzd),size),]
depth.krig = krige(AVGDEPTH~1,ss,as(r,"SpatialPixelsDataFrame"),
               model=depth.fit)

# Krige on random sample of 5000 points - never seems to end
size=5000
ss=nzd[sample.int(nrow(nzd),size),]
depth.krig = krige(AVGDEPTH~1,ss,as(r,"SpatialPixelsDataFrame"),
               model=depth.fit)
user3004015

2 Answers


The complexity of the Cholesky decomposition (or similar) is O(n^3), meaning that if you multiply the number of points by 10, the time it takes increases by a factor of 1000. There are two ways out of this problem, at least as far as gstat is concerned:

  1. Install an optimized version of BLAS (e.g. OpenBLAS or MKL) - this does not solve the O(n^3) problem, but may speed things up by at most a factor of n, with n the number of cores available.
  2. Avoid decomposing the full covariance matrix by choosing local neighbourhoods (arguments maxdist and/or nmax); see the sketch below.
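With a local neighbourhood, each prediction location only decomposes a small covariance matrix built from its nearest observations. A minimal sketch of option 2, reusing the objects from the question (nmax = 50 and maxdist = 1e5 are illustrative values, not recommendations):

# Local kriging: each prediction uses at most the 50 nearest points
# within 100 km, so only small kriging systems are solved
depth.krig = krige(AVGDEPTH ~ 1, nzd, as(r, "SpatialPixelsDataFrame"),
                   model = depth.fit, nmax = 50, maxdist = 1e5)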
Edzer Pebesma
  • Thanks for the helpful response. I set nmax to a reasonable amount and that made things very quick. I wonder if there is something similar to speed up `variogramST`? I have tried setting the `twindow` and `cutoff` arguments, but variogram calculation is still very slow. The exact code I am using is: `vst = variogramST(BOTTEMP~AVGDEPTH+lat, data=v, cutoff=7e5, twindow=5*365.25, tlags=(0:15)*365.25, tunit="days")` – user3004015 Dec 12 '16 at 18:49
  • variogram computation is an O(n^2) operation. – Edzer Pebesma Dec 12 '16 at 19:38
  • I guess my question is do the `cutoff` and `twindow` arguments cut down on the number of calculations in the same sense that the nmax and maxdist arguments do for `krige`? Also, I don't think I was using the `twindow` argument correctly in the code I posted. Can you confirm that the units of `twindow` are number of observations and not a time unit? Thanks! – user3004015 Dec 13 '16 at 15:05
  • In a similar way, yes. – Edzer Pebesma Dec 14 '16 at 22:33

A much faster alternative to kriging for large datasets is `griddify` in the marmap package. It took me a while to find this, but it works well. It uses bilinear interpolation, and although it is designed for bathymetric maps, it works with any xyz data.
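A minimal sketch with the depth points from the question, assuming `griddify()` takes a three-column x/y/z table plus the number of grid cells in each direction and returns a raster (check ?griddify for the exact interface; the column names and cell counts below are only illustrative):

library(marmap); library(raster)

# Three-column x/y/z table built from the point layer in the question
xyz = data.frame(x = coordinates(v.utm)[, 1],
                 y = coordinates(v.utm)[, 2],
                 z = v.utm$AVGDEPTH)

# Bilinear interpolation onto a regular 181 x 157 grid
# (matching the raster used in the question)
depth.grid = griddify(xyz, nlon = 181, nlat = 157)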

Barbara