
I am trying to use the krige function in the gstat package of R to interpolate some spatial ocean depth data. I am finding that for more than about 1,000 points, the function starts taking an unreasonable amount of time to finish (hours to days, or it never finishes at all). Is this normal, or am I doing something wrong? I am particularly concerned because my eventual goal is to do spatio-temporal kriging of a very large dataset (>30,000 data points), and I am worried that it just won't be feasible given these run times.

I am running gstat-1.1-3 and R-3.3.2. Below is the code I am running:

library(sp); library(raster); library(gstat)
v.utm # SpatialPointsDataFrame with >30,000 points

# Remove points with identical positions
zd = zerodist(v.utm)
nzd = v.utm[-zd[,1],] # Layer with no identical positions

# Make a raster layer covering point layer
resolution=1e4
e = extent(as.matrix(v.utm@coords))+resolution
r = raster(e,resolution=resolution) 
proj4string(r) = proj4string(v.utm)

# r is a 181x157 raster

# Fit an exponential variogram model to the empirical variogram
depth.fit = fit.variogram(variogram(AVGDEPTH~1, nzd), model=vgm(6000,"Exp",1.5e5,1))

# Krige on random sample of 500 points - works fine
size=500
ss=nzd[sample.int(nrow(nzd),size),]
depth.krig = krige(AVGDEPTH~1,ss,as(r,"SpatialPixelsDataFrame"),
               model=depth.fit)

# Krige on random sample of 5000 points - never seems to end
size=5000
ss=nzd[sample.int(nrow(nzd),size),]
depth.krig = krige(AVGDEPTH~1,ss,as(r,"SpatialPixelsDataFrame"),
               model=depth.fit)
user3004015

2 Answers


The complexity of the Cholesky decomposition (or similar) is O(n^3), meaning that if you multiply the number of points by 10, the time it takes increases by a factor of 1000. There are two ways out of this problem, at least as far as gstat is concerned:

  1. Install an optimized version of BLAS (e.g. OpenBLAS or MKL) - this does not solve the O(n^3) problem, but may speed things up by at most a factor of n, with n the number of cores available.
  2. Avoid decomposing the full covariance matrix by choosing local neighbourhoods (arguments maxdist and/or nmax); see the sketch below.
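With a local neighbourhood, each prediction location only decomposes a small covariance matrix built from its nearest observations. A minimal sketch of option 2, reusing the objects from the question (nmax = 50 and maxdist = 1e5 are illustrative values, not recommendations):

# Local kriging: each prediction uses at most the 50 nearest points
# within 100 km, so only small kriging systems are solved
depth.krig = krige(AVGDEPTH ~ 1, nzd, as(r, "SpatialPixelsDataFrame"),
                   model = depth.fit, nmax = 50, maxdist = 1e5)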
Edzer Pebesma
  • Thanks for the helpful response. I set nmax to a reasonable amount and that made things very quick. I wonder if there is something similar to speed up `variogramST`? I have tried setting the `twindow` and `cutoff` arguments, but variogram calculation is still very slow. The exact code I am using is: `vst = variogramST(BOTTEMP~AVGDEPTH+lat, data=v, cutoff=7e5, twindow=5*365.25, tlags=(0:15)*365.25, tunit="days")` – user3004015 Dec 12 '16 at 18:49
  • variogram computation is an O(n^2) operation. – Edzer Pebesma Dec 12 '16 at 19:38
  • I guess my question is do the `cutoff` and `twindow` arguments cut down on the number of calculations in the same sense that the nmax and maxdist arguments do for `krige`? Also, I don't think I was using the `twindow` argument correctly in the code I posted. Can you confirm that the units of `twindow` are number of observations and not a time unit? Thanks! – user3004015 Dec 13 '16 at 15:05
  • In a similar way, yes. – Edzer Pebesma Dec 14 '16 at 22:33

A much faster alternative to kriging for large datasets is `griddify` in the marmap package. It took me a while to find this, but it works well. It uses bilinear interpolation, and although it is designed for bathymetric maps, it works with any xyz data.
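A minimal sketch with the depth points from the question, assuming `griddify()` takes a three-column x/y/z table plus the number of grid cells in each direction and returns a raster (check ?griddify for the exact interface; the column names and cell counts below are only illustrative):

library(marmap); library(raster)

# Three-column x/y/z table built from the point layer in the question
xyz = data.frame(x = coordinates(v.utm)[, 1],
                 y = coordinates(v.utm)[, 2],
                 z = v.utm$AVGDEPTH)

# Bilinear interpolation onto a regular 181 x 157 grid
# (matching the raster used in the question)
depth.grid = griddify(xyz, nlon = 181, nlat = 157)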

Barbara