
Long-time reader/code parasite, first-time question asker!

I'm running into a relatively mysterious error while kriging a large point dataset, and I'd appreciate any insight people can offer.

In essence, I'm trying to interpolate bathymetry raster surfaces for a series of lakes for which I have irregular depth soundings. The soundings were sourced from a PDF scan of a paper map, digitized in QGIS, then exported to R for analysis.

Using the code below I've successfully created bathymetric rasters for several of the smaller lakes I'm interested in, but the largest lake runs into a mysterious problem I can't figure out. Everything prior to the parallelized foreach statement that runs the kriging works as expected, but the foreach statement itself produces no output, and the line after it fails with 'Error in do.call(ok_test, out) : object 'out' not found'.

When I first ran into this problem, warnings() showed 'Covariance matrix singular at location etc, skipping', which prior posts here and elsewhere indicated is a sign of duplicate points. I ran the data through a number of duplicate detection/removal processes in R (see the zerodist() code below) and in QGIS; the warnings went away, thankfully, but there's still no output created. zerodist() also breaks the sp depth data object for some reason, even though the data is formatted the same way as the other lake datasets, and the entire workflow laid out below works when repeated exactly on those other lakes.
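As an aside, I think I've at least figured out why the zerodist() subsetting "breaks" the sp object once the duplicates are gone: in R, a negated empty index drops every row rather than none. A minimal base-R sketch of the behaviour:

```r
# When zerodist() finds no duplicates, zerodist(x)[, 1] is an empty
# vector, and a negated empty index drops ALL rows, not none:
v <- c(10, 20, 30)
idx <- integer(0)        # stands in for zerodist(x)[, 1] with no duplicates
length(v[-idx])          # 0 -- the whole object is emptied

# Safe version: only subset when there is actually something to drop
if (length(idx) > 0) v <- v[-idx]
length(v)                # 3 -- unchanged when no duplicates exist
```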

I ended up essentially re-creating the depth sounding dataset for this large lake using a QGIS workflow that removes any duplicate or even closely coincident points, and the same problem arises. The variogram model fit for this lake looks as adequate as for the other lakes, and all of the constituent files behave as expected (no broken CRS, everything on a Cartesian plane, etc.), so I'm stumped as to what might be causing this.
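For what it's worth, the duplicate screening itself boils down to a check on coordinate pairs; a toy base-R sketch (made-up coordinates) of what I mean by duplicate points:

```r
# Two of these three points coincide exactly
xy <- cbind(x = c(0, 1, 1), y = c(0, 2, 2))
dup <- duplicated(xy)              # row-wise duplicate test on the matrix
sum(dup)                           # 1 duplicated point
xy_clean <- xy[!dup, , drop = FALSE]
nrow(xy_clean)                     # 2 points remain
```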

Happy to answer any questions you might have! I'm hoping it's something simple and dumb that I've overlooked, as I get the feeling this is a bit of a Frankenstein's monster-esque solution to the problem of kriging a large point dataset.

I've uploaded the file I've used here for reproducibility's sake - you should be able to unpack it to a directory of your choice, setwd() to it, and run the code laid out here.

library(rgdal)
library(tidyverse)
library(sf)
library(gstat)
library(sp)
library(maptools)
library(rgeos)
library(automap)
library(parallel)
library(doParallel)

setwd('path/to/unpacked/files') # point this at the directory you unpacked the data into

# This initial setup is a little complicated because the makegrid() function 
# requires a spatial object and my brain thinks best in tidyverse

# BC Environment Albers, built with paste() so no stray newline ends up
# inside the proj4 definition
bc_env_albs <- CRS(paste('+proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126',
                         '+x_0=1000000 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'))

ok_template <- readOGR('ok_boundary.shp')

ok_template_sf <- read_sf('ok_boundary.shp')

ok_grid <- makegrid(ok_template, cellsize = 100)

ok_grid_tib <- as_tibble(ok_grid)

ok_grid_sf <- st_as_sf(ok_grid_tib, coords = c('x1', 'x2'),
                          crs = bc_env_albs)

# This section attempts to remove points in the interpolation grid that are outside the 
# lake boundary - it makes the grid irregular in dimensions but it functions just fine in 
# the other lakes

ok_grid_f <- st_intersection(ok_template_sf, ok_grid_sf)

ok_grid_f <- ok_grid_f %>% mutate(
  r_num = row_number())

###

# Can either read CSV directly and convert to spatial object or read shapefile

#ok_depth_csv <- read_csv('ok_depth_buffer_experiment.csv')

ok_depth <- readOGR('ok_buffer_experiment.shp')

#ok_depth <- SpatialPointsDataFrame(coords = ok_depth_csv[, 1:2],
#                                   data = ok_depth_csv,
#                                   proj4string = bc_env_albs)

# NB: this empties the sp object when zerodist() finds no duplicates,
# because subsetting with a negated empty index drops every row
#ok_depth <- ok_depth[-zerodist(ok_depth)[,1],]


ok_depth_var <- automap::autofitVariogram(depth_m~1, ok_depth)$var_model

###

# This section attempts to split the large grid dataset into smaller sections 
# based on row ID that can be iterated through and interpolated.

ok_gr <- ok_grid_f %>% 
  mutate(group = cut_width(r_num, width = 1000)) %>% 
  group_by(group)

ok_list <- ok_gr %>% group_split(group)

ok_upper_parts <- lapply(ok_list, as_Spatial)

###

n_cores <- 6

my_cluster <- parallel::makeCluster(n_cores, type = "PSOCK")

doParallel::registerDoParallel(my_cluster)

###

# Krige each chunk of the prediction grid in parallel; .combine = 'c'
# collects the per-chunk results into a list
out <- foreach(
  i = ok_upper_parts,
  .combine = 'c',
  .packages = c('sp',
                'gstat')
) %dopar% {
  krige(formula = depth_m~1, 
        ok_depth, 
        i,
        model = ok_depth_var)
}

test <- do.call(rbind, out)
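In case it helps anyone debugging the same thing: foreach's .errorhandling argument can return worker errors as condition objects instead of aborting, so you can see exactly which chunk failed and why. A self-contained sketch (sequential %do% for illustration, with a made-up error):

```r
library(foreach)

# With .errorhandling = 'pass', an error in one iteration comes back
# as an error object in the result list instead of killing the loop
out <- foreach(i = 1:3, .errorhandling = 'pass') %do% {
  if (i == 2) stop('covariance matrix singular in chunk ', i)
  i * 10
}

errs <- Filter(function(x) inherits(x, 'error'), out)
length(errs)                 # 1 failed chunk
conditionMessage(errs[[1]])  # which chunk, and why
```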

EDIT: Interesting update from further experimentation. I modified the krige() call to perform local-neighbourhood kriging instead of global ordinary kriging, since this answer suggests the cost of solving the kriging system grows very rapidly with the number of points. Modifying the krige() call in the following way produces actual output, though the result is wildly inaccurate - possibly due to the size of the neighbourhood? It may also be worthwhile to play with the size of the point partitions in 'ok_upper_parts' to find a sweet spot between the size of the data being kriged and the neighbourhood window specified.

out <- foreach(
  i = ok_upper_parts,
  .combine = 'c',
  .packages = c('sp',
                'gstat')
) %dopar% {
  krige(formula = depth_m~1, 
        ok_depth, 
        i,
        model = ok_depth_var,
        nmin = 500,
        nmax = 2000)
}

test <- do.call(rbind, out)

parallel::stopCluster(my_cluster) # remember to release the workers when done
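To illustrate the neighbourhood idea, here's a self-contained sketch on gstat's built-in meuse data (my parameter values here are arbitrary, not tuned): nmax caps the number of neighbouring points used for each prediction, and maxdist caps the search radius in map units, which is exactly the speed/accuracy trade-off I'm fumbling with above.

```r
library(sp)
library(gstat)

# Built-in example data: 155 zinc measurements plus a prediction grid
data(meuse)
coordinates(meuse) <- ~x + y
data(meuse.grid)
coordinates(meuse.grid) <- ~x + y

# Fit a spherical variogram, then krige a handful of grid cells with a
# local neighbourhood: at most 40 points within 1500 m of each cell
vgm_fit <- fit.variogram(variogram(log(zinc) ~ 1, meuse), vgm('Sph'))
loc <- krige(log(zinc) ~ 1, meuse, meuse.grid[1:50, ],
             model = vgm_fit, nmax = 40, maxdist = 1500)
nrow(as.data.frame(loc))  # 50 predictions
```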

  • If you get `Error in do.call(ok_test, out) : object 'out' not found`, then it means the `foreach()` call failed, and the error it reports will be one step closer to what the problem is. My $.02 – HenrikB Jul 12 '22 at 10:34
  • Thanks for the two cents! As far as I can tell it doesn't produce any errors or warnings now that I've recreated the dataset to remove duplicate points - until it 'finishes' running it behaves exactly as it does for the smaller datasets, where the workflow completes properly. It's worth noting that the above script was my second attempt at parallelizing this kriging task: the first time I adapted Guzman's example here (https://gis.stackexchange.com/questions/237672/how-to-achieve-parallel-kriging-in-r-to-speed-up-the-process) and it produced the exact same 'blank' output. – Steven Brownlee Jul 12 '22 at 16:50

1 Answer


I recognize that this is not the toolset that you are working with, but I ran your data through a Java application called The Simple Volumetric Model and confirmed that your data was just fine as presented. I see that there are 20719 soundings in your data set, which doesn't strike me as a particularly large number (unless you happen to be the person who has to digitize them by hand). I compute a lake volume of 24.64 km^3, an average depth of 69.58 meters, and a mean sounding spacing of 302.3 meters.

The purpose of the SVM is to model how a lake volume changes in response to lowering water levels (in times of drought, etc.). It does produce some analysis artifacts including contour shapefiles and raster grids. It uses Natural Neighbor Interpolation instead of Kriging. For this test, I specified a 50 meter grid cell spacing. If you got really stuck, you could try running your data through SVM and then importing the output products into conventional GIS tools.

For your reference, here are a couple of pictures.

Contour and raster presentations


Gary Lucas