Long-time reader/code parasite, first-time question asker!
I'm running into a relatively mysterious error while kriging a large point dataset, and I'd love any insight people can offer.
In essence, I'm trying to interpolate bathymetry raster surfaces for a series of lakes for which I have irregularly spaced depth soundings. The soundings were sourced from a PDF scan of a paper map, digitized in QGIS, and then exported to R for analysis.
Using the code below I've successfully created bathymetric rasters for several of the smaller lakes I'm interested in, but the largest lake runs into a mysterious problem I can't figure out: everything prior to the parallelized foreach statement that runs the kriging works as expected, but the foreach statement itself produces no output, and the line after it fails with 'Error in do.call(ok_test, out) : object 'out' not found'.
When I first ran into this problem, warnings() showed 'Covariance matrix singular at location etc, skipping', which prior posts here and elsewhere indicated is a sign of duplicate points. I ran the data through a number of duplicate detection/removal processes in R (see the zerodist() line below) and in QGIS, and that didn't solve it: the warnings went away, thankfully, but there's still no output created. zerodist() also breaks the sp depth data object for some reason, even though the data is formatted the same way as the other lake datasets, and the workflow laid out below is repeated exactly for those other lakes and works in those cases (see the short sketch below for what I suspect is happening with zerodist()).
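Here's a minimal sketch of the guard I suspect that zerodist() line needs - my assumption is that, once the duplicates had already been removed in QGIS, zerodist() returns an empty matrix, so subsetting with an empty negative index silently drops every row ('dupes' is just a throwaway name for illustration):

dupes <- sp::zerodist(ok_depth)        # two-column matrix of coincident point pairs
if (nrow(dupes) > 0) {                 # only subset if duplicates were actually found;
  ok_depth <- ok_depth[-dupes[, 1], ]  # otherwise ok_depth[-integer(0), ] returns a zero-row object
}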
I ended up essentially re-creating the depth sounding dataset for this large lake using a QGIS workflow that removes any duplicate or even closely coincident points, and the same problem arises. The variogram model fit for this lake and the other lakes seems more than adequate, and all of the constituent files behave as expected (no broken CRS, everything is in a projected Cartesian plane, etc.), so I'm stumped as to what might be causing this.
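For what it's worth, here's roughly what I mean by "adequate" and "behave as expected" - a couple of quick checks, using the object names from the code below:

plot(automap::autofitVariogram(depth_m ~ 1, ok_depth))  # visually inspect the fitted variogram
sf::st_crs(read_sf('ok_boundary.shp'))                  # confirm the lake boundary carries the projected CRS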
Happy to answer any questions you might have! I'm hoping it's something simple and dumb that I've overlooked, as I get the feeling this is a bit of a Frankenstein's monster-esque solution to the problem of kriging a large point dataset.
I've uploaded the file I've used here for reproducibility's sake - you should be able to unpack it to a directory of your choice, point setwd() at that directory, and run the code laid out here.
library(rgdal)
library(tidyverse)
library(sf)
library(gstat)
library(sp)
library(maptools)
library(rgeos)
library(automap)
library(parallel)
library(doParallel)
setwd('.') # change this to the directory the uploaded files were unpacked to
# This initial setup is a little complicated because the makegrid() function
# requires a spatial object and my brain thinks best in tidyverse
bc_env_albs <- CRS(paste('+proj=aea +lat_1=50 +lat_2=58.5 +lat_0=45 +lon_0=-126',
                         '+x_0=1000000 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'))
ok_template <- readOGR('ok_boundary.shp')
ok_template_sf <- read_sf('ok_boundary.shp')
ok_grid <- makegrid(ok_template, cellsize = 100)
ok_grid_tib <- as_tibble(ok_grid)
ok_grid_sf <- st_as_sf(ok_grid_tib, coords = c('x1', 'x2'),
crs = bc_env_albs)
# This section attempts to remove points in the interpolation grid that are outside the
# lake boundary - it makes the grid irregular in dimensions but it functions just fine in
# the other lakes
ok_grid_f <- st_intersection(ok_template_sf, ok_grid_sf)
ok_grid_f <- ok_grid_f %>%
  mutate(r_num = row_number())
###
# Can either read CSV directly and convert to spatial object or read shapefile
#ok_depth_csv <- read_csv('ok_depth_buffer_experiment.csv')
ok_depth <- readOGR('ok_buffer_experiment.shp')
#ok_depth <- SpatialPointsDataFrame(ok_depth_csv[,1:2],
#proj4string = bc_env_albs,
#ok_depth_csv)
#ok_depth <- ok_depth[-zerodist(ok_depth)[,1],] # This code breaks the sp object
ok_depth_var <- automap::autofitVariogram(depth_m~1, ok_depth)$var_model
###
# This section attempts to split the large grid dataset into smaller sections
# based on row ID that can be iterated through and interpolated.
ok_gr <- ok_grid_f %>%
  mutate(group = cut_width(r_num, width = 1000)) %>%
  group_by(group)
ok_list <- list()
ok_list <- ok_gr %>% group_split(group)
ok_upper_parts <- lapply(ok_list, as_Spatial)
###
n_cores <- 6
my_cluster <- parallel::makeCluster(
  n_cores,
  type = "PSOCK"
)
doParallel::registerDoParallel(cl = my_cluster)
###
# Ordinary kriging of each grid chunk against the full set of depth soundings,
# run in parallel across the chunks
out <- foreach(
  i = ok_upper_parts,
  .combine = 'c',
  .packages = c('sp', 'gstat')
) %dopar% {
  krige(formula = depth_m ~ 1,
        ok_depth,
        i,
        model = ok_depth_var)
}
test <- do.call(rbind, out)
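In case it helps anyone reproduce or diagnose this, here's a minimal diagnostic sketch that might narrow things down (serial_test and out_dbg are just illustrative names, not part of the workflow above): kriging a single chunk serially so the real error message surfaces, and re-running the loop with foreach's .errorhandling = 'pass' so worker errors come back as list elements instead of aborting the whole call - which I assume is why 'out' never gets created.

# Krige one chunk serially - if something is wrong, this should show the actual error
serial_test <- krige(formula = depth_m ~ 1, ok_depth, ok_upper_parts[[1]],
                     model = ok_depth_var)

# Re-run the parallel loop but collect worker errors instead of stopping on them
out_dbg <- foreach(
  i = ok_upper_parts,
  .errorhandling = 'pass',
  .packages = c('sp', 'gstat')
) %dopar% {
  krige(formula = depth_m ~ 1, ok_depth, i, model = ok_depth_var)
}
Filter(function(x) inherits(x, 'error'), out_dbg)  # any errors returned by the workers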
EDIT: Interesting update from further experimentation. I tried modifying the krige() call to do local neighbourhood kriging instead of global ordinary kriging, as this answer suggests that the computational cost of kriging grows very rapidly with the number of data points (the prediction involves solving a covariance system as large as the dataset). Modifying the krige() call in the following way produces actual output, though the result is wildly inaccurate, possibly due to the size of the neighbourhood? It may also be worthwhile to play with the size of the point partitions in 'ok_upper_parts' to find a sweet spot between the size of each chunk being kriged and the neighbourhood window specified.
out <- foreach(
  i = ok_upper_parts,
  .combine = 'c',
  .packages = c('sp', 'gstat')
) %dopar% {
  krige(formula = depth_m ~ 1,
        ok_depth,
        i,
        model = ok_depth_var,
        nmin = 500,
        nmax = 2000)
}
test <- do.call(rbind, out)
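If anyone wants to experiment along the same lines, the other knob I'm assuming is worth turning is a distance-based search limit rather than a large fixed neighbourhood - something like the sketch below. The nmax/maxdist values are pure guesses rather than tested settings, and out_nb/test_nb are just illustrative names so they don't clobber the objects above.

# Hedged variant: small nearest-neighbour count plus a search radius cap (values are guesses)
out_nb <- foreach(
  i = ok_upper_parts,
  .combine = 'c',
  .packages = c('sp', 'gstat')
) %dopar% {
  krige(formula = depth_m ~ 1,
        ok_depth,
        i,
        model = ok_depth_var,
        nmax = 100,       # use at most the 100 nearest soundings per prediction point
        maxdist = 2000)   # ignore soundings more than 2000 m away
}
test_nb <- do.call(rbind, out_nb)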