I would like to use mapply on my custom function which relies on two sf object inputs. Here is a link to my GitHub repository with code and data for a reproducible example: https://github.com/msghankinson/sf_mapply.
I am trying to intersect a shapefile of circles (buffers drawn around points) with a shapefile of Census blocks from California (n = 710,000). The intersection requires the slow st_intersection command because I need to calculate what share of the block is overlapped by the circle (blocks with >=50% of overlap are kept, others are dropped). Because of the shapefile sizes, using st_intersection on the entire state is unwieldy. To improve speed, I plan to both run the intersection on an HPC cluster and break the statewide intersections into parallel tasks of intersections at the county level.
Here is my function:
county_func <- function(x, y) { # (x = circles, y = centroids/points)
y_id <- y %>% # keep points only as an index for later
st_drop_geometry()
x_int <- st_intersects(x, blocks) # intersect circles with blocks to build index of overlap
ints_holder <- data.frame()
for(i in 1:nrow(x_int)){ # for each circle
blocks_int <- subset(blocks, as.numeric(rownames(blocks)) %in% x_int[[i]]) # subset blocks that intersect
blocks_int$hud_id <- y_id[i, 1] # add point id to these blocks for merge later
ints_holder <- rbind(ints_holder, blocks_int)
}
x_blocks <- st_intersection(x, ints_holder) %>% # intersection between circles and blocks they intersect
mutate(intersect_area = st_area(.)) %>% # calculate intersect area
dplyr::select(GEOID10, intersect_area) # drop irrelevant data
return(x_blocks)
}
As annotated in my R script, the function is successful when broken into individual commands and run on two dataframe inputs (circles and points). However, to run the function on the counties in parallel via the cluster, I need to make the function compatible with the parallel library, meaning functions like mcmapply/mapply. To test compatibility, I formatted the circle shapefile into a list of circles, grouped at the county level. But when I run the function on even one county drawn from the list, I receive the following error:
Error in UseMethod("st_drop_geometry") : no applicable method for 'st_drop_geometry' applied to an object of class "character"
The error persists even when I remove all character variables from both inputs. As I understand, the apply functions force objects into matrix form, which has compatibility issues with sf objects. But I do not know how else to run the function in parallel via the cluster outside of the parallel library and mcmapply family of commands. Any advice at all would be greatly appreciated.
For reference:
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.9 sf_1.0-7
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 rstudioapi_0.13 magrittr_2.0.3 units_0.7-2 tidyselect_1.1.2 R6_2.5.1
[7] rlang_1.0.2 fansi_1.0.3 s2_1.0.7 stringr_1.4.0 wk_0.5.0 tools_4.1.0
[13] grid_4.1.0 KernSmooth_2.23-20 utf8_1.2.2 cli_3.3.0 e1071_1.7-9 DBI_1.1.1
[19] ellipsis_0.3.2 class_7.3-19 assertthat_0.2.1 tibble_3.1.7 lifecycle_1.0.1 crayon_1.5.1
[25] purrr_0.3.4 vctrs_0.4.1 glue_1.6.2 stringi_1.7.6 proxy_0.4-26 compiler_4.1.0
[31] pillar_1.7.0 generics_0.1.2 classInt_0.4-3 pkgconfig_2.0.3
> sf::sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
"3.9.1" "3.4.0" "8.1.1" "true" "true" "8.1.1"