1

I would like to use mapply on my custom function which relies on two sf object inputs. Here is a link to my GitHub repository with code and data for a reproducible example: https://github.com/msghankinson/sf_mapply.

I am trying to intersect a shapefile of circles (buffers drawn around points) with a shapefile of Census blocks from California (n = 710,000). The intersection requires the slow st_intersection command because I need to calculate what share of the block is overlapped by the circle (blocks with >=50% of overlap are kept, others are dropped). Because of the shapefile sizes, using st_intersection on the entire state is unwieldy. To improve speed, I plan to both run the intersection on an HPC cluster and break the statewide intersections into parallel tasks of intersections at the county level.

Here is my function:

county_func <- function(x, y) { # (x = circles, y = centroids/points)
  y_id <- y %>% # keep points only as an index for later
    st_drop_geometry()
  x_int <- st_intersects(x, blocks) # intersect circles with blocks to build index of overlap
  ints_holder <- data.frame() 
  for(i in 1:nrow(x_int)){ # for each circle
    blocks_int <- subset(blocks, as.numeric(rownames(blocks)) %in% x_int[[i]]) # subset blocks that intersect
    blocks_int$hud_id <- y_id[i, 1] # add point id to these blocks for merge later
    ints_holder <- rbind(ints_holder, blocks_int)
  }
  x_blocks <- st_intersection(x, ints_holder) %>% # intersection between circles and blocks they intersect
                mutate(intersect_area = st_area(.)) %>% # calculate intersect area
                dplyr::select(GEOID10, intersect_area) # drop irrelevant data
  return(x_blocks)
}

As annotated in my R script, the function is successful when broken into individual commands and run on two dataframe inputs (circles and points). However, to run the function on the counties in parallel via the cluster, I need to make the function compatible with the parallel library, meaning functions like mcmapply/mapply. To test compatibility, I formatted the circle shapefile into a list of circles, grouped at the county level. But when I run the function on even one county drawn from the list, I receive the following error:

Error in UseMethod("st_drop_geometry") : no applicable method for 'st_drop_geometry' applied to an object of class "character"

The error persists even when I remove all character variables from both inputs. As I understand, the apply functions force objects into matrix form, which has compatibility issues with sf objects. But I do not know how else to run the function in parallel via the cluster outside of the parallel library and mcmapply family of commands. Any advice at all would be greatly appreciated.

For reference:

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.9 sf_1.0-7   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3       rstudioapi_0.13    magrittr_2.0.3     units_0.7-2        tidyselect_1.1.2   R6_2.5.1          
 [7] rlang_1.0.2        fansi_1.0.3        s2_1.0.7           stringr_1.4.0      wk_0.5.0           tools_4.1.0       
[13] grid_4.1.0         KernSmooth_2.23-20 utf8_1.2.2         cli_3.3.0          e1071_1.7-9        DBI_1.1.1         
[19] ellipsis_0.3.2     class_7.3-19       assertthat_0.2.1   tibble_3.1.7       lifecycle_1.0.1    crayon_1.5.1      
[25] purrr_0.3.4        vctrs_0.4.1        glue_1.6.2         stringi_1.7.6      proxy_0.4-26       compiler_4.1.0    
[31] pillar_1.7.0       generics_0.1.2     classInt_0.4-3     pkgconfig_2.0.3   
> sf::sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
       "3.9.1"        "3.4.0"        "8.1.1"         "true"         "true"        "8.1.1" 
mhankinson
  • 31
  • 6
  • Where’s the data? – IRTFM Jul 06 '22 at 07:00
  • Data is located in GitHub repository (https://github.com/msghankinson/sf_mapply). I was unsure of how create a toy dataset of an sf object, so I uploaded workable samples of the actual shapefiles. – mhankinson Jul 06 '22 at 12:10
  • I see a link to several files but I see no code block that downloads data and calls the function (which is the only code we see). So we would need to make many guesses about what you actually did that created that error. – IRTFM Jul 07 '22 at 01:25
  • The R code within the repository directly loads the files if the entire repository is downloaded as a zip file and the working directory defined based on the where the file is downloaded. But I am open to adding more. What type of code block would be helpful here? I am not familiar with a more unified script to run, but can work on it. – mhankinson Jul 07 '22 at 03:06

0 Answers0