0

I would like to calculate a distance matrix between the centroids of the polygons in a region considering spatial link distance (aka 'neighbor distances') instead of simple Euclidean distances. Spatial link distance considers the Euclidean distances along the links of spatial neighbor links.

In other words, I would like to calculate spatial distance link matrix. Is there a package / function to do this?

I've done a quick exploration with sfdep and here's the solution I've found in the repex below:

reprex

Step 1: calculate distances between neighboring polygons (dists) using sfdep::st_nb_dists():

library(sf)
library(sfdep)
library(data.table)
library(fields)

# get contiguity and distance btwn neighbors
geo <- sf::st_geometry(guerry)
nb <- sfdep::st_contiguity(geo)
dists <- sfdep::st_nb_dists(geo, nb)

Step 2: I've created a nb_list_to_df() function to convert dists into a data.frame in long format with the distance for every pair or neighboring polygons (od_df)

# fun to convert nb dist to a data.frame in long format
nb_list_to_df <- function(nb, dists) {
  
  mtrx <- sfdep::wt_as_matrix(nb, dists)
  
  matrix_length <- 1:length(mtrx[1,])
  
  # FULL MATRIX
  mtrx_long <- cbind(
    as.data.table(
      data.table::CJ(matrix_length, matrix_length)), # df two columns
    'dist' = as.vector(mtrx)  # matrix values in a vector
  )
  
  # keep only dist between neighbors
  mtrx_long <- subset(mtrx_long, dist >0)
  setnames(mtrx_long, c('from', 'to', 'dist'))
  
  return(mtrx_long)
}

# convert nb dist to a data.frame in long format
od_df <- nb_list_to_df(nb, dists)
head(od_df)
#>    orig dest     dist
#> 1:    1   36 90030.63
#> 2:    1   37 87399.28
#> 3:    1   67 55587.69
#> 4:    1   69 85693.43
#> 5:    2    7 85221.38
#> 6:    2   49 75937.49

The final steps are to:

  1. Use od_df to create a network graph using a library like igraph. In this case, I'm using cppRouting because it seems to be the most efficient.
  2. Use the network topology to calculate the distance between all combinations of origin-destination pairs
# step 3: create a network graph
library(cppRouting)
graph  <-  makegraph(od_df, directed = F)

# step 4: get the distances considering network topology
dist_link <- get_distance_matrix(Graph=graph, 
                    from = unique(od_df$orig), 
                    to = unique(od_df$orig))

This code arrives at the desired solution. However, I'm wondering if this is a good approach or if there are better / more efficient ways to do this.

rafa.pereira
  • 13,251
  • 6
  • 71
  • 109

1 Answers1

1

As I understand you are concerned that expanding the nb object to a matrix would demand too much RAM, which means finding a different solution to the first part of your code. Here's a version without expanding the nb object into a matrix:

library(sf)
library(sfdep)
library(data.table)

# get contiguity and distance between neighbors
geo = sf::st_geometry(guerry)
nb = sfdep::st_contiguity(geo)
dists = sfdep::st_nb_dists(geo, nb)


# If you look at the structure of nb, it's a list the lenght of
# nrow(geo) and each element contains the index of the neighbours
# of each row.
str(nb)
#> List of 85
#>  $ : int [1:4] 36 37 67 69
#>  $ : int [1:6] 7 49 57 58 73 76
#>  $ : int [1:6] 17 21 40 56 61 69
#>  $ : int [1:4] 5 24 79 80
#>  ---
#>  $ : int [1:6] 15 34 35 47 75 83
#>  $ : int [1:6] 15 18 21 22 34 82
#>  $ : int [1:6] 50 52 53 65 66 68
#>  $ : int [1:5] 9 19 43 56 73
#>  - attr(*, "class")= chr [1:2] "nb" "list"
#>  - attr(*, "region.id")= chr [1:85] "1" "2" "3" "4" ...
#>  - attr(*, "call")= language spdep::poly2nb(pl = geometry, queen = queen)
#>  - attr(*, "type")= chr "queen"
#>  - attr(*, "sym")= logi TRUE
# The same goes for dists
str(dists)
#> List of 85
#>  $ : num [1:4] 90031 87399 55588 85693
#>  $ : num [1:6] 85221 75937 125737 86797 102273 ...
#>  $ : num [1:6] 85461 92563 96614 88010 66400 ...
#>  $ : num [1:4] 53294 98250 83762 83974
#> ---
#>  $ : num [1:6] 110004 84644 69201 108549 60336 ...
#>  $ : num [1:6] 89353 81323 69322 96794 102964 ...
#>  $ : num [1:6] 85539 90350 116728 95918 71842 ...
#>  $ : num [1:5] 65601 96398 100014 82309 95857
#>  - attr(*, "class")= chr [1:2] "nbdist" "list"
#>  - attr(*, "call")= language spdep::nbdists(nb = nb, coords = x, longlat = longlat)

# convert nb dist to a data.frame in long format
# You can loop through geo's row and collect destinations and distances as you go
n = length(nb)
res = data.table(from = integer(), to = integer(), dist = numeric())
for(i in seq_len(n)){
  res = rbind(res, data.table(from = i, to = nb[[i]], dist = dists[[i]]))
}
res
#>      from to      dist
#>   1:    1 36  90030.63
#>   2:    1 37  87399.28
#>   3:    1 67  55587.69
#>   4:    1 69  85693.43
#>   5:    2  7  85221.38
#>  ---                  
#> 416:   85  9  65601.32
#> 417:   85 19  96397.60
#> 418:   85 43 100013.94
#> 419:   85 56  82308.90
#> 420:   85 73  95857.49

Created on 2023-06-13 with reprex v2.0.2

This method appends a data.table to the initial null table at each iteration, it's probably not super fast but should be RAM efficient. It can be optimised if you run into speed problems.

Duccio A
  • 1,402
  • 13
  • 27
  • Thanks Duccio. It's a very elegant solution to the initial part of the problem. Your for loop arrives at the same result as the `nb_list_to_df()` function I proposed – rafa.pereira Jun 13 '23 at 14:49
  • @rafa.pereira For the second part, I can try to help but I am not sure what the problem is. You are using some jargon that I don't understand because I am not an expert on those topics and your final question concerns only the first part. Perhaps an example of the final results that you want to achieve and a specific question. Can you achieve what you want with those methods and you just want to optimise or do you need a solution to the problem? – Duccio A Jun 13 '23 at 14:59
  • Thanks for your comment. The code I presented in the questions arrives at the desired output, but now I see I should edit the question to make this clearer – rafa.pereira Jun 13 '23 at 15:43
  • The second part of the question is about performance. The only way to go is to benchmark different implementations of your code (like, as you said, by using `igraph`, or `network` or `tidygraph`). Also, have a look at `cugraph` of rapids.ai, to run network analysis on GPU. They have a Python implementation but I vaguely remember seeing an R implementation, too. – Duccio A Jun 14 '23 at 09:56