I would like to solve the following problem using dplyr in R. This question has been answered using data.table here: Finding indirect nodes for every edge (in R), but because the remainder of my code uses dplyr, I need to adapt it.

I have information on groups of physicians working together in given hospitals. A physician can work in more than one hospital at the same time. I would like to write code that lists all indirect colleagues of a given physician working in a given hospital. For instance, if I work in a given hospital with another physician who also works in a second hospital, I would like to know which physicians my colleague works with in that second hospital.

Consider a simple example of three hospitals (1, 2, 3) and five physicians (A, B, C, D, E). Physicians A, B and C work together in hospital 1. Physicians A, B and D work together in hospital 2. Physicians B and E work together in hospital 3.

For each physician working in a given hospital, I would like information on their indirect colleagues through each of their direct colleagues. For example, physician A has one indirect colleague through physician B in hospital 1: physician E in hospital 3. On the other hand, physician B does not have any indirect colleague through physician A in hospital 1. Physician C has two indirect colleagues through physician B in hospital 1: physician D in hospital 2 and physician E in hospital 3. And so on.

Below is the object that describes the networks of physicians in all hospitals:

edges <- tibble(hosp  = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"), 
             from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"), 
             to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")) %>% arrange(hosp, from, to)

I would like code that produces the following output:

output <- tibble(hosp     = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"), 
             from     = c("A", "A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "D", "D", "D", "B", "E", "E", "E", "E"), 
             to       = c("C", "B", "C", "A", "B", "A", "B", "D", "B", "A", "D", "A", "B", "B", "E", "B", "B", "B", "B"),
             hosp_ind = c("" , "3", "" , "" , "2", "2", "3", "" , "3", "" , "" , "1", "1", "3", "" , "1", "1", "2", "2"),
             to_ind   = c("" , "E", "" , "" , "D", "D", "E", "" , "E", "" , "" , "C", "C", "E", "" , "A", "C", "A", "D")) %>% arrange(hosp, from, to)
ThomasIsCoding
PaulaSpinola
  • In your output, it looks like A is connected to E thru B twice, thru hosp A and B. (Similarly B->A, C->D.) Are you considering a connection distinct if it happens at a different hospital? – Jon Spring May 21 '21 at 23:49

2 Answers

Since it seems like you only want the first layer of indirect connections in the network, it's pretty simple without a graph data structure.

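library(dplyr)
library(tidyr)  # for replace_na()

# For one (hosp, from) group, return the indirect colleagues reached
# through each direct colleague. Relies on the global `edges` tibble.
get_indirects <- function(hosp_from) {
    x <- hosp_from$from[1]
    hosp <- hosp_from$hosp[1]
    # direct colleagues of x, across all hospitals
    directs <- edges %>% 
        filter(from == x) %>% 
        pull(to)
    # edges leaving a direct colleague that lead neither back to x
    # nor to another direct colleague of x
    edges %>%
        filter(from %in% directs & !(to %in% c(directs, x))) %>% 
        rename(to = from, hosp_ind = hosp, to_ind = to) %>% 
        select(to, hosp_ind, to_ind) %>% 
        mutate(hosp = hosp, from = x, .before = to)
}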

split_edges <- edges %>% 
    group_by(hosp,from) %>% 
    group_split()

indirect_df <- lapply(split_edges, get_indirects) %>% bind_rows()

direct_df <- anti_join(edges, indirect_df[,c("from","to")], by = c("from","to"))

output <- bind_rows(indirect_df,direct_df) %>% 
    replace_na(list(hosp_ind="",to_ind="")) %>% 
    arrange(hosp,from,to)

This gives an output identical to the intended output for the example.
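
For comparison, the same first-layer lookup can also be sketched as a self-join, without splitting the data by group. This is a minimal sketch and not the code above: the names `direct`, `indirect` and `result` are illustrative, rows without an indirect colleague keep `NA` instead of `""`, and on dplyr 1.1.0 or later the `inner_join()` may emit a many-to-many warning (passing `relationship = "many-to-many"` would silence it).

```r
library(dplyr)

edges <- tibble(
  hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
  from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
  to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")
)

# All direct-colleague pairs, regardless of hospital
direct <- edges %>% distinct(from, to)

# Follow each edge one step further: from -> to -> to_ind
indirect <- edges %>%
  inner_join(
    edges %>% rename(hosp_ind = hosp, to_ind = to),
    by = c("to" = "from")
  ) %>%
  filter(to_ind != from) %>%                          # do not loop back to the start
  anti_join(direct, by = c("from", "to_ind" = "to"))  # drop existing direct colleagues

# Attach the indirect colleagues to every original edge (NA where there are none)
result <- edges %>%
  left_join(indirect, by = c("hosp", "from", "to")) %>%
  arrange(hosp, from, to, hosp_ind, to_ind)
```

On the example data, `result` has 19 rows, 12 of which carry an indirect colleague. Unlike `get_indirects()`, this version only follows direct colleagues from the same hospital row, which makes no difference for the example but could for other data.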

kwes

Actually, you can translate the data.table solution into dplyr as follows:

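library(dplyr)
library(igraph)

# graph_from_data_frame() takes the first two columns (hosp, from) as the edge list,
# so g links hospitals to physicians; physicians sharing a hospital are 2 steps apart
g <- simplify(graph_from_data_frame(edges, directed = FALSE))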
edges %>%
  rowwise() %>%
  do(cbind(., {
    to_ind <- setdiff(
      do.call(
        setdiff,
        Map(names, ego(g, 2, c(.$to, .$from), mindist = 2))
      ), .$from
    )
    if (!length(to_ind)) {
      hosp_ind <- to_ind <- NA_character_
    } else {
      hosp_ind <- lapply(to_ind, function(v) names(neighbors(g, v)))
    }
    data.frame(
      hosp_ind = unlist(hosp_ind),
      to_ind = rep(to_ind, lengths(hosp_ind))
    )
  }))

which gives you

# A tibble: 19 x 5
   hosp  from  to    hosp_ind to_ind
   <chr> <chr> <chr> <chr>    <chr>
 1 1     A     B     3        E
 2 1     A     C     NA       NA
 3 1     B     A     NA       NA
 4 1     B     C     NA       NA
 5 1     C     A     2        D
 6 1     C     B     2        D
 7 1     C     B     3        E
 8 2     A     B     3        E
 9 2     A     D     NA       NA
10 2     B     A     NA       NA
11 2     B     D     NA       NA
12 2     D     A     1        C
13 2     D     B     1        C
14 2     D     B     3        E
15 3     B     E     NA       NA
16 3     E     B     1        A
17 3     E     B     2        A
18 3     E     B     1        C
19 3     E     B     2        D
ThomasIsCoding
  • Many thanks @ThomasIsCoding, I am just unsure how to define my network. Ideally I would like to define it as `g <- simplify(graph_from_data_frame(cbind(edges$from,edges$to),directed = FALSE))`, as I would then like to compute degrees as the number of other direct colleagues instead of the number of physicians by hospital (or number of hospitals by physician). – PaulaSpinola May 23 '21 at 17:42
  • @PaulaSpinola You can use `simplify(graph_from_data_frame(cbind(edges[-1],edges[1]),directed = FALSE))`, where `hosp` is added to the graph as an edge attribute. – ThomasIsCoding May 23 '21 at 21:26
  • Many thanks @ThomasIsCoding. I would like to work with `g <- simplify(graph_from_data_frame(cbind(edges[-1],edges[1]),directed = FALSE))` as you suggested here. How should I then adapt the code you proposed before so that it gives me the same output? – PaulaSpinola May 23 '21 at 22:43
  • @PaulaSpinola Well, that would change the code a lot, since the analysis would then run on a completely different graph. – ThomasIsCoding May 23 '21 at 22:46
  • @ThomasIsCoding Oh I see. I will give it some thought then. Thanks. – PaulaSpinola May 23 '21 at 22:58
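
The degree discussed in the comments above, counted as the number of distinct direct colleagues per physician, does not strictly need a graph object. The following is a minimal dplyr sketch on the example data; the names `degree` and `n_direct` are illustrative and not part of either answer.

```r
library(dplyr)

edges <- tibble(
  hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
  from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
  to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")
)

# Number of distinct direct colleagues per physician
degree <- edges %>%
  distinct(from, to) %>%          # a pair sharing several hospitals counts once
  count(from, name = "n_direct")
```

Because every edge appears in both directions in `edges`, counting by `from` alone covers each physician.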