I have a data frame that has percentage values for a number of variables and observations, as follows:

obs <- data.frame(Site = c("A", "B", "C"), X = c(11, 22, 33), Y = c(44, 55, 66), Z = c(77, 88, 99))

I need to prepare this data as an edge list for network analysis, with "Site" as the nodes and the remaining variables as the edges. The result should look like this:

Node1    Node2    Weight  Type
A         B         33     X
A         C         44     X
...
B         C         187    Z       

So "Weight" is the sum of the two sites' values for every possible pair of sites, calculated separately for each column (whose name ends up in "Type").

I suppose the answer involves using apply on a combn expression, as in the question "Applying combn() function to data frame", but I haven't quite been able to work it out.

I can do this all by hand taking the combinations for "Site"

sites <- combn(obs$Site, 2)

Then the individual columns like so

combX <- combn(obs$X, 2, function(x) sum(x))

and binding those datasets together, but this obviously becomes annoying very soon.
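
For reference, the by-hand version described above could be sketched roughly as follows for a single column. This is only an illustration in base R: the names `site_pairs`, `weight_x` and `edges_x` are made up here, and `stringsAsFactors = FALSE` is set explicitly on the assumption that "Site" should be a character column.

```r
obs <- data.frame(Site = c("A", "B", "C"), X = c(11, 22, 33),
                  Y = c(44, 55, 66), Z = c(77, 88, 99),
                  stringsAsFactors = FALSE)

# All unordered pairs of sites: a 2 x 3 character matrix, one column per pair
site_pairs <- combn(obs$Site, 2)

# Sum of column X for each pair, in the same pair order as site_pairs
weight_x <- combn(obs$X, 2, FUN = sum)

# Bind the pieces into an edge list for this one column
edges_x <- data.frame(Node1 = site_pairs[1, ],
                      Node2 = site_pairs[2, ],
                      Weight = weight_x,
                      Type = "X",
                      stringsAsFactors = FALSE)
print(edges_x)
```

The same three steps would have to be repeated for Y and Z and the results row-bound, which is the tedious part.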

I have tried to do all the variable columns in one go like this

b <- apply(newdf[, -1], 1, function(x) {
  sum(utils::combn(x, 2))
})

but there is something wrong with that. Can anyone help, please?

NickG
  • Thanks to ThomasIsCoding, StephenK, and Yuriy Saraykin for three different ways to make this work. I can't really choose an accepted answer, since they all do the job in different ways. – NickG Sep 07 '20 at 09:52

3 Answers

One option would be to create a function and then map that function to all the columns that you have.

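library(dplyr)
library(purrr)

# Build the edge list for one value column, given its name as a string.
# tibble() is used rather than transmute() because the number of pairs
# generally differs from nrow(obs).
func1 <- function(var) {
  pairs <- combn(as.character(obs$Site), 2)
  tibble(Node1 = pairs[1, ],
         Node2 = pairs[2, ],
         Weight = combn(obs[[var]], 2, sum),
         Type = var)
}

# Apply to every column except Site and stack the results
map(colnames(obs)[-1], func1) %>% bind_rows()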
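
If you would rather avoid the tidyverse, the same "one function per column, then bind" idea can be sketched in base R. Treat this as a rough sketch under assumptions: `edge_list_for` and its `id` argument are hypothetical names introduced here, and the ID column is coerced to character in case it was read in as a factor.

```r
obs <- data.frame(Site = c("A", "B", "C"), X = c(11, 22, 33),
                  Y = c(44, 55, 66), Z = c(77, 88, 99),
                  stringsAsFactors = FALSE)

# Hypothetical helper: edge list for one value column `var` of `df`
edge_list_for <- function(df, var, id = "Site") {
  ids <- combn(as.character(df[[id]]), 2)      # 2 x n_pairs matrix of node names
  data.frame(Node1 = ids[1, ],
             Node2 = ids[2, ],
             Weight = combn(df[[var]], 2, FUN = sum),  # pairwise sums, same order
             Type = var,
             stringsAsFactors = FALSE)
}

# One edge list per value column, stacked into a single data frame
edges <- do.call(rbind, lapply(names(obs)[-1], edge_list_for, df = obs))
print(edges)
```

Because the number of rows comes from `combn()` rather than from `nrow(obs)`, this should also work when the number of sites is not 3.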
StephenK

Here is an example using combn

do.call(
  rbind,
  combn(1:nrow(obs),
    2,
    FUN = function(k) cbind(data.frame(t(obs[k, 1])), stack(data.frame(as.list(colSums(obs[k, -1]))))),
    simplify = FALSE
  )
)

which gives

  X1 X2 values ind
1  A  B     33   X
2  A  B     99   Y
3  A  B    165   Z
4  A  C     44   X
5  A  C    110   Y
6  A  C    176   Z
7  B  C     55   X
8  B  C    121   Y
9  B  C    187   Z
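
To get from this output to the column names and row order asked for in the question, something along these lines may help. It is only a sketch: it reuses the call above unchanged apart from `seq_len()`, stores the result in a variable named `res` (a name chosen here for illustration), and assumes `Site` is a character column.

```r
obs <- data.frame(Site = c("A", "B", "C"), X = c(11, 22, 33),
                  Y = c(44, 55, 66), Z = c(77, 88, 99),
                  stringsAsFactors = FALSE)

res <- do.call(
  rbind,
  combn(seq_len(nrow(obs)),
    2,
    FUN = function(k) cbind(data.frame(t(obs[k, 1])), stack(data.frame(as.list(colSums(obs[k, -1]))))),
    simplify = FALSE
  )
)

# Rename to the headers used in the question
names(res) <- c("Node1", "Node2", "Weight", "Type")

# stack() returns the column names as a factor; plain character is easier downstream
res$Node1 <- as.character(res$Node1)
res$Node2 <- as.character(res$Node2)
res$Type <- as.character(res$Type)

# Group the rows by Type, as in the desired output
res <- res[order(res$Type, res$Node1, res$Node2), ]
rownames(res) <- NULL
print(res)
```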
ThomasIsCoding
  • Thank you very much! When I try to run your code, I get the following error: ` Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 2, 3 ` – NickG Sep 07 '20 at 09:13
  • @NickG I think your `Site` column is of type `factor`. To address this, you can run `obs <- type.convert(obs, as.is = TRUE)` first – ThomasIsCoding Sep 07 '20 at 09:26

Try it this way:

library(tidyverse)
obs_long <- obs %>% pivot_longer(-Site, names_to = "type")
sites <- combn(obs$Site, 2) %>% t() %>% as_tibble()
Type <- tibble(type = c("X", "Y", "Z"))

merge(sites, Type) %>% 
  left_join(obs_long, by = c("V1" = "Site", "type" = "type")) %>% 
  left_join(obs_long, by = c("V2" = "Site", "type" = "type")) %>% 
  mutate(res = value.x + value.y) %>% 
  select(-c(value.x, value.y))


  V1 V2 type res
1  A  B    X  33
2  A  C    X  44
3  B  C    X  55
4  A  B    Y  99
5  A  C    Y 110
6  B  C    Y 121
7  A  B    Z 165
8  A  C    Z 176
9  B  C    Z 187
Yuriy Saraykin
  • Thank you very much for your help! All of this worked, except that `as_tibble()` after combn gave me an error about inconsistent lengths (rows of length 6, values of length 3). So I found this: https://stackoverflow.com/questions/49784858/tidyverse-friendly-hack-on-combn and managed to get it done without using combn: `sites <- obs %>% expand(Node1 = Site, Node2 = Site) %>% filter(Node1 != Node2) %>% rowwise() %>% mutate(id = paste0(sort(c(Node1, Node2)), collapse = " ")) %>% distinct(id, .keep_all = TRUE) %>% select(-id)` – NickG Sep 07 '20 at 09:12
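
The join-based idea in this answer, combined with the combn-free pairing from the comment above, can also be sketched in base R. This is an illustrative sketch rather than the code either poster used: `value_cols`, `long`, `joined` and `edges` are names introduced here, and each pair is kept once by requiring `Site1 < Site2`, which assumes the site names are unique character strings.

```r
obs <- data.frame(Site = c("A", "B", "C"), X = c(11, 22, 33),
                  Y = c(44, 55, 66), Z = c(77, 88, 99),
                  stringsAsFactors = FALSE)

value_cols <- names(obs)[-1]

# Long format: one row per (Site, Type) with its value
long <- data.frame(Site = rep(obs$Site, times = length(value_cols)),
                   Type = rep(value_cols, each = nrow(obs)),
                   value = unlist(obs[value_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Self-join on Type: every site paired with every site, within each Type
joined <- merge(long, long, by = "Type", suffixes = c("1", "2"))

# Keep each unordered pair once and drop self-pairs
joined <- joined[joined$Site1 < joined$Site2, ]

edges <- data.frame(Node1 = joined$Site1,
                    Node2 = joined$Site2,
                    Weight = joined$value1 + joined$value2,
                    Type = joined$Type,
                    stringsAsFactors = FALSE)
edges <- edges[order(edges$Type, edges$Node1, edges$Node2), ]
rownames(edges) <- NULL
print(edges)
```

Deriving `value_cols` from `names(obs)` avoids hard-coding the type names, at the cost of a self-join that grows with the square of the number of sites.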