0

I have data in a form of:

id state
1  s_1
1  s_2
1  s_3
2  s_1
2  s_3
3  s_1
3  s_2

And I'd like to convert it into a connection data frame:

source target freq
s_1    s_2    2
s_1    s_3    1
s_2    s_3    1

I already know I can calculate the frequencies with plyr::count(), but how do I reshape the data into source and target columns?
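For reproducibility, the sample data above could be built as follows. This is a minimal sketch: the name `df` is an assumption chosen to match the answers below, and `state` is stored as character rather than factor.

```r
# Sample data from the question; `df` is an assumed name
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_3", "s_1", "s_2"),
  stringsAsFactors = FALSE
)
df
```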

Tom
  • 1
    Not the solution, but as a start: you are looking for all possible combinations between the state values. And then loop through these combinations and calculate the frequencies. There's a `combn` function in R to get the combinations. – deschen Nov 23 '20 at 13:24
  • 2
    This might help: [Expanding a list to include all possible pairwise combinations within a group](https://stackoverflow.com/questions/47276418/expanding-a-list-to-include-all-possible-pairwise-combinations-within-a-group) – MrFlick Nov 23 '20 at 13:29

2 Answers

1

I believe you could do this with dplyr. As mentioned in the comments, use `combn` to get the pair combinations within each id. Then group by source and target and summarise to get the frequency of each combination.

library(dplyr)

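df %>%
  group_by(id) %>%
  # all pairs of states within each id, one pair per row
  do(as.data.frame(t(combn(.$state, m = 2)))) %>%
  setNames(c("id", "source", "target")) %>%
  group_by(source, target) %>%
  # count how often each source/target pair occurs
  summarise(freq = n())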

Output

  source target  freq
  <chr>  <chr>  <int>
1 s_1    s_2        2
2 s_1    s_3        2
3 s_2    s_3        1
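
Note that `combn` pairs every state with every later state within an id, so id 1 also contributes s_1/s_3. That is why this pair is counted twice here but only once in the question's expected output. If only consecutive states should be linked, a sketch along these lines may be closer to what was asked. It assumes `df` holds the question's data with rows already in order within each id, and it swaps `combn` for `dplyr::lead()`:

```r
library(dplyr)

# Assumed sample data, copied from the question
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_3", "s_1", "s_2"),
  stringsAsFactors = FALSE
)

transitions <- df %>%
  group_by(id) %>%
  # pair each state with the next one within the same id
  mutate(target = lead(state)) %>%
  ungroup() %>%
  # the last state of each id has no successor
  filter(!is.na(target)) %>%
  # count identical source/target pairs
  count(source = state, target, name = "freq")

transitions
```

For the sample data this should give s_1/s_2 twice and s_1/s_3 and s_2/s_3 once each, matching the table in the question.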
Ben
0

I think @Ben's solution is the clearest we can get here, but for the sake of completeness I wrote my own solution based on the comments, using for loops. It links each state only to the one that directly follows it within the same id:

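# Seed row of NAs so rbind() has matching columns; it is dropped further down
res <- data.frame(source = NA, target = NA)

for (i in unique(df$id)) {
  df_grouped <- df[df$id == i, ]
  for (j in 1:nrow(df_grouped)) {
    source <- df_grouped[j, "state"]
    target <- df_grouped[j + 1, "state"]  # NA past the last row of the group
    res <- rbind(res, cbind(source, target))
  }
}
# Drop the seed row and the incomplete pair at the end of each id
res <- res[complete.cases(res), ]
# Count identical source/target rows
res <- plyr::count(res)
res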
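
The inner loop can also be avoided in base R. The following is only a sketch under the same assumptions (`df` as in the question, rows ordered within each id): it splits `state` by `id` and pairs each vector with itself shifted by one position using `head()` and `tail()`, then counts the pairs with `aggregate()` instead of `plyr::count()`.

```r
# Assumed sample data, copied from the question
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3),
  state = c("s_1", "s_2", "s_3", "s_1", "s_3", "s_1", "s_2"),
  stringsAsFactors = FALSE
)

# For each id, pair every state with its successor
pairs <- lapply(split(df$state, df$id), function(s) {
  data.frame(source = head(s, -1), target = tail(s, -1),
             stringsAsFactors = FALSE)
})
res <- do.call(rbind, pairs)

# Count identical source/target rows by summing a column of ones
res <- aggregate(freq ~ source + target, data = cbind(res, freq = 1), FUN = sum)
res
```

An id with a single state yields an empty data frame here, so it simply contributes no pairs.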
Tom