
HAVE is a data frame with this structure:

name workplace pr_happy
a     A            0.93
b     B            0.54
c     A            0.72
d     C            0.17
e     D            0.44

I WANT to build an adjacency matrix of name and workplace (exactly like this question: converting data frame into affiliation network in R), but instead of a matrix with binary values, I want the values of pr_happy to populate the cells for each affiliation. WANT should look like this:

       A    B    C    D 
a   0.93 0.00 0.00 0.00
b   0.00 0.54 0.00 0.00
c   0.72 0.00 0.00 0.00
d   0.00 0.00 0.17 0.00
e   0.00 0.00 0.00 0.44

I'm having a hard time wrapping my head around a way to do this simply. Any thoughts?

camille
J.Q

2 Answers


This is essentially pivoting from long to wide and replacing the resulting NA values with 0.

Using tidyverse:

library(tidyverse)

dat %>% 
  spread(workplace, pr_happy, fill = 0) %>% # thank you @Jordo82
  tibble::column_to_rownames("name")

     A    B    C    D
a 0.93 0.00 0.00 0.00
b 0.00 0.54 0.00 0.00
c 0.72 0.00 0.00 0.00
d 0.00 0.00 0.17 0.00
e 0.00 0.00 0.00 0.44

data

dat <- structure(list(name = c("a", "b", "c", "d", "e"),
                      workplace = c("A", "B", "A", "C", "D"),
                      pr_happy = c(0.93, 0.54, 0.72, 0.17, 0.44)),
                 .Names = c("name", "workplace", "pr_happy"),
                 row.names = c(NA, -5L), class = c("data.frame"))
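
For comparison, here is a minimal base R sketch of the same pivot, assuming the `dat` object defined above. It uses `xtabs()`, which sums `pr_happy` within each name/workplace pair and fills empty pairs with 0, so it only matches the desired output when each pair appears at most once. The variable names `tab` and `WANT` are illustrative.

```r
dat <- data.frame(
  name      = c("a", "b", "c", "d", "e"),
  workplace = c("A", "B", "A", "C", "D"),
  pr_happy  = c(0.93, 0.54, 0.72, 0.17, 0.44)
)

# Cross-tabulate: rows are names, columns are workplaces, cells are summed pr_happy
tab <- xtabs(pr_happy ~ name + workplace, data = dat)

# Drop the table class to get a plain numeric matrix with the same dimnames
WANT <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
WANT
```

If a name could appear more than once at the same workplace, the summing behaviour would need to be replaced with whatever aggregation is actually wanted.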
zack
    You can also use the `fill` argument in `spread` to replace NAs with 0: `spread(workplace, pr_happy, fill = 0)` – Jordo82 Dec 07 '18 at 16:18

You can do it like this:

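# Zero-filled matrix with one row per name and one column per workplace
WANT <- matrix(0, nrow = 5, ncol = 4)
rownames(WANT) <- letters[1:5]
colnames(WANT) <- LETTERS[1:4]

# as.character() makes the lookup go by dimnames even if the columns are factors
for (i in seq_len(nrow(HAVE))) {
  WANT[as.character(HAVE[i, 1]), as.character(HAVE[i, 2])] <- HAVE[i, 3]
}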

(although I am sure there is a way without the loop)

David Cros
    Making some assumptions (`name` and `workplace` columns are factors, the order of the levels of the factors is the same as the row and column orders in `WANT`), then we can use matrix indexing to avoid the `for` loop. Call the input data frame `df`, then `HAVE = sapply(df, as.numeric)` and `WANT[HAVE[, 1:2]] = HAVE[, 3]` does the assignment without a loop. – Gregor Thomas Dec 07 '18 at 16:28
    These assumptions are pretty easy to enforce: just set `rownames(WANT) = levels(df$name); colnames(WANT) = levels(df$workplace)`. – Gregor Thomas Dec 07 '18 at 16:29
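
The loop-free idea from the comments above can be sketched roughly as follows. This assumes the input data frame is called `df` (as in the comment) and that `name` and `workplace` are factors; the matrix is sized and labelled from the factor levels so the integer codes line up with the rows and columns. The helper name `idx` is illustrative only.

```r
df <- data.frame(
  name      = factor(c("a", "b", "c", "d", "e")),
  workplace = factor(c("A", "B", "A", "C", "D")),
  pr_happy  = c(0.93, 0.54, 0.72, 0.17, 0.44)
)

# Zero-filled matrix whose dimnames come straight from the factor levels
WANT <- matrix(0,
               nrow = nlevels(df$name),
               ncol = nlevels(df$workplace),
               dimnames = list(levels(df$name), levels(df$workplace)))

# Two-column index matrix: each row is a (row, column) position in WANT
idx <- cbind(as.integer(df$name), as.integer(df$workplace))

# Matrix indexing assigns every pr_happy value to its cell in one step
WANT[idx] <- df$pr_happy
WANT
```

Building the dimnames from `levels()` avoids relying on the data happening to be sorted in the same order as the hand-written row and column names.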