How to create a table like contrasts in r

Question

I need help converting a dataframe with certain values into columns that looks like contrasts in R. For example.

code <- data.frame(code = c('R1111', 'R1112', 'R1111', 'R1111', 'R1113', 
                            'R1112', 'R1112', 'R1112', 'R1113', 'R1115'))

I need to convert this to the following table

    code   R1111  R1112   R1113   R1115
1  R1111     1      0       0       0
2  R1112     0      1       0       0
3  R1111     2      0       0       0 
4  R1111     3      0       0       0 
5  R1113     0      0       1       0 
6  R1112     0      2       0       0 
7  R1112     0      3       0       0 
8  R1112     0      4       0       0 
9  R1113     0      0       2       0 
10 R1115     0      0       0       1

I have 1400 rows with those sorts of codes that I need to convert. If you notice, each column with the code has increasing number. I tried to do this using reshape2, but i keep getting errors - meaning I haven't been able to figure this out. How can I get this result?

Just the `melt` and `cast` .. trying and learning. But nothing concrete. I'm not even sure if my code is correct. — user6401955, Apr 25 '18 at 20:26

MKR · Accepted Answer · 2018-04-25T20:52:44.500

2

An option is to use mapply in combination with ifelse to get the desired result as:

cbind(code,mapply(function(x){
  ifelse(code$code==x,cumsum(code$code==x),0)
}, unique(as.character(code$code))))

#     code R1111 R1112 R1113 R1115
# 1  R1111     1     0     0     0
# 2  R1112     0     1     0     0
# 3  R1111     2     0     0     0
# 4  R1111     3     0     0     0
# 5  R1113     0     0     1     0
# 6  R1112     0     2     0     0
# 7  R1112     0     3     0     0
# 8  R1112     0     4     0     0
# 9  R1113     0     0     2     0
# 10 R1115     0     0     0     1

edited Apr 25 '18 at 20:52

answered Apr 25 '18 at 20:47

MKR

19,739
4
23
33

Why use `mapply` for a single argument function? – AdamO Apr 25 '18 at 21:01
1

@AdamO Its just an idea. You can get the same result using `do.call` etc. `mapply` is not important. – MKR Apr 25 '18 at 21:03

score 1 · Answer 2 · answered Apr 25 '18 at 21:06

You can use model.matrix to generate the dummy matrix. Then just multiply it by the number of values.

# calculate indicator using base or data.table, more succinctly
# code$tag = with(code, as.numeric(ave(as.character(code), code, 
#                                  FUN=function(x) cumsum(duplicated(x))+1L)))
code$tag = data.table::rowid(code$code) 

model.matrix(~ 0 + code, data=code)* code$tag
#    codeR1111 codeR1112 codeR1113 codeR1115
# 1          1         0         0         0
# 2          0         1         0         0
# 3          2         0         0         0
# 4          3         0         0         0
# 5          0         0         1         0
# 6          0         2         0         0
# 7          0         3         0         0
# 8          0         4         0         0
# 9          0         0         2         0
# 10         0         0         0         1

rg255 · Answer 3 · 2018-04-26T07:45:59.530

A Base R approach (it will throw up some warnings, you can ignore them):

x <- code$code
y <- rep(0, length(x))

DF <- data.frame(x, y, y, y, y)
DF[,2][DF[,1]==unique(x)[1]] <- 1:length(x)
DF[,3][DF[,1]==unique(x)[2]] <- 1:length(x)
DF[,4][DF[,1]==unique(x)[3]] <- 1:length(x)
DF[,5][DF[,1]==unique(x)[4]] <- 1:length(x)

or wrap it in a loop if you've got a lot of columns to handle:

DF <- data.frame(x, y, y, y, y)
for(i in 1:4){
  DF[,i+1][DF[,1]==unique(x)[i]] <- 1:length(x)
}

AdamO · Answer 4 · 2018-04-26T14:15:56.880

0

A sapply is capable of doing this: I store code as a vector and do some post-processing to generate the actual data.frame.

code <- c("R1111", "R1112", "R1111", "R1111", "R1113", "R1112", "R1112", 
"R1112", "R1113", "R1115")

val <- sapply(sort(unique(code)), function(thiscode) 
  (code==thiscode)*cumsum(code==thiscode)
)

The output is a matrix

      R1111 R1112 R1113 R1115
 [1,]     1     0     0     0
 [2,]     0     1     0     0
 [3,]     2     0     0     0
 [4,]     3     0     0     0
 [5,]     0     0     1     0
 [6,]     0     2     0     0
 [7,]     0     3     0     0
 [8,]     0     4     0     0
 [9,]     0     0     2     0
[10,]     0     0     0     1

and formatting it thusly gives the desired output.

val <- data.frame(code=code, val)

edited Apr 26 '18 at 14:15

answered Apr 25 '18 at 20:57

AdamO

4,283
1
27
39

Doesn't achieve op's goal – rg255 Apr 26 '18 at 05:13
@griffinevo thanks for letting me know. I'm surprised by that behavior from `outer`, I should look into it more. Here's a `sapply` version that, with more careful inspection, does as desired. – AdamO Apr 26 '18 at 13:34
Great, thanks for updating - I withdrew my downvote :) – rg255 Apr 26 '18 at 19:09

score 0 · Answer 5 · answered Apr 28 '18 at 01:23

A fairly simple base solution:

m  <- sapply(unique(code$code),'==',code$code)
m2 <- apply(m,2,cumsum)
m2[!m] <- 0
cbind(code,`colnames<-`(m2,unique(code$code)))

#     code R1111 R1112 R1113 R1115
# 1  R1111     1     0     0     0
# 2  R1112     0     1     0     0
# 3  R1111     2     0     0     0
# 4  R1111     3     0     0     0
# 5  R1113     0     0     1     0
# 6  R1112     0     2     0     0
# 7  R1112     0     3     0     0
# 8  R1112     0     4     0     0
# 9  R1113     0     0     2     0
# 10 R1115     0     0     0     1

How to create a table like contrasts in r

5 Answers5