3

I need help converting a dataframe with certain values into columns that looks like contrasts in R. For example.

code <- data.frame(code = c('R1111', 'R1112', 'R1111', 'R1111', 'R1113', 
                            'R1112', 'R1112', 'R1112', 'R1113', 'R1115')) 

I need to convert this to the following table

    code   R1111  R1112   R1113   R1115
1  R1111     1      0       0       0
2  R1112     0      1       0       0
3  R1111     2      0       0       0 
4  R1111     3      0       0       0 
5  R1113     0      0       1       0 
6  R1112     0      2       0       0 
7  R1112     0      3       0       0 
8  R1112     0      4       0       0 
9  R1113     0      0       2       0 
10 R1115     0      0       0       1 

I have 1400 rows with those sorts of codes that I need to convert. If you notice, each column with the code has increasing number. I tried to do this using reshape2, but i keep getting errors - meaning I haven't been able to figure this out. How can I get this result?

Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
user6401955
  • 155
  • 1
  • 12

5 Answers5

2

An option is to use mapply in combination with ifelse to get the desired result as:

cbind(code,mapply(function(x){
  ifelse(code$code==x,cumsum(code$code==x),0)
}, unique(as.character(code$code))))

#     code R1111 R1112 R1113 R1115
# 1  R1111     1     0     0     0
# 2  R1112     0     1     0     0
# 3  R1111     2     0     0     0
# 4  R1111     3     0     0     0
# 5  R1113     0     0     1     0
# 6  R1112     0     2     0     0
# 7  R1112     0     3     0     0
# 8  R1112     0     4     0     0
# 9  R1113     0     0     2     0
# 10 R1115     0     0     0     1
MKR
  • 19,739
  • 4
  • 23
  • 33
1

You can use model.matrix to generate the dummy matrix. Then just multiply it by the number of values.

# calculate indicator using base or data.table, more succinctly
# code$tag = with(code, as.numeric(ave(as.character(code), code, 
#                                  FUN=function(x) cumsum(duplicated(x))+1L)))
code$tag = data.table::rowid(code$code) 

model.matrix(~ 0 + code, data=code)* code$tag
#    codeR1111 codeR1112 codeR1113 codeR1115
# 1          1         0         0         0
# 2          0         1         0         0
# 3          2         0         0         0
# 4          3         0         0         0
# 5          0         0         1         0
# 6          0         2         0         0
# 7          0         3         0         0
# 8          0         4         0         0
# 9          0         0         2         0
# 10         0         0         0         1
user20650
  • 24,654
  • 5
  • 56
  • 91
0

A Base R approach (it will throw up some warnings, you can ignore them):

x <- code$code
y <- rep(0, length(x))

DF <- data.frame(x, y, y, y, y)
DF[,2][DF[,1]==unique(x)[1]] <- 1:length(x)
DF[,3][DF[,1]==unique(x)[2]] <- 1:length(x)
DF[,4][DF[,1]==unique(x)[3]] <- 1:length(x)
DF[,5][DF[,1]==unique(x)[4]] <- 1:length(x)

or wrap it in a loop if you've got a lot of columns to handle:

DF <- data.frame(x, y, y, y, y)
for(i in 1:4){
  DF[,i+1][DF[,1]==unique(x)[i]] <- 1:length(x)
}
rg255
  • 4,119
  • 3
  • 22
  • 40
0

A sapply is capable of doing this: I store code as a vector and do some post-processing to generate the actual data.frame.

code <- c("R1111", "R1112", "R1111", "R1111", "R1113", "R1112", "R1112", 
"R1112", "R1113", "R1115")

val <- sapply(sort(unique(code)), function(thiscode) 
  (code==thiscode)*cumsum(code==thiscode)
)

The output is a matrix

      R1111 R1112 R1113 R1115
 [1,]     1     0     0     0
 [2,]     0     1     0     0
 [3,]     2     0     0     0
 [4,]     3     0     0     0
 [5,]     0     0     1     0
 [6,]     0     2     0     0
 [7,]     0     3     0     0
 [8,]     0     4     0     0
 [9,]     0     0     2     0
[10,]     0     0     0     1

and formatting it thusly gives the desired output.

val <- data.frame(code=code, val)
AdamO
  • 4,283
  • 1
  • 27
  • 39
  • Doesn't achieve op's goal – rg255 Apr 26 '18 at 05:13
  • @griffinevo thanks for letting me know. I'm surprised by that behavior from `outer`, I should look into it more. Here's a `sapply` version that, with more careful inspection, does as desired. – AdamO Apr 26 '18 at 13:34
  • Great, thanks for updating - I withdrew my downvote :) – rg255 Apr 26 '18 at 19:09
0

A fairly simple base solution:

m  <- sapply(unique(code$code),'==',code$code)
m2 <- apply(m,2,cumsum)
m2[!m] <- 0
cbind(code,`colnames<-`(m2,unique(code$code)))

#     code R1111 R1112 R1113 R1115
# 1  R1111     1     0     0     0
# 2  R1112     0     1     0     0
# 3  R1111     2     0     0     0
# 4  R1111     3     0     0     0
# 5  R1113     0     0     1     0
# 6  R1112     0     2     0     0
# 7  R1112     0     3     0     0
# 8  R1112     0     4     0     0
# 9  R1113     0     0     2     0
# 10 R1115     0     0     0     1
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167