
I am trying to convert pairs of letters (genotype) like AA, GG, GA to numerical values. So for example I would like AA = 0, GG = 1, AG = 2, CC = 3, TT = 4 etc. A sample of my data looks like this:

S1 S2 S3
AA CC AA
AA GG TT
AA CC GG
AA AG AA

I have been trying to use the mutate function in dplyr package, but I am kinda stuck.

The code that I have been running that gives me an error is:

DF1 <- DF %>% mutate_each(funs(chartr("AA", "0", .)))

Error in chartr("AA", "0", c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L)) : 'old' is longer than 'new'

I tried to then edit the code to:

DF1 <- DF %>% mutate_each(funs(chartr("AA", "00", .)))

That gave me the result below, which is still not what I want. Can someone please help me out with some ideas on how to deal with it?

S1 S2 S3
1 00 CC 00
2 00 GG TT
3 00 CC GG
4 00 0G 00

My desired result is:

S1 S2 S3
1 0 3 0
2 0 1 4
3 0 3 1
4 0 1 0
hrbrmstr
Anita
    I think there's an error in your desired results since row 4 of `S2` would be `AG` == `2` vs `1` that you have. – hrbrmstr Sep 30 '15 at 21:16

1 Answer

dat <- read.table(text="S1 S2 S3
AA CC AA
AA GG TT
AA CC GG
AA AG AA", header=TRUE, stringsAsFactors=FALSE)

Assuming a finite translation table:

xlate <- c(AA = 0, GG = 1, AG = 2, CC = 3, TT = 4)

# as.character() makes the lookup go by label even if a column is a factor
dat[] <- lapply(dat, function(x) unname(xlate[as.character(x)]))

dat
##   S1 S2 S3
## 1  0  3  0
## 2  0  1  4
## 3  0  3  1
## 4  0  2  0
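
As for why the attempts in the question behaved the way they did: `chartr()` translates single characters one-for-one rather than whole strings, so each `A` became `0` and `AG` turned into `0G`. Below is a minimal sketch of the same named-vector lookup wrapped in a function. It assumes the columns may be factors (the integer codes in the error message suggest they were) and that any genotype missing from the table should come back as `NA`. The name `recode_genotypes` is made up for illustration and does not come from any package.

```r
# Translation table, same values as in the answer above
xlate <- c(AA = 0, GG = 1, AG = 2, CC = 3, TT = 4)

# Hypothetical helper: recode every column of a data frame via a named lookup.
recode_genotypes <- function(df, lookup) {
  df[] <- lapply(df, function(x) {
    # as.character() so factor columns are matched by label, not integer code;
    # names absent from `lookup` yield NA
    unname(lookup[as.character(x)])
  })
  df
}

# Sample data read with factor columns, to mimic the situation in the question
dat <- read.table(text = "S1 S2 S3
AA CC AA
AA GG TT
AA CC GG
AA AG AA", header = TRUE, stringsAsFactors = TRUE)

res <- recode_genotypes(dat, xlate)
res

# chartr() works character by character: every "A" becomes "0"
chartr_demo <- chartr("AA", "00", "AG")
chartr_demo
```

If `GA` and `AG` should be treated as the same genotype, one option is to add both spellings to the table (for example `GA = 2`), which keeps the lookup a plain named vector.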
hrbrmstr