I am trying to convert pairs of letters (genotypes) such as AA, GG, and GA to numerical values. For example, I would like AA = 0, GG = 1, AG = 2, CC = 3, TT = 4, and so on. A sample of my data looks like this:
S1 S2 S3
AA CC AA
AA GG TT
AA CC GG
AA AG AA
I have been trying to use the mutate function in the dplyr package, but I am stuck.
The code I have been running, which gives me an error, is:
DF1 <- DF %>% mutate_each(funs(chartr("AA", "0", .)))
Error in chartr("AA", "0", c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L)) : 'old' is longer than 'new'
I then edited the code to:
DF1 <- DF %>% mutate_each(funs(chartr("AA", "00", .)))
This gave me the results below, but they are still not what I want. Can someone please suggest how to approach this?
S1 S2 S3
1 00 CC 00
2 00 GG TT
3 00 CC GG
4 00 0G 00
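As far as I can tell, the 0G in the last row comes from chartr() translating single characters one-to-one rather than matching whole strings, so every A becomes 0 no matter which genotype it belongs to. A minimal base R illustration on a made-up vector x (not my real data):

```r
# chartr() maps each character in `old` to the character at the same
# position in `new`; it never treats "AA" as a single unit.
x <- c("AA", "AG", "GG")   # hypothetical example vector
res <- chartr("AA", "00", x)
print(res)
# [1] "00" "0G" "GG"
```

This also seems to explain the first error: `old` ("AA") has two characters while `new` ("0") has only one.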
My desired result is:
S1 S2 S3
1 0 3 0
2 0 1 4
3 0 3 1
4 0 2 0
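One possible approach, sketched in base R rather than dplyr, is to look each genotype up in a named vector. This is only a sketch under assumptions: the data frame is rebuilt by hand from the sample, the columns are assumed to be factors (the integer codes in the error message suggest this), and the name `codes` is made up for illustration. It uses the mapping stated at the top (AA = 0, GG = 1, AG = 2, CC = 3, TT = 4).

```r
# Sample data rebuilt by hand; stringsAsFactors = TRUE is an assumption
# based on the integer codes shown in the chartr() error message.
DF <- data.frame(
  S1 = c("AA", "AA", "AA", "AA"),
  S2 = c("CC", "GG", "CC", "AG"),
  S3 = c("AA", "TT", "GG", "AA"),
  stringsAsFactors = TRUE
)

# Named lookup vector: names are genotypes, values are the numeric codes.
codes <- c(AA = 0, GG = 1, AG = 2, CC = 3, TT = 4)

# Convert each column to character, then index the lookup vector by name.
# Assigning into DF1[] keeps the data frame shape and column names.
DF1 <- DF
DF1[] <- lapply(DF, function(col) unname(codes[as.character(col)]))
print(DF1)
```

Any genotype missing from `codes` would come out as NA, so the lookup vector needs one entry per genotype that occurs in the data.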