I would like to transform this data:

    Sample  Genotype  Region
    sample1    A      Region1
    sample1    B      Region1
    sample1    A      Region1
    sample2    A      Region1
    sample2    A      Region1
    sample3    A      Region1
    sample4    B      Region1

into the following format, tagging samples that have more than one genotype with "E" and collapsing samples whose genotype is simply repeated into a single row:

    Sample  Genotype  Region   
    sample1    E      Region1
    sample2    A      Region1
    sample3    A      Region1
    sample4    B      Region1

I have a single list covering many regions (Region1 to Regionx). Is it possible to do this in R? Thanks a lot.

user3091668
  • Many options: `tapply`, `ddply` from the plyr package, or the data.table package – Metrics Dec 11 '13 at 19:09
  • I want each sample collapsed to a single line: samples with more than one genotype (sample1) should get "E" (excluded) in the "Genotype" column, and samples whose genotype is just repeated across lines (sample2) should be merged into one line – user3091668 Dec 11 '13 at 19:33
  • Is it a data.frame whose columns are factors, or a matrix of strings? Is the analysis done per region, or how are the regions summarized? For the genotype you can use something like `function(x) ifelse(length(unique(x))==1,x[1],'E')` – Jörg Mäder Dec 11 '13 at 20:48

1 Answer

One straightforward approach is to use `aggregate`. Assuming your data.frame is called `mydf` (and building on Jörg's comment):

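    aggregate(Genotype ~ ., mydf, function(x) {
      # as.character() guards against factor columns, whose integer
      # codes ifelse() would otherwise return instead of the labels
      a <- unique(as.character(x))
      if (length(a) > 1) "E" else a
    })
    #    Sample  Region Genotype
    # 1 sample1 Region1        E
    # 2 sample2 Region1        A
    # 3 sample3 Region1        A
    # 4 sample4 Region1        B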
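
Since the question mentions many regions, here is a small self-contained sketch of the same idea on data with two regions. The Region1 rows are the ones from the question; the Region2 rows and the helper name `collapse_genotype` are made up for illustration, so treat this as one possible way to check the behaviour rather than the definitive solution.

```r
# Example data: the Region1 rows come from the question; the Region2 rows
# are invented here to show that each region is handled separately.
mydf <- data.frame(
  Sample   = c("sample1", "sample1", "sample1", "sample2", "sample2",
               "sample3", "sample4", "sample1", "sample1", "sample2"),
  Genotype = c("A", "B", "A", "A", "A", "A", "B", "B", "B", "A"),
  Region   = c(rep("Region1", 7), rep("Region2", 3)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single genotype of a group,
# or "E" when the group contains more than one distinct genotype.
collapse_genotype <- function(x) {
  a <- unique(as.character(x))
  if (length(a) > 1) "E" else a
}

# Group by Sample and Region, so the same sample can be "E" in one
# region and keep its genotype in another.
result <- aggregate(Genotype ~ Sample + Region, data = mydf,
                    FUN = collapse_genotype)
print(result)
```

Because `Region` is one of the grouping variables, sample1 is tagged "E" only in Region1, where it has both A and B, and keeps B in the invented Region2 rows. If the regions are stored as a list of data frames rather than as one data frame, something like `do.call(rbind, region_list)` (with `region_list` a hypothetical name) could stack them before calling `aggregate`.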
A5C1D2H2I1M1N2O1R2T1