I need help merging the rows with the same name (i.e start column) in the data (mydf) and concatenating the content in "ALT" column thereby getting rid of all the duplicate rows based on the similar values in the start column. I want to merge the rows and concatenate the contents in "ALT" column separated by a comma and get the result as shown below. Thank you for your help.
> mydf
chr start end REF ALT TYPE refGene
chr10 chr10:176131 176131 C A snp nonsynonymous SNV
chr10 chr10:159149 159149 C G snp:17659149 nonsynonymous SNV
chr10 chr10:159149 159149 C T snp:17659149 nonsynonymous SNV
chr10 chr10:241469 241469 T C snp splicing
> result
chr start end REF ALT TYPE refGene
chr10 chr10:176131 176131 C A snp nonsynonymous SNV
chr10 chr10:159149 159149 C G,T snp:17659149 nonsynonymous SNV
chr10 chr10:241469 241469 T C snp splicing
The DPUT is here:
structure(list(chr = c("chr3", "chr3", "chr3", "chr3"), start = c("chr3:75786036",
"chr3:75786337", "chr3:75786337", "chr3:75788226"), end = c(75786036,
75786337, 75786337, 75788226), REF = c("A", "G", "G", "C"), ALT = c("G",
"A", "T", "A"), TYPE = c("snp:75786036", "snp:75786337", "snp:75786337",
"snp:75788226"), `refGene::location` = c("nonsynonymous SNV",
"nonsynonymous SNV", "nonsynonymous SNV", "nonsynonymous SNV"
), `refGene::type` = c("ZNF717:NM_001290208:exon5:c.T2738C:p.F913S,ZNF717:NM_001128223:exon5:c.T2738C:p.F913S,ZNF717:NM_001290209:exon5:c.T2588C:p.F863S,",
"ZNF717:NM_001290208:exon5:c.C2437T:p.P813S,ZNF717:NM_001128223:exon5:c.C2437T:p.P813S,ZNF717:NM_001290209:exon5:c.C2287T:p.P763S,",
"ZNF717:NM_001290208:exon5:c.C2437A:p.P813T,ZNF717:NM_001128223:exon5:c.C2437A:p.P813T,ZNF717:NM_001290209:exon5:c.C2287A:p.P763T,",
"ZNF717:NM_001290208:exon5:c.G548T:p.C183F,ZNF717:NM_001128223:exon5:c.G548T:p.C183F,ZNF717:NM_001290209:exon5:c.G398T:p.C133F,"
)), .Names = c("chr", "start", "end", "REF", "ALT", "TYPE", "refGene::location",
"refGene::type"), row.names = c("4041", "4051", "4052", "4128"
), class = "data.frame")