Hi Friends please help me !
I have large data set with duplications of the first and third cloumns.
test<-matrix(c("Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_D","Line_D","Line_D","Line_D","Line_E","Line_E","Line_E","Line_E","F5","F5","F5","F5","F6","F6","F6","F6","F5","F5","F5","F5","F6","F6","F6","F6","F7","F7","F7","F7","F5","F5","F5","F5","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","A/A","T/T","T/T","C/C","A/A","T/T","A/T","C/C","A/A","T/T","T/T","C/C","-/-","T/T","A/T","C/C","A/A","C/T","A/A","G/G","A/A","T/T","T/T","C/C","A/A","C/T","T/A","C/C","A/A","T/T","T/T","C/C","A/A","C/C","A/A","G/G"),nrow=36)
colnames(test)<-c("Line","Year","Marker","data")
I converted into a dataframe
test1<-data.frame(test)
There are duplicates in the dataset. For example row 1 and row 5, row 2 and row 6 etc..which has same data in cloumn 1 (Line) and column 3 (Marker).
Line Year Marker data
Line_A F5 M1 A/A
Line_A F5 M2 T/T
Line_A F5 M3 T/T
Line_A F5 M4 C/C
Line_A F6 M1 A/A
Line_A F6 M2 T/T
Line_A F6 M3 A/T
Line_A F6 M4 C/C
Line_B F5 M1 A/A
Line_B F5 M2 T/T
Line_B F5 M3 T/T
.
.
.
I would like to have a table as shown below to list the lines which are duplicated followed by non-duplicated. I would like to get the % of match between the duplicates between years by comparing the marker data. Such as
Line Year Duplication Matching_Marker %
Line A F5 / F6 Yes 75
Line B F5 / F6 Yes 75
Line B F5 / F7 Yes 50
Line B F6 / F7 Yes 25
Line C F5 / F6 Yes 50
Line D F6 No NA
Line D F6 No NA
Thanks in advance !
The best way of showing result should be
Line Year Duplication MatchCount Mismatchcount Matching_Marker % Het%Year1 Het%Year2 Missing%Year1 Missing%Year2
Line A F5 / F6 Yes 3 1 75 0 25 0 0
Line B F5 / F6 Yes 2 1 75 0 33.3 0 25
Line B F5 / F7 Yes 1 3 50 0 25 0 0
Line B F6 / F7 Yes 0 3 0 25 25 25 0
Line C F5 / F6 Yes 2 2 50 0 50 0 0
Line D F6 No NA NA NA 0 NA 0 NA
Line D F6 No NA NA NA 0 NA 0 NA
second example data set is
test<-matrix(c("Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_D","Line_D","Line_D","Line_D","Line_E","Line_E","Line_E","Line_E","Line_E","Line_E","Line_E","Line_E","F5","F5","F5","F5","F6","F6","F6","F6","F5","F5","F5","F5","F6","F6","F6","F6","F7","F7","F7","F7","F5","F5","F5","F5","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","A/A","T/T","T/T","C/C","A/A","T/T","A/T","C/C","A/A","T/T","T/T","C/C","-/-","T/T","A/T","C/C","A/A","C/T","A/A","G/G","A/A","T/T","T/T","C/C","A/A","C/T","T/A","C/C","A/A","T/T","T/T","C/C","A/A","C/C","A/A","G/G","A/A","C/C","A/A","G/G"),nrow=40)
colnames(test)<-c("Line","Year","Marker","data")
test1<-data.frame(test)
Third example data set where the one of the year data for a line is missing
test<-matrix(c("Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_A","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_B","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_C","Line_D","Line_D","Line_D","Line_D","Line_E","Line_E","Line_E","Line_E","Line_E","Line_E","Line_E","Line_E","F5","F5","F5","F5","F6","F6","F6","F6","F5","F5","F5","F5","F6","F6","F6","F6","F7","F7","F7","F7","F5","F5","F5","F5","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","F6","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","M1","M2","M3","M4","-/-","-/-","-/-","-/-","A/A","T/T","A/T","C/C","A/A","T/T","T/T","C/C","-/-","T/T","A/T","C/C","A/A","C/T","A/A","G/G","A/A","T/T","T/T","C/C","A/A","C/T","T/A","C/C","A/A","T/T","T/T","C/C","A/A","C/C","A/A","G/G","A/A","C/C","A/A","G/G"),nrow=40)