I have a tab-delimited file, abc.txt:
contig score guide
1:100-101 7 AAA
1:100-101 6 BBB
1:100-101 5 CCC
1:100-101 4 DDD
1:100-101 3 EEE
1:100-101 2 FFF
1:100-101 1 GGG
1:100-101 90 HHH
1:100-101 111 III
1:100-101 1111 JJJ
1:200-203 503.5333333 KKK
1:200-203 570.7212121 LLL
1:200-203 637.9090909 MMM
1:200-203 705.0969697 NNN
1:200-203 772.2848485 OOO
1:200-203 839.4727273 PPP
1:200-203 906.6606061 QQQ
1:200-203 973.8484848 RRR
2:300-301 1041.036364 SSS
2:300-301 1108.224242 TTT
2:300-301 1175.412121 UUU
2:300-301 1242.6 VVV
2:300-301 1309.787879 ABC
2:300-301 1376.975758 CGA
2:300-301 1444.163636 ACD
Column 1 (contig) has repeated values, column 2 has scores, and column 3 has the guide letters corresponding to the column 2 scores. For each distinct value in column 1 (contig), I need to select the top 5 scores and print their corresponding column 3 values.
The output should look like this, with the first column holding the unique contig entry and the next 10 columns holding the top 5 scores and their corresponding column 3 guide letters:
Contig Score-1 Guide-1 Score-2 Guide-2 Score-3 Guide-3 Score-4 Guide-4 Score-5 Guide-5
1:100-101 1111 JJJ 111 III 90 HHH 7 AAA 6 BBB
1:200-203 973.8484848 RRR 906.6606061 QQQ 839.4727273 PPP 772.2848485 OOO 705.0969697 NNN
2:300-301 1444.163636 ACD 1376.975758 CGA 1309.787879 ABC 1242.6 VVV 1175.412121 UUU
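To make the target layout concrete, here is a minimal base R sketch of the transformation described above: rank the scores within each contig, keep the top 5, and spread them into alternating score/guide columns. It assumes the data are read with a header so that score is numeric. The helper name top_k_wide is hypothetical, and the sample rows are inlined (whitespace-separated) rather than read from abc.txt. Scores are stored as text in the result, which is fine for printing but would need converting back for further arithmetic.

```r
# Sample data inlined from the question; for the real file one would use
# read.table("abc.txt", header = TRUE, sep = "\t") instead.
x <- read.table(text = "contig score guide
1:100-101 7 AAA
1:100-101 6 BBB
1:100-101 5 CCC
1:100-101 4 DDD
1:100-101 3 EEE
1:100-101 2 FFF
1:100-101 1 GGG
1:100-101 90 HHH
1:100-101 111 III
1:100-101 1111 JJJ
1:200-203 503.5333333 KKK
1:200-203 570.7212121 LLL
1:200-203 637.9090909 MMM
1:200-203 705.0969697 NNN
1:200-203 772.2848485 OOO
1:200-203 839.4727273 PPP
1:200-203 906.6606061 QQQ
1:200-203 973.8484848 RRR
2:300-301 1041.036364 SSS
2:300-301 1108.224242 TTT
2:300-301 1175.412121 UUU
2:300-301 1242.6 VVV
2:300-301 1309.787879 ABC
2:300-301 1376.975758 CGA
2:300-301 1444.163636 ACD", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: top k scores per contig, one row per contig.
top_k_wide <- function(df, k = 5) {
  # Sort by contig, then by descending score within each contig.
  df <- df[order(df$contig, -df$score), ]
  # Position of each row within its contig (1 = highest score).
  df$rank <- ave(df$score, df$contig, FUN = seq_along)
  df <- df[df$rank <= k, ]
  # Interleave score and guide for each contig: score1, guide1, score2, ...
  pieces <- lapply(split(df, df$contig), function(g) {
    row <- as.vector(rbind(as.character(g$score), g$guide))
    length(row) <- 2 * k  # pad with NA if a contig has fewer than k rows
    row
  })
  out <- data.frame(contig = names(pieces), do.call(rbind, pieces),
                    stringsAsFactors = FALSE)
  names(out)[-1] <- paste0(c("Score-", "Guide-"), rep(seq_len(k), each = 2))
  rownames(out) <- NULL
  out
}

result <- top_k_wide(x, k = 5)
print(result)
```

Writing the result back out as a tab-delimited file could then be done with write.table(result, "top5.txt", sep = "\t", quote = FALSE, row.names = FALSE), where top5.txt is just a placeholder name.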
I tried the dplyr and DescTools packages, but I am running into an error:
library(dplyr)
library(DescTools)
file <- "abc.txt"
x=read.table(file)
b <- Large(x, k=5, unique = FALSE, na.last=NA)
and I get this error:
Error in Large(x, k = 5, unique = FALSE, na.last = NA) :
Not compatible with requested type: [type=character; target=double].
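One plausible cause, offered as an assumption rather than something the message proves: read.table() was called without header = TRUE, so the header line is read as data and the score column comes in as text instead of numbers. In addition, Large() is documented as taking a numeric vector, while here it is handed the whole data frame. The short base R check below illustrates the header effect on three lines inlined from the sample; it does not call DescTools at all.

```r
# Three lines inlined from the sample, tab-delimited like the real file.
txt <- "contig\tscore\tguide\n1:100-101\t7\tAAA\n1:100-101\t1111\tJJJ\n"

# Without header = TRUE the word "score" becomes a data value,
# so the second column cannot be numeric.
no_header <- read.table(text = txt, sep = "\t")

# With header = TRUE the first line supplies the column names.
with_header <- read.table(text = txt, sep = "\t", header = TRUE)

is.numeric(no_header$V2)       # FALSE
is.numeric(with_header$score)  # TRUE

# Once the column is numeric, the largest values can be taken directly.
top_scores <- head(sort(with_header$score, decreasing = TRUE), 5)
```

If this is indeed the cause, reading the file with read.table(file, header = TRUE, sep = "\t") and working on the score column per contig, rather than on the full data frame, would be the first thing to try.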
I managed to do this in Excel using the SUMPRODUCT, LARGE, IFERROR and VLOOKUP formulas; however, for large datasets I want to do the extraction in R.
Any help will be much appreciated.