
I have a BLAST result in tabular format. Below are the first three columns. The first column is the query ID (in this example there are two queries, 60317531 and 60317532). The second column is the hit against the query sequence, which has three parts:

a) Swiss-Prot ID: sp|Q10CQ1|
b) gene name: MAD14
c) organism: ORYSJ
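These three parts can be separated with plain string splitting. Below is a minimal Python sketch. It assumes every subject ID follows the db|accession|GENE_ORGANISM layout of the sample rows. parse_subject is a hypothetical helper name and is not part of any BLAST tool.

```python
def parse_subject(subject_id):
    """Split e.g. 'sp|Q10CQ1|MAD14_ORYSJ' into (db, accession, gene, organism)."""
    # Assumes exactly three '|'-separated fields: database, accession, entry name
    db, accession, entry_name = subject_id.split("|")
    # The entry name is GENE_ORGANISM; split it on the last underscore
    gene, _, organism = entry_name.rpartition("_")
    return db, accession, gene, organism


print(parse_subject("sp|Q10CQ1|MAD14_ORYSJ"))
```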

For each query, I would like to make a bar chart of the genes that are present and how many times each one appears.

For example, for the first query (60317531):

MAD14   2 times
MAD15   1 time
AGL8    2 times
AP1     3 times

Fields: query_id subject_id %_identity

gi|60317531|gb|AAX18712.1| sp|Q10CQ1|MAD14_ORYSJ 84.21
gi|60317531|gb|AAX18712.1| sp|P0C5B1|MAD14_ORYSI 83.40
gi|60317531|gb|AAX18712.1| sp|Q6Q9I2|MAD15_ORYSJ 68.91
gi|60317531|gb|AAX18712.1| sp|Q42429|AGL8_SOLTU 57.20
gi|60317531|gb|AAX18712.1| sp|O22328|AGL8_SOLCO 58.00
gi|60317531|gb|AAX18712.1| sp|Q41276|AP1_SINAL 65.79
gi|60317531|gb|AAX18712.1| sp|D7KWY6|AP1_ARALL 65.79
gi|60317531|gb|AAX18712.1| sp|Q8GTF4|AP1C_BRAOB 64.21
gi|60317532|gb|AAX18713.1| sp|B4YPV4|AP1C_BRAOA 64.21
gi|60317532|gb|AAX18713.1| sp|Q96355|1AP1_BRAOT 64.21
gi|60317532|gb|AAX18713.1| sp|P0DI14|AP1_BRARP   
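To check the expected counts, here is a small Python sketch that counts the gene part of each hit per query for the sample rows above. It assumes whitespace-separated columns and takes the gene to be the text after the last | and before the first _. count_genes and SAMPLE are hypothetical names. Only the first two columns are used, so the missing %identity on the last row does not matter.

```python
from collections import Counter, defaultdict

# The sample rows from the question (the last row lacks a %identity value)
SAMPLE = """\
gi|60317531|gb|AAX18712.1| sp|Q10CQ1|MAD14_ORYSJ 84.21
gi|60317531|gb|AAX18712.1| sp|P0C5B1|MAD14_ORYSI 83.40
gi|60317531|gb|AAX18712.1| sp|Q6Q9I2|MAD15_ORYSJ 68.91
gi|60317531|gb|AAX18712.1| sp|Q42429|AGL8_SOLTU 57.20
gi|60317531|gb|AAX18712.1| sp|O22328|AGL8_SOLCO 58.00
gi|60317531|gb|AAX18712.1| sp|Q41276|AP1_SINAL 65.79
gi|60317531|gb|AAX18712.1| sp|D7KWY6|AP1_ARALL 65.79
gi|60317531|gb|AAX18712.1| sp|Q8GTF4|AP1C_BRAOB 64.21
gi|60317532|gb|AAX18713.1| sp|B4YPV4|AP1C_BRAOA 64.21
gi|60317532|gb|AAX18713.1| sp|Q96355|1AP1_BRAOT 64.21
gi|60317532|gb|AAX18713.1| sp|P0DI14|AP1_BRARP
"""


def count_genes(lines):
    """Return {query_id: Counter(gene -> number of hits)}."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        query_id, subject_id = fields[0], fields[1]
        # Gene = entry name (after the last '|') up to the first '_'
        gene = subject_id.rsplit("|", 1)[-1].split("_", 1)[0]
        counts[query_id][gene] += 1
    return counts


for query, genes in count_genes(SAMPLE.splitlines()).items():
    print(query, dict(genes))
```

For the first query this yields MAD14: 2, MAD15: 1, AGL8: 2, AP1: 2 and AP1C: 1. Because names are matched exactly, AP1C is counted separately from AP1. Grouping such variants together would need an extra rule.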

In the bar chart, the x-axis should show the genes, the y-axis the frequency, and the title should be the query ID.
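Here is a minimal matplotlib sketch of a single chart in this layout. It assumes matplotlib is installed and that the counts for one query are already available as a mapping from gene name to number of hits. plot_query and sorted_bars are hypothetical names. Ordering the bars from most to least frequent (ties alphabetical) is a choice made here for readability; the question does not ask for it.

```python
def sorted_bars(gene_counts):
    """Return (genes, frequencies) ordered from most to least frequent."""
    ranked = sorted(gene_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    genes = [gene for gene, _ in ranked]
    freqs = [n for _, n in ranked]
    return genes, freqs


def plot_query(query_id, gene_counts, out_path):
    """Draw one bar chart: genes on x, hit frequency on y, query ID as title."""
    import matplotlib
    matplotlib.use("Agg")  # render to files without needing a display
    import matplotlib.pyplot as plt

    genes, freqs = sorted_bars(gene_counts)
    # Widen the figure as the number of genes grows (an arbitrary heuristic)
    fig, ax = plt.subplots(figsize=(max(4, 0.4 * len(genes)), 4))
    ax.bar(genes, freqs)
    ax.set_xlabel("Genes")
    ax.set_ylabel("Frequency")
    ax.set_title(query_id)
    ax.tick_params(axis="x", labelrotation=90)  # long gene lists stay legible
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)  # free memory when drawing many charts in one run
```

For example, plot_query("gi|60317531|gb|AAX18712.1|", {"MAD14": 2, "MAD15": 1, "AGL8": 2}, "60317531.png") writes one PNG. Closing each figure matters when thousands of charts are drawn in one run.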

Is there an automatic way to do this? I have about 40,000 queries, with roughly 100 hits against each query, all in a single file.
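For a file of this size, one approach is to stream it line by line and count one query at a time, rather than loading everything into memory. Below is a sketch in Python. It assumes that all hits for a query sit on consecutive lines (as is usual for BLAST tabular output) and that the gene is the text between the last | and the first _ of the subject ID. gene_of, iter_query_counts and main are hypothetical names.

```python
import itertools
from collections import Counter


def gene_of(subject_id):
    """'sp|Q10CQ1|MAD14_ORYSJ' -> 'MAD14' (entry name up to the first '_')."""
    return subject_id.rsplit("|", 1)[-1].split("_", 1)[0]


def iter_query_counts(lines):
    """Yield (query_id, Counter of genes) for each query, one query at a time.

    Assumes all hits for a query are on consecutive lines, so only one
    query's counts are held in memory at once.
    """
    rows = (line.split() for line in lines)
    rows = (f for f in rows if len(f) >= 2)  # skip blank or short lines
    for query_id, group in itertools.groupby(rows, key=lambda f: f[0]):
        yield query_id, Counter(gene_of(f[1]) for f in group)


def main(path):
    """Print tab-separated query, gene, count rows for every query in the file."""
    with open(path) as handle:
        for query_id, counts in iter_query_counts(handle):
            for gene, n in counts.most_common():
                print(query_id, gene, n, sep="\t")
```

Running main("blast_output.txt") (file name assumed) prints one tab-separated query, gene, count row per gene. Each (query, counts) pair could also be passed to a plotting routine to write one chart per query. If hits for the same query are not contiguous, sort the file on the first column first (for example with sort -k1,1). Otherwise, the query will be reported more than once.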

Andy Clifton
TCFP HCDG

1 Answer


Step 1: Extract the second column from your output file (replace blast_output.txt with the name of your BLAST table):

awk '{print $2}' blast_output.txt > ur_file_name.txt

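Step 2: Open that file in the vim editor and run the following two commands. The first strips everything up to the last |, and the second strips everything from the first _ onward. This leaves only the gene name (for example, sp|Q10CQ1|MAD14_ORYSJ becomes MAD14):

:%s!.*|!!
:%s!_.*!!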
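The aim of this step is to reduce each hit such as sp|Q10CQ1|MAD14_ORYSJ to its gene part, MAD14. The sketch below does the same transformation with a Python regular expression, for anyone without vim to hand. to_gene and GENE_RE are hypothetical names, and the pattern assumes the db|accession|GENE_ORGANISM layout.

```python
import re

# Drop everything up to the last '|', then everything from the next '_' on
GENE_RE = re.compile(r"^.*\|([^_|]*)_.*$")


def to_gene(hit):
    """'sp|Q10CQ1|MAD14_ORYSJ' -> 'MAD14'; unmatched input is returned stripped."""
    match = GENE_RE.match(hit.strip())
    return match.group(1) if match else hit.strip()


for hit in ["sp|Q10CQ1|MAD14_ORYSJ", "sp|Q8GTF4|AP1C_BRAOB", "sp|Q96355|1AP1_BRAOT"]:
    print(to_gene(hit))
```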

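Step 3: Load the cleaned file into R, count how often each gene occurs with table(), and plot the counts. Because Step 1 keeps only the second column, this chart pools the hits from all queries.

data <- read.table("ur_file_name.txt", header = FALSE)
counts <- table(data$V1)   # frequency of each gene name
barplot(counts, xlab = "Genes", ylab = "Frequency", main = "Query ID", las = 2)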
kashiff007