0

I have thousands of protein sequences in FASTA format, along with their accession numbers. I want to go back into the whole genome shotgun (WGS) database and retrieve all DNA sequences that encode a protein identical to one in my initial list.

I've tried running tblastn limited to fewer than 10 results per sequence (or 1 per query), with an e-value cutoff of 1e-100 or of zero, but I'm not getting any results. I would like to automate this entire process.

Can this be done by running BLAST from the command line with a batch script?

Kobi
Andrew
Yes, this can be done. If you want to know how, you need to be more specific about what problem you are having. – Vince Feb 03 '15 at 02:35

2 Answers

0

You should get at least one result: the sequence that encodes the original protein. The others, if any, would be pseudogenes, if I follow you.
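One way to check for that expected hit is to keep only alignments that are 100% identical over the full length of the query protein. The sketch below assumes the search was written as tab-separated output with the columns `qseqid sseqid pident length qlen sstart send evalue` (a custom `-outfmt 6` layout in BLAST+). The function name `identical_hits` and the sample contig names are made up for illustration.

```python
# Column order assumed to match:
#   -outfmt "6 qseqid sseqid pident length qlen sstart send evalue"
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen", "sstart", "send", "evalue"]


def identical_hits(tabular_text):
    """Return hits that are 100% identical across the whole query protein.

    `identical_hits` is a hypothetical helper, not part of BLAST or Biopython.
    """
    hits = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        full_length = int(row["length"]) == int(row["qlen"])
        if float(row["pident"]) == 100.0 and full_length:
            hits.append(row)
    return hits


if __name__ == "__main__":
    # Invented example rows: only the first is a full-length identical match.
    sample = (
        "P1\tcontigA\t100.000\t250\t250\t300\t1049\t0.0\n"
        "P1\tcontigB\t98.400\t250\t250\t10\t759\t1e-170\n"
        "P1\tcontigC\t100.000\t120\t250\t5\t364\t2e-80\n"
    )
    for hit in identical_hits(sample):
        print(hit["qseqid"], hit["sseqid"], hit["sstart"], hit["send"])
```

The `sstart` and `send` columns give the coordinates on the subject sequence, which is what you would need to pull out the matching DNA afterwards.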

Anyway, a bit of programming may help; check out Biopython. BioPerl or BioRuby should have similar features. In particular, you can run BLAST using Biopython.
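As a rough sketch of what the automation could look like, the script below calls the standalone BLAST+ `tblastn` program through Python's `subprocess` module rather than through Biopython's own wrappers. It assumes `tblastn` is installed and on your `PATH`, and that you have a nucleotide BLAST database available locally. The database name `my_wgs_db`, the file names, and the helper names are placeholders, not anything from the question.

```python
import subprocess
from pathlib import Path

# Tab-separated output with the columns needed to judge identity and
# to locate the hit on the subject sequence.
OUTFMT = "6 qseqid sseqid pident length qlen sstart send evalue"


def build_tblastn_command(query_fasta, db, out_path,
                          evalue="1e-100", max_target_seqs=10):
    """Build the argument list for one tblastn run (hypothetical helper)."""
    return [
        "tblastn",
        "-query", str(query_fasta),
        "-db", str(db),
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-outfmt", OUTFMT,
        "-out", str(out_path),
    ]


def run_batch(fasta_files, db, out_dir):
    """Run tblastn once per FASTA file and return the result file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for fasta in fasta_files:
        out_path = out_dir / (Path(fasta).stem + ".tsv")
        # check=True raises CalledProcessError if tblastn exits with an error.
        subprocess.run(build_tblastn_command(fasta, db, out_path), check=True)
        outputs.append(out_path)
    return outputs


if __name__ == "__main__":
    # Placeholder names: replace with your own query file and database.
    print(run_batch(["proteins.fasta"], "my_wgs_db", "tblastn_results"))
```

A FASTA file may contain many protein sequences, so a single call can cover the whole list; splitting it into several files is mostly useful for running searches in parallel.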

Hugues Fontenelle
0

You might find this link useful:

https://www.biostars.org/p/5403/

A similar question has been asked there, and some reasonable solutions have been posted.

Fatt