
I have the following FASTA file:

>gi|277456704|dbj|ID_P|Gene name LLL
MDGFAGSLDDSISAASTSDVQDRLSALESRVQQQEDEITVLKAALADVLRRLAISEDHVASVKKSVSSKV
YRRKHQELQAMQMELQSPEYKLSKLRTSTIMTDYNPNYCFAGKTSSISDLKEVPRKNITLIRGLGHGAFG
EVYEGQVSGMPNDPSPLQVAVKTLPEVCSEQDELDFLMEALIISKFNHQNIVRCIGVSLQSLPRFILLEL
MAGGDLKSFLRETRPRPSQPSSLAMLDLLHVARDIACGCQYLEENHFIHRDIAARNCLLTCPGPGRVAKI
GDFGMARDIYRASYYRKGGCAMLPVKWMPPEAFMEGIFTSKTDTWSFGVLLWEIFSLGYMPYPSKSNQEV
LEFVTSGGRMDPPKNCPGPVYRIMTQCWQHQPEDRPNFAIILERIEYCTQDPDVINTALPIEYGPLVEEE

>gi|27704|dbj|ID_Y|Gene name JJJ
MDGFAGSLDDSISAASTSDVQDRLSALESRVQQQEDEITVLKAALADVLRRLAISEDHVASVKKSVSSKG
SELRGGYGDPGRLPVGSGLCSASRARLPGHVAADHPPAVYRRKHQELQAMQMELQSPEYKLSKLRTSTIM
TDYNPNYCFAGKTSSISDLKEVPRKNITLIRGLGHGAFGEVYEGQVSGMPNDPSPLQVAVKTLPEVCSEQ
DELDFLMEALIISKFNHQNIVRCIGVSLQSLPRFILLELMAGGDLKSFLRETRPRPSQPSSLAMLDLLHV
ARDIACGCQYLEENHFIHRDIAARNCLLTCPGPGRVAKIGDFGMARDIYRASYYRKGGCAMLPVKWMPPE

>gi|2097704|dbj|ID_X|Gene name X
MDGFAGSLDDSISAASTSDVQDRLSALESRVQQQEDEITVLKAALADVLRRLAISEDHVASVKKSVSSKG
QPSPRAVIPMSCITNGSGANRKPSHTSAVSIAGKETLSSAAKSGTEKKKEKPQGQREKKEESHSNDQSPQ
IRASPSPQPSSQPLQIHRQTPESKNATPTKSIKRPSPAEKSHNSWENSDDSRNKLSKIPSTPKLIPKVTK
TADKHKDVIINQEGEYIKMFMRGRPITMFIPSDVDNYDDIRTELPPEKLKLEWAYGYRGKDCRANVYLLP
TGEIVYFIASVVVLFNYEERTQRHYLGHTDCVKCLAIHPDKIRIATGQIAGVDKDGRPLQPHVRVWDSVT
LSTLQIIGLGTFERGVGCLDFSKADSGVHLCVIDDSNEHMLTVWDWQRKAKGAEIKTTNEVVLAVEFHPT

I would like to loop through the FASTA file, split each protein sequence at every 'R' it comes across (this will generate peptides), and then run blastp on the peptides. The blastp results should be stored in a separate file for each protein ID in the FASTA file. I am not particular about which language is used; I want to learn how this can be done so that I can build more functionality on top of it. Thanks!
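A minimal sketch of the first two steps (reading the FASTA file and splitting each sequence at 'R'), in plain Python with no third-party libraries. It assumes the protein ID is the fourth `|`-separated field of the header (e.g. `ID_P`), and that each peptide keeps its trailing R (a cut after R, as trypsin does); pass `keep_r=False` to drop the R instead. The function names and the file name `proteins.fasta` are hypothetical.

```python
import io
import re
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines are joined, blank lines skipped."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_id(header: str) -> str:
    """Assumed layout gi|<number>|dbj|<ID>|<description>; falls back to the whole header."""
    fields = header.split("|")
    return fields[3] if len(fields) > 3 else header


def split_at_r(sequence: str, keep_r: bool = True) -> List[str]:
    """Split a protein sequence at every 'R'.

    keep_r=True leaves the R at the end of each peptide (cut after R);
    keep_r=False drops the R, like str.split("R"). Empty pieces are discarded.
    """
    if keep_r:
        # Either "non-R residues followed by R", or a final R-free tail.
        pieces = re.findall(r"[^R]*R|[^R]+$", sequence)
    else:
        pieces = sequence.split("R")
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Toy input; for a real file use: with open("proteins.fasta") as fh: ...
    sample = io.StringIO(">gi|1|dbj|ID_P|Gene name LLL\nMDGFAGSLDDSISAASTSDVQDRLSALESR\nVQQQEDEITVLK\n")
    for header, seq in read_fasta(sample):
        print(protein_id(header), split_at_r(seq))
```

Consecutive R's yield a one-residue peptide `"R"` when the R is kept; whether such tiny pieces are worth searching is a choice to make before the BLAST step.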

RnD
  • You could also ask on http://www.biostars.org – Pierre Jun 08 '13 at 15:23
  • @Pierre: Thanks for recommending Biostars, but I don't feel comfortable posting on that site, since the moderators are too rude and either downvote questions or close them. – RnD Jun 08 '13 at 18:35
  • your comment was discussed here: http://www.biostars.org/p/73956/ – Pierre Jun 10 '13 at 16:35
  • @Pierre That's great! Thank you for that. Just look at the simple answer given to me below, which pointed me towards the solution. Not exactly the type of response where the first question is "what have you done?" – RnD Jun 11 '13 at 01:11

1 Answer


With Biopython, you can parse the FASTA file into SeqRecord objects, split each sequence at "R", and then BLAST the peptides over the internet or run BLAST locally. You can then take the results and write them out to a file by iterating over each record.
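As a sketch of the "run BLAST locally" route, here is one way to handle the BLAST step for a single protein using only the Python standard library and the NCBI BLAST+ `blastp` program. It assumes BLAST+ is installed and on `PATH`, and that a local protein database already exists; the database name `my_protein_db`, the output directory, and the helper function names are hypothetical. Given a protein ID and its list of peptides, it writes the peptides to a multi-FASTA query file and saves the tabular hits to `<protein_id>.tsv`, giving one result file per protein.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def write_peptide_query(protein_id: str, peptides: Sequence[str], path: Path) -> Path:
    """Write one protein's peptides to a multi-FASTA file used as the BLAST query."""
    with path.open("w") as fh:
        for i, pep in enumerate(peptides, start=1):
            # Names like ID_P_pep1 keep every hit traceable to its source protein.
            fh.write(f">{protein_id}_pep{i}\n{pep}\n")
    return path


def blastp_command(query: Path, db: str, out: Path) -> List[str]:
    """Build a BLAST+ blastp command line; -task blastp-short is intended for short queries."""
    return [
        "blastp",
        "-task", "blastp-short",
        "-query", str(query),
        "-db", db,
        "-outfmt", "6",  # tabular output: one hit per line
        "-out", str(out),
    ]


def blast_protein(protein_id: str, peptides: Sequence[str], db: str, out_dir: Path) -> Path:
    """BLAST all peptides of one protein and save the hits to <out_dir>/<protein_id>.tsv."""
    if shutil.which("blastp") is None:
        raise RuntimeError("blastp (NCBI BLAST+) was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    query = write_peptide_query(protein_id, peptides, out_dir / f"{protein_id}_peptides.fasta")
    out = out_dir / f"{protein_id}.tsv"
    subprocess.run(blastp_command(query, db, out), check=True)  # raises if blastp fails
    return out
```

The database could be built once with `makeblastdb -in proteins.fasta -dbtype prot -out my_protein_db` (file and database names hypothetical), after which `blast_protein("ID_P", peptides, "my_protein_db", Path("blast_results"))` is called once per FASTA record. To BLAST over the internet instead, Biopython's `Bio.Blast.NCBIWWW.qblast("blastp", "nr", sequence)` submits a query to NCBI and returns a handle to the results, which `Bio.Blast.NCBIXML` can parse.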

The documentation has plenty of code samples you can use to piece together what you're looking for.

David Cain