how to make biopython read sequences from a fasta file and assign them as input to RNAfold for structure prediction?

Question

I need a program to read each sequence from a fasta file (about 1000sequences) and use each as input to another application (RNAfold) for secondary structure prediciton. I am using python. is it possible? could someone give me a guideline code for a start?

@Lennart i have modified the code to the one below:

$

      from Bio import SeqIO
      import subprocess, re
      PIPE = -1


    handle = open ("D:\python\hsa.fa", "rU")

for record in SeqIO.parse(handle, "fasta"):

output = subprocess.Popen("D:\python\RNAfold.exe -p -d2 --noLP -P", shell=True, stdin = PIPE, stdout = PIPE)

 print output
handle.close()

And am getting the following output that got nothing to do with RNAfold output:what is wrong with my code?

<subprocess.Popen object at 0x02CCEE90>
<subprocess.Popen object at 0x02CCEF30>
<subprocess.Popen object at 0x02CCEF70>
<subprocess.Popen object at 0x02CCEFB0>
<subprocess.Popen object at 0x02CCEB90>
<subprocess.Popen object at 0x02CCEE30>
<subprocess.Popen object at 0x02CCECD0>
<subprocess.Popen object at 0x02CCED90>
<subprocess.Popen object at 0x02CCEEB0>

This is now a completely different question. This is not a discussion forum, it's a Q&A site. You ask a question, and you get it answered. You should really post this completely new question as another question. As it is now, I have now to add a second answer, and the first answer no longer makes any sense, as it answers a question that doesn't exist anymore. — Lennart Regebro, Apr 24 '12 at 17:04
i am sorry @Lennart.. i didnt know how to post program codes in the "comment field" that s the reason why i edited the content of my first post but the question still remains the same as i have not got an anwser to it yet. Thank you for your time! — Armand, Apr 25 '12 at 10:00
No. You *have* gotten an answer to the first question. As a result you rewrote the code, and now you have a *new* question. — Lennart Regebro, Apr 25 '12 at 15:55

score 2 · Answer 1 · answered Apr 23 '12 at 19:30

2

Firstly:

for record in SeqIO.parse(handle, "fasta"):
         seq = record.seq

Will set seq to what seq is on the last record. That doesn't sound useful. You probably want to do something else.

Secondly, since RNAfold.exe apparently reads data from stdin, the easiest way for you to do this is to generate a file with the data you want to feed it. There is little point in calling it from Python, write the data to a file instead.

All in all, your program exhibits a typical beginners error: You are trying to write the whole program in one go, and then when it doesn't work, you try to analyze the program to find the problem.

Instead you need to write the program bit by bit, and all the time verify that each bit actually does what you think it does.

answered Apr 23 '12 at 19:30

Lennart Regebro

167,292
41
224
251

thanks for your reply @Lennart..yes i am a beginner.... but what i am trying to do with those 2 lines of code you mentioned above is to declare the sequence "seq" to use in this . also, i have a fasta file of all the 1000 sequences already called hsa.fa,,,so what other file do you suggest i should generate? – Armand Apr 23 '12 at 21:09
@Armand: I suggest you generate the file that should be used as input to RNAfold.exe. – Lennart Regebro Apr 24 '12 at 08:24
the input file is in line 6 of the above code ,,hsa.fa is the file containing the sequences to use as RNAfold input. @Lennart – Armand Apr 24 '12 at 09:28

score 1 · Answer 2 · answered Apr 24 '12 at 17:09

1

The object you call "output" is not output, it is a process object. The output comes from output.stdout.

answered Apr 24 '12 at 17:09

Lennart Regebro

167,292
41
224
251

output.stdout is giving me the same output :( – Armand Apr 25 '12 at 10:11
@Armand: I don't believe that. However, output.stdout is a file-like object. You can't just print it, you need to read() from it. – Lennart Regebro Apr 25 '12 at 15:55

score 0 · Answer 3 · answered May 21 '14 at 15:36

I'm using python to run RNA fold as well. this code works for cofold ... write the input file and close it open it in 'read then open the output file in write , run the exectuable , then close the files and read the output file ...

        in_w = open('temp_RNAcofold_inputs_24544.txt', 'w')
        in_w.write(sequence1.replace(' ','') + '&' + sequence2.replace(' ',''))
        in_w.close()
        in_r=open('temp_RNAcofold_inputs_24544.txt', 'r')
        out_w = open('temp_RNAcofold_ouputs_24544.txt', 'w' )
        p = subprocess.Popen('C:/Users/s1342417/Dropbox/PhD_work/ViennaRNA_Package/RNAcofold.exe' '-a' , stdin = in_r , stdout = out_w  ,shell =True )
        p.wait()
        out_w.close()
        in_r.close()
        out_r = open('temp_RNAcofold_ouputs_24544.txt','r')
        output = ''
        for line in out_r :
            output = output + out_r.readline()
            print ('a'+str(output))
        out_r.close()
        os.remove('temp_RNAcofold_ouputs_24544.txt')
        os.remove('temp_RNAcofold_inputs_24544.txt')

how to make biopython read sequences from a fasta file and assign them as input to RNAfold for structure prediction?

3 Answers3