
I need to parse a FASTA string, instead of a FASTA file, with SeqIO in Python 3:

from Bio import SeqIO

fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT'

rec = SeqIO.parse(fasta_string, "fasta")

print (rec[0])

I read that I need to create a file-like object using the io module, like this:

import io

with io.StringIO() as f:
    f.write('abcdef')
    print('gh', file=f)
    f.seek(0)
    print(f.read())

But SeqIO seems to need a path to a file, not just a file object, and I can't write a temporary file.

Any ideas?

Thank you in advance.

2 Answers


OK, I think I got it from here:

Biopython parse from variable instead of file, on SO (https://stackoverflow.com/questions/38358191/biopython-parse-from-variable-instead-of-file)

from io import StringIO
from Bio import SeqIO

# fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT'

fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT\n>name2\nCCCCCCCCCGGGGGGGGGGGTTTTTTTAAA'

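# SeqIO.parse accepts a file handle as well as a filename, so wrap the string in an in-memory handle
with StringIO(fasta_string) as fasta_io:
    records = SeqIO.parse(fasta_io, "fasta")

    # parse() is lazy: consume the records before the handle is closed
    for rec in records:
        print(rec)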

output:

ID: name
Name: name
Description: name
Number of features: 0
Seq('ACCTGTGGCTGCTTGCTTGCTTGGGCT')
ID: name2
Name: name2
Description: name2
Number of features: 0
Seq('CCCCCCCCCGGGGGGGGGGGTTTTTTTAAA')
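
Note that `SeqIO.parse` returns an iterator, not a list, so the `rec[0]` from the question will not work on it even once the string is wrapped in a handle. A minimal sketch of the usual alternatives, assuming the same in-memory strings: `next()` for the first record, `list()` when you need indexing, or `SeqIO.read` when the string holds exactly one record.

```python
from io import StringIO

from Bio import SeqIO

fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT\n>name2\nCCCCCCCCCGGGGGGGGGGGTTTTTTTAAA'

# First record only: take one item from the lazy iterator
first = next(SeqIO.parse(StringIO(fasta_string), "fasta"))
print(first.id, first.seq)

# All records, indexable like a list (this loads everything into memory)
records = list(SeqIO.parse(StringIO(fasta_string), "fasta"))
print(records[1].id, len(records))

# Exactly one record expected: SeqIO.read raises ValueError if there are zero or several
single = SeqIO.read(StringIO('>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT'), "fasta")
print(single.id)
```

For a huge multi-record string, iterating directly (as in the answer above) avoids holding every record in memory at once.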
pippo1980

I don't quite understand your question,

but in any case:

from Bio import SeqIO

fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT'

rec = SeqIO.parse(fasta_string, "fasta")

print (rec[0])

gives this error:

FileNotFoundError: [Errno 2] No such file or directory: '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT'

So, to write your sequence out to a FASTA file using Biopython, I can use:

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT'

# Split the header line from the sequence line; this only works for a
# single record whose sequence is on one line
fasta_string_name, fasta_string_seq = fasta_string.lstrip('>').split('\n')

print(fasta_string_name, fasta_string_seq, '\n\n')  # this one just to check the values


# SeqRecord object, see https://biopython.org/wiki/SeqRecord
rec = SeqRecord(
    Seq(fasta_string_seq),
    id=fasta_string_name,
    name=fasta_string_name,
    description='')


print('SeqRecord object : ', rec, sep='\n')  # this one just to check the SeqRecord object

recordz = [rec]  # SeqIO.write takes a list (or iterator) of records,
                 # so this form also works when you have more than one


SeqIO.write(recordz, "example.fasta", "fasta")  # this one writes your string out as a FASTA file

The example.fasta file written to the current working directory is:

>name
ACCTGTGGCTGCTTGCTTGCTTGGGCT
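
If writing to disk is not an option, the same output can be produced in memory: `SeqIO.write` accepts any writable text handle, and for a single record `SeqRecord.format` returns the text directly. A minimal sketch, assuming the record is built as above with the ID set and an empty description:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

rec = SeqRecord(Seq("ACCTGTGGCTGCTTGCTTGCTTGGGCT"), id="name", description="")

# Write into an in-memory handle instead of a filename; write() returns the record count
out = StringIO()
count = SeqIO.write([rec], out, "fasta")
fasta_text = out.getvalue()
print(count)
print(fasta_text)

# For a single record, SeqRecord.format gives the same FASTA text
single_text = rec.format("fasta")
print(single_text == fasta_text)
```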

Again, I'm not sure if that is what you were trying to accomplish; just let me know.

pippo1980
  • Thank you for your response. The main idea is to pass a FASTA file (with many records) as one huge string to SeqIO.parse. I understand that I could parse this FASTA string myself; however, I would like to know whether there is any option to use SeqIO.parse without saving and opening the FASTA data as a file. I am trying to parse FASTA files in a stream without writing them to and reading them from files. – Konstantin Kuleshov Apr 03 '21 at 16:53
  • OK, see here: https://stackoverflow.com/questions/38358191/biopython-parse-from-variable-instead-of-file (Biopython parse from variable instead of file), answer number 2. I had to flag your question as a duplicate to keep SO in shape. – pippo1980 Apr 03 '21 at 17:39
  • Here: fastq_io = StringIO(rec); records = SeqIO.parse(fastq_io, "fastq"); iterate over records, then fastq_io.close() – pippo1980 Apr 03 '21 at 17:40
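
For the streaming use case in the comments (large multi-record input that never touches disk), a lighter option is `SimpleFastaParser` from `Bio.SeqIO.FastaIO`, which reads any text handle line by line and yields plain `(title, sequence)` string tuples instead of full SeqRecord objects. A minimal sketch; the helper `record_lengths` is hypothetical and only for illustration, and the bytes example assumes the data arrives ASCII-encoded:

```python
import io

from Bio.SeqIO.FastaIO import SimpleFastaParser


def record_lengths(handle):
    """Hypothetical helper: map each record ID to its sequence length.

    SimpleFastaParser yields (title, sequence) tuples, where title is
    the header line without the leading '>'.
    """
    lengths = {}
    for title, seq in SimpleFastaParser(handle):
        record_id = title.split(None, 1)[0]  # the ID is the first word of the title
        lengths[record_id] = len(seq)
    return lengths


fasta_string = '>name\nACCTGTGGCTGCTTGCTTGCTTGGGCT\n>name2\nCCCCCCCCCGGGGGGGGGGGTTTTTTTAAA'

# From a str held in memory
from_text = record_lengths(io.StringIO(fasta_string))

# From raw bytes (e.g. a network payload): add a text layer over a bytes buffer
payload = fasta_string.encode("ascii")
from_bytes = record_lengths(io.TextIOWrapper(io.BytesIO(payload), encoding="ascii"))

print(from_text)
print(from_bytes == from_text)

# The same helper works on any open text stream, e.g. record_lengths(sys.stdin)
```

The same handles can be passed to `SeqIO.parse(handle, "fasta")` instead if you need SeqRecord objects.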