How can I extract sequences from a genome FASTA based on a GFF file, then merge the sequences of one transcript into a single output?

Question

Thanks for your help. I want to extract the sequences of specific introns, then merge each intron sequence with the CDS sequences to output my specific transcript. How can I do this with Biopython or plain Python?

An example from my GFF file:

1   ensembl intron  7904    9192    .   -   .   Parent=GRMZM2G059865_T01;Name=intron.71462
1   ensembl intron  6518    6638    .   -   .   Parent=GRMZM2G059865_T01;Name=intron.71465
1   ensembl intron  6266    6361    .   -   .   Parent=GRMZM2G059865_T01;Name=intron.71466
1   ensembl intron  5976    6107    .   -   .   Parent=GRMZM2G059865_T01;Name=intron.71467
1   ensembl intron  5189    5341    .   -   .   Parent=GRMZM2G059865_T01;Name=intron.71469
1   ensembl CDS 9193    9519    .   -   .   Parent=GRMZM2G059865_T01;Name=CDS.71479
1   ensembl CDS 7594    7903    .   -   0   Parent=GRMZM2G059865_T01;Name=CDS.71480
1   ensembl CDS 6918    7120    .   -   1   Parent=GRMZM2G059865_T01;Name=CDS.71481
1   ensembl CDS 6639    6797    .   -   0   Parent=GRMZM2G059865_T01;Name=CDS.71482
1   ensembl CDS 6362    6517    .   -   0   Parent=GRMZM2G059865_T01;Name=CDS.71483
1   ensembl CDS 6108    6265    .   -   0   Parent=GRMZM2G059865_T01;Name=CDS.71484
1   ensembl CDS 5857    5975    .   -   2   Parent=GRMZM2G059865_T01;Name=CDS.71485
1   ensembl CDS 5342    5407    .   -   1   Parent=GRMZM2G059865_T01;Name=CDS.71486
1   ensembl CDS 5127    5188    .   -   1   Parent=GRMZM2G059865_T01;Name=CDS.71487
1   ensembl intron  39443409    39443716    .   +   .   Parent=GRMZM2G441511_T01;Name=intron.100057
1   ensembl intron  39445109    39445314    .   +   .   Parent=GRMZM2G441511_T01;Name=intron.100061
1   ensembl intron  39450586    39450706    .   +   .   Parent=GRMZM2G441511_T01;Name=intron.100066
1   ensembl CDS 39443355    39443408    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100082    
1   ensembl CDS 39443717    39443785    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100083
1   ensembl CDS 39444013    39444161    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100084
1   ensembl CDS 39444634    39444721    .   +   2   Parent=GRMZM2G441511_T01;Name=CDS.100085
1   ensembl CDS 39445026    39445108    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100086
1   ensembl CDS 39445315    39445486    .   +   2   Parent=GRMZM2G441511_T01;Name=CDS.100087
1   ensembl CDS 39447442    39447548    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100088
1   ensembl CDS 39449775    39449850    .   +   2   Parent=GRMZM2G441511_T01;Name=CDS.100089
1   ensembl CDS 39449938    39450049    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100090
1   ensembl CDS 39450433    39450585    .   +   1   Parent=GRMZM2G441511_T01;Name=CDS.100091
1   ensembl CDS 39450707    39450822    .   +   1   Parent=GRMZM2G441511_T01;Name=CDS.100092
1   ensembl CDS 39450992    39451159    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100093
1   ensembl CDS 39451204    39451266    .   +   0   Parent=GRMZM2G441511_T01;Name=CDS.100094
........

score 0 · Answer 1 · answered Oct 30 '14 at 10:30

The question is too vague, so this answer is too. You can use a simple Seq object from Biopython: load the source sequence (the full gene?), then slice out the parts you need and concatenate them:

from Bio.Seq import Seq

# Bio.Alphabet was removed in Biopython 1.78; a plain Seq is enough.
seq = Seq("ATCAGCATCAGCATCGACTAGCATCGCATCAGC")
# Select this ^^^^^^^          ^^^

# Slices are 0-based and end-exclusive.
print(seq[3:10] + seq[20:23])
# AGCATCAGCA
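
To connect the slicing idea to the GFF in the question, here is a minimal sketch in plain Python, not a definitive implementation. It assumes that every wanted feature carries a `Parent=` attribute, that columns are whitespace-separated with no spaces inside a column, and that the merged transcript is simply the listed intron and CDS features joined in genomic order (reverse-complemented for `-` strand). The function names, the toy genome, and the toy GFF lines are all made up for illustration.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a plain DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_gff(lines, wanted=("intron", "CDS")):
    """Group (start, end, strand, chrom) tuples by Parent transcript ID."""
    features = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split(None, 8)
        # Skip malformed lines and feature types we do not want.
        if len(cols) < 9 or cols[2] not in wanted:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        features[attrs["Parent"]].append(
            (int(cols[3]), int(cols[4]), cols[6], cols[0])
        )
    return features


def merge_transcripts(features, genome):
    """Join each transcript's features in genomic order, honoring strand."""
    merged = {}
    for transcript_id, feats in features.items():
        feats = sorted(feats)  # ascending by start coordinate
        strand = feats[0][2]
        # GFF coordinates are 1-based and inclusive, hence start - 1.
        seq = "".join(genome[chrom][start - 1:end]
                      for start, end, _, chrom in feats)
        if strand == "-":
            seq = reverse_complement(seq)
        merged[transcript_id] = seq
    return merged


# Toy data (hypothetical): one 12 bp "chromosome" and two transcripts.
genome = {"1": "AAACCCGGGTTA"}
gff_lines = [
    "1\ttoy\tCDS\t1\t3\t.\t+\t0\tParent=T1;Name=CDS.1",
    "1\ttoy\tintron\t4\t6\t.\t+\t.\tParent=T1;Name=intron.1",
    "1\ttoy\tCDS\t7\t9\t.\t-\t0\tParent=T2;Name=CDS.2",
    "1\ttoy\tintron\t10\t12\t.\t-\t.\tParent=T2;Name=intron.2",
]

merged = merge_transcripts(parse_gff(gff_lines), genome)
for transcript_id, seq in merged.items():
    print(f">{transcript_id}\n{seq}")
```

With a real genome, the `genome` dictionary could be built with Biopython, for example `{rec.id: str(rec.seq) for rec in SeqIO.parse("genome.fa", "fasta")}`, where `genome.fa` is a placeholder filename. Note that regions not listed in the GFF (gaps between the listed features) are left out of the merged sequence.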
