Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (read as "fast A format") is a text-based format, introduced by the FASTA software, for representing nucleic acid and protein sequences. Each nucleotide or amino acid is written as a single letter, and each sequence can carry a name in its header line.

A record in FASTA format consists of a header (description) line followed by one or more lines of sequence data, with one letter per nucleotide or amino acid. Header lines begin with >. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and generally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
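
The record layout described above can be read with a few lines of plain Python. This is a minimal sketch, not a reference parser: the names `read_fasta` and `wrap_sequence` are made up for illustration, and it assumes that every header starts with > in the first column, that blank lines can be skipped, and that whitespace around header and sequence lines is not significant.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text handle.

    Hypothetical helper: wrapped sequence lines are joined into one string.
    """
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def wrap_sequence(seq: str, width: int = 60) -> List[str]:
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


example = """> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
"""

records = list(read_fasta(io.StringIO(example)))
for name, seq in records:
    print(name, len(seq))
```

For real files, an established parser such as Biopython's `SeqIO.parse(handle, "fasta")` is usually the safer choice, since it also handles cases this sketch ignores.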

921 questions

3 answers

Reading in file block by block using specified delimiter in python

I have an input_file.fa file like this (FASTA format): > header1 description data data data >header2 description more data data data I want to read in the file one chunk at a time, so that each chunk contains one header and the corresponding data,…

asked Jul 29 '16 at 09:25

Chris_Rands

2 answers

Biopython parse from variable instead of file

import gzip import io from Bio import SeqIO infile = "myinfile.fastq.gz" fileout = open("myoutfile.fastq", "w+") with io.TextIOWrapper(gzip.open(infile, "r")) as f: line = f.read() fileout.write(line) fileout.seek(0) count = 0 for rec in…

python biopython fasta

asked Jul 13 '16 at 17:28

Stuber

3 answers

Convert FASTA to GenBank

Is there a way to use Biopython to convert FASTA files to GenBank format? There are many answers on how to convert from GenBank to FASTA, but not the other way around.

biopython fasta genbank

asked May 12 '15 at 03:59

Ricky Su

4 answers

how to read a fasta file in python?

I'm trying to read a FASTA file and then find a specific motif (string) and print out the sequence and the number of times it occurs. A FASTA file is just a series of sequences (strings) that starts with a header line and the signature for header or start of…

python fasta

asked Dec 14 '13 at 07:07

user3098683

3 answers

Using realloc to expand buffer while reading from file crashes

I am writing some code that needs to read fasta files, so part of my code (included below) is a fasta parser. As a single sequence can span multiple lines in the fasta format, I need to concatenate multiple successive lines read from the file into a…

c realloc fasta

asked Jan 23 '12 at 14:26

sirlark

2 answers

Is it possible to pass a string variable to a BLAST search instead of a file?

I'm writing a python script and want to pass the query sequence information into blastn as a string variable rather than a FASTA format file if possible. I used Biopython's SeqIO to store several transcript names as key and its sequences as the…

bioinformatics biopython fasta blast

asked Nov 03 '16 at 14:43

Young-Chan Park

1 answer

What is the general pattern for creating a dcg for file input?

I always seem to struggle to write DCGs to parse input files. But it seems it should be simple? Are there any tips or tricks to think about this problem? For a concrete example, let's say I want to parse a fasta file.…

prolog swi-prolog dcg fasta

asked Jul 11 '15 at 12:06

user27815

5 answers

Printing a sequence from a fasta file

I often need to find a particular sequence in a fasta file and print it. For those who don't know, fasta is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple, you have a line with the sequence name preceded by a…

bash grep fasta

asked Oct 01 '14 at 15:17

Colin

1 answer

How do I get gene features in FASTA nucleotide format from NCBI using Perl?

I am able to download a FASTA file manually that looks like: >lcl|CR543861.1_gene_1... ATGCTTTGGACA... >lcl|CR543861.1_gene_2... GTGCGACTAAAA... by clicking "Send to" and selecting "Gene Features", FASTA Nucleotide is the only option (which is fine…

database perl fasta bioperl ncbi

asked Feb 27 '14 at 16:08

user2509933

4 answers

How to find inverted repeated pattern in a FASTA sequence?

Suppose my long sequence looks like: 5’-AGGGTTTCCC**TGACCT**TCACTGC**AGGTCA**TGCA-3’ The two subsequences marked between the double stars in this long sequence are together called an inverted repeat pattern. The length and the combination of…

python fasta

asked Jan 12 '13 at 21:27

user1964587

3 answers

Sliding window algorithm to analyze values of fasta segments

I have two segments of a random fasta file 1 Segment1 AAGGTTCC 2 Segment2 CCTTGGAA I have another random data set containing dinucleotides' energy values as AA -1.0 AG -2.0 GG -1.5 GT -1.7 TT -1.2 TC -1.8 CC -1.4 CT -2.5 TG -2.1 GA…

r fasta

asked Apr 11 '22 at 13:11

08BKS09

0 answers

How to match string pattern in R

I'm looking for a good library to extract information from a GenBank (gbk) file using R. This is a common structure of a gbk file: gene complement(1..1002) /gene="bla" /locus_tag="VV1_RS00005" …

r bioinformatics fasta genbank

asked Nov 24 '21 at 05:50

abraham

3 answers

Remove duplicated sequences in FASTA with Python

I apologize if the question has been asked before, but I have been searching for days and could not find a solution in Python. I have a large fasta file, containing headers and sequences. >cavPor3_rmsk_tRNA-Leu-TTA(m) range=chrM:2643-2717 5'pad=0…

python duplicates biopython fasta

asked Mar 03 '21 at 18:12

Marco Badici

3 answers

Using conditions to match multiple patterns within a line

I have a fasta file like this: myfasta.fasta >1_CDS AAAAATTTCTGGGCCCCGGGGG AAATTATTA >2_CDS TTAAAAATTTCTGGGCCCCGGGAAAAAA >3_CDS TTTGGGAATTAAACCCT >4_CDS TTTGGGAATTAAACCCT >5_rRNA TTAAAAATTTCTGGGCCCCGGGAAAAAA >6_tRNA TTAAAAATTTCTGGGCCCCGGGAAAAAA I…

python bioinformatics fasta

asked Mar 27 '19 at 21:16

MAPK

3 answers

Make a list in python from a FASTA text file

I have a text file like this small…

python bioinformatics biopython fasta

asked May 30 '18 at 12:56

john
