Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (read as "fast-A format") is a text-based format used by the FASTA software for representing nucleic acid and protein sequences. Each nucleotide or amino acid is represented by a single letter, and each sequence can be given a name.

The format achieved great popularity, becoming the de facto standard for representing biological sequences.

A record in FASTA format consists of a header (comment) line followed by one or more lines of sequence data, with one letter per nucleotide or amino acid. Header lines begin with the > character. The sequence lines that follow are usually wrapped at a fixed width (often 60 characters, generally no more than 80).
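The record structure described above can be read with a short generator. This is a minimal pure-Python sketch, not the FASTA package's own parser. `parse_fasta` is a hypothetical name, and the sketch assumes the header is everything after `>` on that line. In practice, Biopython's `SeqIO.parse(handle, "fasta")` is a common choice.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without the leading '>' (hypothetical helper);
    wrapped sequence lines are joined into one string.
    """
    header: Optional[str] = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # the last record has no following header


records = list(parse_fasta([">seq1 demo", "ACGT", "TTGA", ">seq2", "GG"]))
print(records)  # [('seq1 demo', 'ACGTTTGA'), ('seq2', 'GG')]
```

Because it accepts any iterable of lines, the same function works on an open file, a list, or an in-memory `io.StringIO`.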

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
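Writing records back out with the fixed-width wrapping seen in the sample can be sketched as follows. `format_fasta` is a hypothetical helper. Its default width of 60 is an assumption chosen to match the second sample record, not a requirement of the format.

```python
def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record as text, wrapping the sequence every `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = [">" + header]
    # Slice the sequence into consecutive pieces of at most `width` letters.
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(format_fasta("demo", "ACGTACGTAC", width=4), end="")
# >demo
# ACGT
# ACGT
# AC
```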
921 questions
6 votes, 3 answers

Reading in file block by block using specified delimiter in python

I have an input_file.fa file like this (FASTA format): > header1 description data data data >header2 description more data data data I want to read in the file one chunk at a time, so that each chunk contains one header and the corresponding data,…
Chris_Rands
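The excerpt above is truncated, so this is only one possible approach and is not taken from the thread's answers. `itertools.groupby` with a key that advances at every `>` line yields one record per chunk without reading the whole file into memory. `fasta_chunks` and `_RecordIndex` are hypothetical names, and the sketch assumes that lines before the first header are discarded.

```python
import io
from itertools import groupby
from typing import Iterable, Iterator, List


class _RecordIndex:
    """groupby key: a counter that advances each time a '>' header line is seen."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, line: str) -> int:
        if line.startswith(">"):
            self.count += 1
        return self.count


def fasta_chunks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one record at a time as a list of lines (header first)."""
    for index, group in groupby(lines, key=_RecordIndex()):
        if index == 0:
            continue  # lines before the first header belong to no record
        yield [line.rstrip("\n") for line in group]


handle = io.StringIO(">header1 description\ndata\ndata\n>header2 description\nmore data\n")
for chunk in fasta_chunks(handle):
    print(chunk)
# ['>header1 description', 'data', 'data']
# ['>header2 description', 'more data']
```

`groupby` calls the key once per line, in order, so the counter's side effect is safe here. Each distinct counter value marks exactly one record.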
6 votes, 2 answers

Biopython parse from variable instead of file

import gzip import io from Bio import SeqIO infile = "myinfile.fastq.gz" fileout = open("myoutfile.fastq", "w+") with io.TextIOWrapper(gzip.open(infile, "r")) as f: line = f.read() fileout.write(line) fileout.seek(0) count = 0 for rec in…
Stuber
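This is a sketch of how to avoid the intermediate output file used in the excerpt. `gzip.open()` accepts either a path or an already-open binary file object, and it decodes to text in `"rt"` mode. `io.StringIO` turns a string into a file-like handle. Either kind of handle can be passed to code that expects an open file, such as Biopython's `SeqIO.parse(handle, "fastq")`. The in-memory data below is made up, and the record count assumes simple four-line FASTQ records.

```python
import gzip
import io

# Made-up data standing in for the contents of "myinfile.fastq.gz";
# assumes plain 4-line FASTQ records (header, sequence, '+', qualities).
raw = b"@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
compressed = gzip.compress(raw)

# gzip.open() also accepts an existing binary file object, and "rt" mode
# yields decoded text lines, so no temporary output file is needed.
with gzip.open(io.BytesIO(compressed), "rt") as handle:
    n_records = sum(1 for i, _ in enumerate(handle) if i % 4 == 0)

# Text already held in a string variable can be wrapped the same way.
text_handle = io.StringIO(raw.decode("ascii"))
first_line = text_handle.readline().rstrip("\n")

print(n_records, first_line)  # 2 @read1
```

For a real file, `gzip.open("myinfile.fastq.gz", "rt")` returns a text handle directly, so there is no need to wrap it in `io.TextIOWrapper`.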
6 votes, 3 answers

Convert FASTA to GenBank

Is there a way to use BioPython to convert FASTA files to a Genbank format? There are many answers on how to convert from Genbank to FASTA, but not the other way around.
Ricky Su
6 votes, 4 answers

how to read a fasta file in python?

I'm trying to read a FASTA file and then find specific motif(string) and print out the sequence and number of times it occurs. A FASTA file is just series of sequences(strings) that starts with a header line and the signature for header or start of…
user3098683
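For the motif question, here is a minimal sketch that assumes unique headers and a case-insensitive search. `read_fasta` and `count_overlapping` are hypothetical helpers. Python's `str.count` skips overlapping matches (`"ATATAT".count("ATA")` is 1), so the helper instead advances by one position after each hit.

```python
import io
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map header (without '>') to its joined sequence.

    Assumes headers are unique; a repeated header would merge its sequence lines.
    """
    parts: Dict[str, List[str]] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:]
            parts.setdefault(header, [])
        elif line and header is not None:
            parts[header].append(line)
    return {h: "".join(chunks) for h, chunks in parts.items()}


def count_overlapping(sequence: str, motif: str) -> int:
    """Count motif occurrences, allowing overlaps (str.count does not)."""
    if not motif:
        raise ValueError("motif must be non-empty")
    count, start = 0, sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)  # step one base, not len(motif)
    return count


fasta = io.StringIO(">s1\nATATAT\n>s2\nGGG\nATA\n")
for header, seq in read_fasta(fasta).items():
    print(header, seq, count_overlapping(seq.upper(), "ATA"))
# s1 ATATAT 2
# s2 GGGATA 1
```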
5 votes, 3 answers

Using realloc to expand buffer while reading from file crashes

I am writing some code that needs to read fasta files, so part of my code (included below) is a fasta parser. As a single sequence can span multiple lines in the fasta format, I need to concatenate multiple successive lines read from the file into a…
sirlark
5 votes, 2 answers

Is it possible to pass a string variable to a BLAST search instead of a file?

I'm writing a python script and want to pass the query sequence information into blastn as a string variable rather than a FASTA format file if possible. I used Biopython's SeqIO to store several transcript names as key and its sequences as the…
5 votes, 1 answer

What is the general pattern for creating a dcg for file input?

I always seem to struggle to write DCG's to parse input files. But it seems it should be simple? Are there any tips or tricks to think about this problem? For a concrete example, lets say I want to parse a fasta file.…
user27815
5 votes, 5 answers

Printing a sequence from a fasta file

I often need to find a particular sequence in a fasta file and print it. For those who don't know, fasta is a text file format for biological sequences (DNA, proteins, etc.). It's pretty simple, you have a line with the sequence name preceded by a…
Colin
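One record can be printed by name in a single streaming pass. This sketch assumes the record name is the first whitespace-separated word after `>`. `print_record` is a hypothetical helper. It stops reading as soon as the next header after the match appears, and it copies the original line wrapping unchanged.

```python
import io
import sys
from typing import Iterable, TextIO


def print_record(lines: Iterable[str], wanted_id: str, out: TextIO = sys.stdout) -> bool:
    """Copy the record whose ID (first word after '>') equals wanted_id to `out`.

    Returns True if the record was found; stops at the header that follows it.
    """
    printing = False
    for line in lines:
        if line.startswith(">"):
            if printing:
                return True  # reached the next record, so the match is complete
            fields = line[1:].split()
            printing = bool(fields) and fields[0] == wanted_id
        if printing:
            out.write(line)
    return printing  # True only if the matching record was the last one


data = ">a x\nAC\nGT\n>b\nTT\n>c\nGG\n"
print_record(io.StringIO(data), "b")
# >b
# TT
```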
5 votes, 1 answer

How do I get gene features in FASTA nucleotide format from NCBI using Perl?

I am able to download a FASTA file manually that looks like: >lcl|CR543861.1_gene_1... ATGCTTTGGACA... >lcl|CR543861.1_gene_2... GTGCGACTAAAA... by clicking "Send to" and selecting "Gene Features", FASTA Nucleotide is the only option (which is fine…
5 votes, 4 answers

How to find inverted repeated pattern in a FASTA sequence?

Suppose my long sequence looks like: 5’-AGGGTTTCCC**TGACCT**TCACTGC**AGGTCA**TGCA-3 The two italics subsequences (here within the two stars) in this long sequence are together called as inverted repeat pattern. The length and the combination of…
user1964587
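For exact, fixed-length arms, an inverted repeat is a pair of non-overlapping windows in which one is the reverse complement of the other, as with `TGACCT` and `AGGTCA` in the excerpt. This sketch indexes every k-mer once and then looks up each window's reverse complement. `inverted_repeats` is a hypothetical name. It reports 0-based start positions and allows no mismatches. To handle varying arm lengths, call it once for each `k`. The input is the excerpt's sequence with the `**` markers removed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def inverted_repeats(seq: str, k: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where seq[j:j+k] is the reverse complement of
    seq[i:i+k] and the two arms do not overlap (j >= i + k)."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq) - k + 1):
        positions[seq[j:j + k]].append(j)  # index every k-mer by its start
    hits = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in positions.get(target, []):
            if j >= i + k:
                hits.append((i, j))
    return hits


seq = "AGGGTTTCCCTGACCTTCACTGCAGGTCATGCA"
print(inverted_repeats(seq, 6))  # [(10, 23)]
```

The `j >= i + k` condition excludes a palindromic k-mer matching itself. `CTGCAG` at position 19 is one such case.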
4 votes, 3 answers

Sliding window algorithm to analyze values of fasta segments

I have two segments of a random fasta file 1 Segment1 AAGGTTCC 2 Segment2 CCTTGGAA I have another random data set containing dinucleotides' energy values as AA -1.0 AG -2.0 GG -1.5 GT -1.7 TT -1.2 TC -1.8 CC -1.4 CT -2.5 TG -2.1 GA…
08BKS09
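Here is a sketch of the sliding-window calculation. The excerpt gives neither the window size nor the statistic, so both are assumptions: each overlapping dinucleotide step is mapped to its energy, then windows of 3 steps are summed with a running total. Only the energy values fully shown in the truncated excerpt are used. That is why only Segment1 (`AAGGTTCC`) is computed, since Segment2 needs the cut-off `GA` value. `dinucleotide_energies` and `sliding_window_sums` are hypothetical names.

```python
from typing import Dict, List

# Dinucleotide energies copied from the question excerpt; the excerpt is cut
# off after "GA", so only the fully shown values are included here.
ENERGY: Dict[str, float] = {
    "AA": -1.0, "AG": -2.0, "GG": -1.5, "GT": -1.7, "TT": -1.2,
    "TC": -1.8, "CC": -1.4, "CT": -2.5, "TG": -2.1,
}


def dinucleotide_energies(seq: str, table: Dict[str, float]) -> List[float]:
    """Energy of each overlapping dinucleotide step (len(seq) - 1 values)."""
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def sliding_window_sums(values: List[float], window: int) -> List[float]:
    """Sum of each run of `window` consecutive values, via a running total."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    sums = [total]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # add the new step, drop the oldest
        sums.append(total)
    return sums


steps = dinucleotide_energies("AAGGTTCC", ENERGY)
print([round(s, 2) for s in sliding_window_sums(steps, 3)])
# [-4.5, -5.2, -4.4, -4.7, -4.4]
```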
4 votes, 0 answers

How to match string pattern in R

I'm looking for a good library to extract information of a genbank (gbk) file using R. this is a common structure of a gbk file gene complement(1..1002) /gene="bla" /locus_tag="VV1_RS00005" …
abraham
4 votes, 3 answers

Remove duplicated sequences in FASTA with Python

I apologize if the question has been asked before, but I have been searching for days and could not find a solution in Python. I have a large fasta file, containing headers and sequences. >cavPor3_rmsk_tRNA-Leu-TTA(m) range=chrM:2643-2717 5'pad=0…
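Here is a minimal deduplication sketch that keeps the first header seen for each sequence. It assumes that sequences differing only in letter case count as duplicates, and the helper names are hypothetical. For very large files, storing a hash digest (for example from `hashlib`) in `seen` instead of the full sequence would reduce memory use.

```python
import io
from typing import Iterable, Iterator, Optional, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header: Optional[str] = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_by_sequence(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield each record whose sequence (compared case-insensitively) is new."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


data = io.StringIO(">a\nACGT\n>b\nTTTT\n>c\nacgt\n>d\nTT\nTT\n")
for header, seq in unique_by_sequence(read_records(data)):
    print(f">{header}\n{seq}")
# >a
# ACGT
# >b
# TTTT
```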
4 votes, 3 answers

Using conditions to match multiple patterns within a line

I have a fasta file like this: myfasta.fasta >1_CDS AAAAATTTCTGGGCCCCGGGGG AAATTATTA >2_CDS TTAAAAATTTCTGGGCCCCGGGAAAAAA >3_CDS TTTGGGAATTAAACCCT >4_CDS TTTGGGAATTAAACCCT >5_rRNA TTAAAAATTTCTGGGCCCCGGGAAAAAA >6_tRNA TTAAAAATTTCTGGGCCCCGGGAAAAAA I…
MAPK
4 votes, 3 answers

Make a list in python from a FASTA text file

I have text file like this small…
john