Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (pronounced "fast-A format") is a text-based format used by the FASTA software for representing nucleic acid and protein sequences. Each nucleotide or amino acid is represented by a single letter. The format also allows sequences to be named.

The format achieved great popularity, becoming the de facto standard for representing biological sequences.

A record in FASTA format consists of a header (comment) line followed by one or more lines describing the sequence (one letter per nucleotide or amino acid). Header lines begin with >. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and generally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
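
As a rough illustration of the layout described above, here is a minimal Python sketch that reads records (a header line starting with > followed by one or more sequence lines) and re-wraps a sequence at a fixed width. The function names `parse_fasta` and `wrap_sequence` are hypothetical and not part of the FASTA software or any library; the sketch assumes plain-text input and ignores blank lines.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: the header is returned without the leading '>',
    and wrapped sequence lines are joined into one string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def wrap_sequence(seq: str, width: int = 60) -> str:
    """Wrap a sequence string at a fixed width (60 is a common choice)."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
```

Because an open file object iterates over its lines, `parse_fasta(open(path))` should work on a file as well as on a list of strings. For real projects, an established parser such as Biopython's `Bio.SeqIO` is usually the safer choice.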

921 questions

0 answers

how do you convert a fasta file into an sql table?

I have a fasta file temp_mart.txt as such: ENSG00000100219|ENST00000005082 MTLLTFRDVAIEFSLEEWKCLDLAQQNLYRDVMLENYRNLFSVGLTVCKPGL And I tried to load it into a table in sql using load data local infile '~/Desktop/temp_mart.txt' into table mart But…

sql database fasta

asked Aug 05 '18 at 09:19

user1996

1 answer

Select sequences in a fasta file with more than 300 aa and "C" occurs at least 4 times

I have a fasta file which contains protein sequences. I'd like to select sequences with more than 300 amino acids and Cysteine (C) amino acid appears more than 4 times. I've used this command to select sequences with more than 300 aa: cat…

linux awk bioinformatics sequences fasta

asked May 07 '18 at 12:43

M. Sobreiro

2 answers

How to use Biopython to translate a series of DNA sequences in a FASTA file and extract the Protein sequences into a separate field?

I am new to Biopython (and coding in general) and am trying to code a way to translate a series of DNA sequences (more than 80) into protein sequences, in a separate FASTA file. I want to also find the sequence in the correct reading frame. Here's…

python parsing bioinformatics biopython fasta

asked Mar 02 '18 at 16:22

macrosage

1 answer

Write SeqIO Dictionary as Fasta File

I originally converted my fasta sequence into a dictionary with a Bio.SeqIO.to_dict statement. I would like to write a subsetted dictionary back to a fasta file. Test is a python dictionary with fasta headers as keys and the sequences as values.…

python dictionary biopython fasta

asked Oct 27 '17 at 20:11

Cody Glickman

1 answer

Parse multi-fasta file to extract out sequences

I am trying to write a script in python to parse a large fasta file, I do not want to use biopython since I am learning scripting. The script needs to print the accession number, sequence length, and sequence gc content to the console. I've been…

python fasta

asked Oct 18 '17 at 17:38

k.smith

1 answer

Problems using awk to select group of sequences from fasta file

I would like to subset my fasta file to retrieve sequences that belong to a given population. The following is a sample of my file. >CLocus_12706_Sample_44_Locus_36326_Allele_0 [JoJo_s113.fq; groupI, 125578,…

unix awk bioinformatics fasta

asked Sep 12 '17 at 17:02

Ella Bowles

3 answers

How can I do a transparent gzip uncompress from both stdin and files in perl?

I've written a few scripts for processing FASTA/FASTQ files (e.g. fastx-length.pl), but would like to make them more generic and accept both compressed and uncompressed files as both command line parameters and as standard input (so that the scripts…

fasta fastq compression perl

asked Jun 10 '17 at 23:14

gringer

3 answers

Binning sequence reads by GC content

I would like to "bin" (split into separate files) a multi-fasta nucleotide sequence file (e.g. a Roche-454 run of ~500,000 reads average read length 250bp). I would like the bins based on GC content of each read. The resultant output would be 8…

bioinformatics sequences fasta

asked Dec 15 '10 at 14:34

Chris

4 answers

sort fasta by sequence size

I currently want to sort a huge fasta file (more than 10**8 lines and sequences) by sequence size. fasta is a clearly defined format used in biology to store sequences (nucleic acid or protein): >id1 sequence 1 # could be on several lines >id2 sequence 2 ... I have…

python-3.x sorting bioinformatics fasta

asked Dec 20 '16 at 09:59

RomainL.

1 answer

Python: How to compare multiple sequences from a fasta file with each other?

I'm quite new to programming in Python and I am trying to write a script that, given a FASTA file, will compare the sequences with each other and score them (if the position of the nucleotide in sequence A matches with the nucleotide in the…

python for-loop biopython fasta

asked Dec 08 '16 at 16:49

D.Teeki

1 answer

Using Bioperl to alter nucleotides at specific positions in fasta file?

I am trying to adapt a Bioperl script to change nucleotides at specific positions in a fasta file and output a new file with altered sequences. Example of fasta input: >seq1 AAATAAA Example of nucleotide positions to change…

perl fasta bioperl

asked Apr 09 '15 at 14:45

Amy Ellison

1 answer

Find any letter at the end of a line, delete line break without replacing the target

I'm trying to collapse several lines of letters to a single one. Example >8445125 VSSSDEQPRPRRS RNQDRQHPNQNRP VLGRTERDRNRRQ FGQNFLRDRKTIA >8445125 VSSSDEQPRPRRSRNQDRQHPNQNRPVLGRTERDRNRRQFGQNFLRDRKTIA I've tried regex Find [A-Z]\n Replace with…

regex notepad++ bioinformatics fasta

asked Dec 04 '14 at 00:46

Andrew

3 answers

Undefined subroutines &main error in Perl

I am trying to extract a DNA sequence from this FASTA file to a specified length of bases per line, say 40. > sample dna (This is a typical fasta…

perl bioinformatics subroutine fasta

asked Nov 15 '14 at 15:39

zebra

1 answer

Error while writing fasta file using biopython

I used the following code to write the fasta sequence into file. from Bio import SeqIO sequences = "KKPPLLRR" # add code here output_handle = open("example.fasta", "w") SeqIO.write(sequences, output_handle, "fasta") output_handle.close() I got the…

sequence biopython fasta

asked Oct 30 '14 at 07:02

Exchhattu

3 answers

How do I speed up pattern recognition in perl

This is the program as it stands right now. It takes in a .fasta file (a file containing genetic code), creates a hash table with the data and prints it; however, it is quite slow. It splits a string and compares it against all other letters in the…

perl pattern-matching fasta

asked Sep 23 '14 at 15:57

user1709237
