Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (pronounced "fast A format") is a text-based format used by the FASTA software for representing nucleic acids and proteins. It represents each nucleotide or amino acid as a single letter. The FASTA format also supports naming of sequences.

The format achieved great popularity, becoming the de facto standard for representing biological sequences.

A record in FASTA format consists of a header (comment) line followed by one or more lines describing the sequence (one letter per nucleotide or amino acid). Header lines begin with >. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and generally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
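
The record layout described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the FASTA software: the function names `parse_fasta` and `format_fasta` are hypothetical, and the sketch assumes plain uncompressed text in which every header line starts with > and blank lines can be ignored.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: wrapped sequence lines are joined into one string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one record with the sequence wrapped at `width` characters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped)


example = """\
> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    # Print each record's name and the length of its joined sequence.
    print(header, len(sequence))
```

Joining the wrapped lines while parsing means downstream code can treat each sequence as one string, and `format_fasta` reintroduces the line wrapping only when a record is written back out.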
921 questions
3
votes
0 answers

how do you convert a fasta file into an sql table?

I have a fasta file temp_mart.txt as such: ENSG00000100219|ENST00000005082 MTLLTFRDVAIEFSLEEWKCLDLAQQNLYRDVMLENYRNLFSVGLTVCKPGL And I tried to load it into a table in sql using load data local infile '~/Desktop/temp_mart.txt' into table mart But…
user1996
  • 31
  • 2
3
votes
1 answer

Select sequences in a fasta file with more than 300 aa and "C" occurs at least 4 times

I have a fasta file which contains protein sequences. I'd like to select sequences with more than 300 amino acids in which the amino acid cysteine (C) appears more than 4 times. I've used this command to select sequences with more than 300 aa: cat…
3
votes
2 answers

How to use Biopython to translate a series of DNA sequences in a FASTA file and extract the Protein sequences into a separate field?

I am new to Biopython (and coding in general) and am trying to code a way to translate a series of DNA sequences (more than 80) into protein sequences, in a separate FASTA file. I want to also find the sequence in the correct reading frame. Here's…
macrosage
  • 31
  • 1
  • 2
3
votes
1 answer

Write SeqIO Dictionary as Fasta File

I originally converted my fasta sequence into a dictionary with a Bio.SeqIO.to_dict statement. I would like to write a subsetted dictionary back to a fasta file. Test is a python dictionary with fasta headers as keys and the sequences as indexes.…
Cody Glickman
  • 514
  • 1
  • 8
  • 30
3
votes
1 answer

Parse multi-fasta file to extract out sequences

I am trying to write a script in Python to parse a large fasta file. I do not want to use Biopython since I am learning scripting. The script needs to print the accession number, sequence length, and sequence GC content to the console. I've been…
k.smith
  • 41
  • 2
3
votes
1 answer

Problems using awk to select group of sequences from fasta file

I would like to subset my fasta file to retrieve sequences that belong to a given population. The following is a sample of my file. >CLocus_12706_Sample_44_Locus_36326_Allele_0 [JoJo_s113.fq; groupI, 125578,…
Ella Bowles
  • 101
  • 10
3
votes
3 answers

How can I do a transparent gzip uncompress from both stdin and files in perl?

I've written a few scripts for processing FASTA/FASTQ files (e.g. fastx-length.pl), but would like to make them more generic and accept both compressed and uncompressed files as both command line parameters and as standard input (so that the scripts…
gringer
  • 410
  • 4
  • 13
3
votes
3 answers

Binning sequence reads by GC content

I would like to "bin" (split into separate files) a multi-fasta nucleotide sequence file (e.g. a Roche-454 run of ~500,000 reads average read length 250bp). I would like the bins based on GC content of each read. The resultant output would be 8…
Chris
  • 31
  • 1
  • 2
3
votes
4 answers

sort fasta by sequence size

I currently want to sort a huge fasta file (more than 10**8 lines and sequences) by sequence size. FASTA is a clearly defined format used in biology to store sequences (nucleic acid or protein): >id1 sequence 1 # could be on several lines >id2 sequence 2 ... I have…
RomainL.
  • 997
  • 1
  • 10
  • 24
3
votes
1 answer

Python: How to compare multiple sequences from a fasta file with each other?

I'm quite new to programming in Python and I am trying to write a script that, given a FASTA file, will compare the sequences with each other and score them (if the position of the nucleotide in sequence A matches with the nucleotide in the…
D.Teeki
  • 55
  • 1
  • 4
3
votes
1 answer

Using Bioperl to alter nucleotides at specific positions in fasta file?

I am trying to adapt a Bioperl script to change nucleotides at specific positions in a fasta file and output a new file with altered sequences. Example of fasta input: >seq1 AAATAAA Example of nucleotide positions to change…
Amy Ellison
  • 89
  • 1
  • 7
3
votes
1 answer

Find any letter at the end of a line, delete line break without replacing the target

I'm trying to collapse several lines of letters to a single one. Example >8445125 VSSSDEQPRPRRS RNQDRQHPNQNRP VLGRTERDRNRRQ FGQNFLRDRKTIA >8445125 VSSSDEQPRPRRSRNQDRQHPNQNRPVLGRTERDRNRRQFGQNFLRDRKTIA I've tried regex Find [A-Z]\n Replace with…
Andrew
  • 33
  • 3
3
votes
3 answers

Undefined subroutines &main error in Perl

I am trying to extract a DNA sequence from this FASTA file to a specified length of bases per line, say 40. > sample dna (This is a typical fasta…
zebra
  • 83
  • 1
  • 1
  • 5
3
votes
1 answer

Error while writing fasta file using biopython

I used the following code to write the fasta sequence into file. from Bio import SeqIO sequences = "KKPPLLRR" # add code here output_handle = open("example.fasta", "w") SeqIO.write(sequences, output_handle, "fasta") output_handle.close() I got the…
Exchhattu
  • 197
  • 3
  • 15
3
votes
3 answers

How do I speed up pattern recognition in perl

This is the program as it stands right now. It takes in a .fasta file (a file containing genetic code), creates a hash table with the data and prints it; however, it is quite slow. It splits a string and compares it against all other letters in the…
user1709237