Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (read as "fast A format") is a text-based format used by the FASTA software for representing nucleotide and protein sequences. It represents each nucleotide or amino acid as a single letter. The FASTA format also supports naming of sequences.

The format achieved great popularity, becoming the de facto standard for representing biological sequences.

A record in FASTA format consists of a header (description) line followed by one or more lines of sequence data, with one letter per nucleotide or amino acid. Header lines begin with >. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and conventionally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
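
The record layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of the FASTA package itself: the function name `parse_fasta` and the variable `EXAMPLE` are hypothetical, and the sketch assumes well-formed input where every sequence line belongs to the most recent header. It joins wrapped sequence lines into one string per record.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; assumes headers start with '>'.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Wrapped sequence lines are concatenated without separators.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# The two sample records shown above.
EXAMPLE = """\
> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
"""

records = list(parse_fasta(EXAMPLE.splitlines()))
for name, seq in records:
    print(name, len(seq))
```

For real work, an established library such as Biopython (`Bio.SeqIO.parse(handle, "fasta")`) is usually a safer choice than a hand-written parser.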
921 questions
-4 votes, 2 answers

Remove line breaks in a FASTA file in R

I have a fasta file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file: >accession1 ATGGCCCATG GGATCCTAGC >accession2 GATATCCATG AAACGGCTTA I'd like to convert it into…
Quang Ong
-4 votes, 1 answer

Shorten header and remove empty lines in a FASTA file with Perl

I have a fasta file like this with headers like this: >GL13245678 ABCDEDERFSE >GL123456789 ABDFDRAGDTGEGAGFDAS >GL1254367890 AFGHSRSGFGSHSFG I want to change the header to contain only GL and 6 digits and remove the empty line above each header,…
-5 votes, 1 answer

Pattern counter in a FASTA file

I am trying to get the count of matching patterns in the fasta file. I am starting from a fasta file containing 57k sequences. I want to pull out the count of my matching pattern sequence and show the starting position of the pattern Input…
-7 votes, 2 answers

Swap lines in a script by iteration over lines

Here is my file: >ref AAAAAAA >seq1 BBBBBBB >ref AAAAAAA >seq2 CCCCCCC >ref AAAAAAA >seq3 DDDDDDD ... Here is what I'd like to get: >seq1 AAAAAAA >ref BBBBBBB >seq2 AAAAAAA >ref CCCCCCC >seq3 AAAAAAA >ref DDDDDDD ... So, swap line 1 with line 3,…
tlorin