Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (pronounced "fast A") is a text-based format used by the FASTA software to represent nucleic acid and protein sequences. Each nucleotide or amino acid is written as a single letter, and each sequence can be given a name.

A record in FASTA format consists of a header (description) line followed by one or more lines of sequence, with one letter per nucleotide or amino acid. Header lines begin with >. Sequence lines are usually wrapped at a fixed width, commonly 60 characters and generally no more than 80.

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
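
The record structure described above can be read with a few lines of code. The following is a minimal Python sketch, not the FASTA package's own reader, assuming well-formed input: it strips the leading > from each header, joins the wrapped sequence lines that follow, and can write records back out wrapped at the common 60-character width. The function names parse_fasta and write_fasta are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:].strip()  # header text without the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def write_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format records as FASTA text, wrapping each sequence at `width` characters."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


sample = """> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
"""

records = list(parse_fasta(sample.splitlines()))
for header, seq in records:
    print(header, len(seq))  # Sample nucleotide sequence 37 / Nucleotide sequence #2 210
```

Any text before the first header is silently skipped here; files from unknown sources may need stricter validation.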

921 questions

4 answers

Filtering a fasta file with sequences that match a certain string in another file

With BLAST I have obtained a file with two tab-separated columns, one with species names and the other with a gene name (the name of the most similar gene in a reference database). My goal is to find in the first file all the species names for which…

asked Mar 31 '23 at 11:15

MarcD

3 answers

How do I merge two FASTA files (one file with line break) in Perl?

I have the following two FASTA files:
file1.fasta
>0
GAATAGATGTTTCAAATGTACCAATTTCTTTCGATT
>1
GTTAAGTTATATCAAACTAAATATACATACTATAAA
>2
GGGGCTGTGGATAAAGATAATTCCGGGTTCGAATAC
file2.qual
>0
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40…

perl bioinformatics fasta

asked Apr 10 '09 at 06:12

neversaint

6 answers

Delete lines shorter than a certain length and the one above it (remove short sequences in a FASTA file)

I have a file containing the following text:
>seq1
GAAAT
>seq2
CATCTCGGGA
>seq3
GAC
>seq4
ATTCCGTGCC
If a line that doesn't start with ">" is shorter than 5 characters, I want to delete it and the one right above it. Expected…

sed bioinformatics fasta

asked Jul 17 '22 at 19:43

Honorato
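
The rule stated in this question can be sketched in Python (the question itself is tagged sed). This is a minimal sketch assuming one sequence line per record, as in the example; the function name drop_short is hypothetical, and it only removes the line above a short sequence when that line is a header.

```python
from typing import List


def drop_short(lines: List[str], min_len: int = 5) -> List[str]:
    """Delete each sequence line shorter than min_len and the header right above it.

    Assumes one sequence line per record, as in the question's example.
    """
    kept: List[str] = []
    for line in lines:
        if not line.startswith(">") and len(line) < min_len:
            if kept and kept[-1].startswith(">"):
                kept.pop()  # remove the header that belongs to the short sequence
            continue  # and skip the short sequence itself
        kept.append(line)
    return kept


data = [">seq1", "GAAAT", ">seq2", "CATCTCGGGA", ">seq3", "GAC", ">seq4", "ATTCCGTGCC"]
print("\n".join(drop_short(data)))
```

For a file, the input list can come from reading the file and calling splitlines() on its contents. Wrapped multi-line records would need a record-level parser instead.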

3 answers

Appending filename at the end of certain lines in a text file

I am trying to append a file name to the end of certain lines in many files that I am concatenating. Short example:
INPUTS:
filename (1): 1234_contigs.fasta
>NODE_STUFF
GATTACA
filename (2):…

bash sed bioinformatics fasta

asked May 25 '22 at 18:20

statlerNwaldorf
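
Assuming "certain lines" means the header lines, the concatenation can be sketched in Python as follows. The question's expected output is truncated, so the separator between the header and the file name (a space here) is an assumption exposed as the `sep` parameter, and concat_with_filenames is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def concat_with_filenames(paths: Iterable[Path], sep: str = " ") -> List[str]:
    """Concatenate FASTA files, appending each file's name to its header lines.

    `sep` is an assumption: the question's expected output is truncated.
    """
    out: List[str] = []
    for path in paths:
        for line in path.read_text().splitlines():
            if line.startswith(">"):
                line = f"{line}{sep}{path.name}"  # header line: append the file name
            out.append(line)
    return out


with tempfile.TemporaryDirectory() as tmp:
    fasta = Path(tmp) / "1234_contigs.fasta"
    fasta.write_text(">NODE_STUFF\nGATTACA\n")
    merged = concat_with_filenames([fasta])

print("\n".join(merged))
```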

3 answers

awk combine info from two files (fasta file header)

I know there are many similar questions, and I have read through many of them, but I still can't make my code work. Could somebody point out the problem for me, please? Thanks!
(base) $ head Sample.pep2
>M00000032072 gene=G00000025773 seq_id=ChrM…

awk bioinformatics fasta

asked May 04 '22 at 19:44

zzz

1 answer

bioinformatics compressing nucleotide sequences

What would be the recommended compression algorithm (.xz, tar.gz, tar.bz2 and so on) for compressing a dataset consisting of fasta nucleotide sequences? What would be the recommended compression mechanisms for such data? Dictionary based…

compression bioinformatics fasta

asked Oct 30 '21 at 04:54

Allan K
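
One practical way to approach this question is to measure the candidates on your own data. The .gz, .bz2 and .xz formats mentioned correspond to Python's standard gzip, bz2 and lzma modules (tar only bundles files and does not compress by itself). The sketch below, with the hypothetical helper compressed_sizes, reports compressed sizes; the random input is only a placeholder and says nothing about how real sequence files will compress.

```python
import bz2
import gzip
import lzma
import random
from typing import Dict


def compressed_sizes(data: bytes) -> Dict[str, int]:
    """Compressed size in bytes of `data` under three general-purpose codecs."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, preset=9)),
    }


# Placeholder input: a random A/C/G/T sequence, wrapped at 60 columns.
# Real sequencing data has different statistics, so measure your own files.
random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(60_000))
fasta = ">synthetic\n" + "\n".join(seq[i:i + 60] for i in range(0, len(seq), 60)) + "\n"
sizes = compressed_sizes(fasta.encode("ascii"))
print(sizes)
```

Compression time matters as well as size; timing each call on a representative file gives a fuller comparison.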

2 answers

Removing lines which match with specific pattern from another file

I've got two files (I only show the beginning of these files)…

awk grep fasta

asked Feb 18 '21 at 16:41

Paillou

4 answers

How to retrieve sequences from a Fasta file by gene ID

I know this question has been asked a hundred times but I've been at it all day and I can't seem to make this work. I have a fasta file that looks like this ... >BGI_novel_T016697…

linux bioinformatics fasta

asked Feb 14 '21 at 15:38

Tezie
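
A common way to retrieve sequences by ID is to load the wanted IDs into a set and keep the records whose ID is in it. The Python sketch below assumes the ID is the first whitespace-separated token of the header; the helper names and all records except the ID BGI_novel_T016697 are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append((line[1:], []))
        elif line and records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def record_id(header: str) -> str:
    """Assumed ID convention: the first whitespace-separated token of the header."""
    tokens = header.split()
    return tokens[0] if tokens else ""


def select_by_id(records: Iterable[Tuple[str, str]], wanted: Set[str]) -> List[Tuple[str, str]]:
    """Keep only the records whose ID is in `wanted` (a set gives fast lookups)."""
    return [(h, s) for h, s in records if record_id(h) in wanted]


# Hypothetical input; only the first ID comes from the question.
fasta_text = """>BGI_novel_T016697 example description
ATGCCGTA
GGCA
>other_transcript
TTTTAAAA
"""
hits = select_by_id(read_fasta(fasta_text), {"BGI_novel_T016697"})
print(hits)  # [('BGI_novel_T016697 example description', 'ATGCCGTAGGCA')]
```

Changing `in` to `not in` inside select_by_id removes the listed records instead of keeping them.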

2 answers

Find length of a contig in one fasta, using the header of another fasta as query in python

I'm trying to find a python solution to extract the length of a specific sequence within a fasta file using the full header of the sequence as the query. The full header is stored as a variable earlier in the pipeline (i.e. "CONTIG"). I would like…

python bioinformatics biopython fasta

asked Jul 26 '20 at 22:29

Gunther
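
A minimal Python sketch of the lookup this question describes: stream the file, compare each full header line with the query, and sum the lengths of the wrapped sequence lines of the matching record. The function name contig_length and the sample records are hypothetical; CONTIG stands in for the variable mentioned in the question.

```python
from typing import Iterable, Optional


def contig_length(lines: Iterable[str], full_header: str) -> Optional[int]:
    """Length of the sequence whose full header line equals `full_header`.

    Returns None when no header matches. Leading '>' in the query is ignored.
    """
    query = full_header.lstrip(">").strip()
    length: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if length is not None:
                break  # the next record starts, so the match is complete
            if line[1:].strip() == query:
                length = 0  # found the record; start counting
        elif length is not None:
            length += len(line)  # add wrapped sequence lines
    return length


# Hypothetical records; CONTIG stands for the header variable from the pipeline.
fasta_lines = [">NODE_1 length=12", "ACGTACGTAC", "GT", ">NODE_2 length=3", "AAA"]
CONTIG = "NODE_1 length=12"
print(contig_length(fasta_lines, CONTIG))  # 12
```

An open file handle can be passed directly as `lines`. With Biopython, `record.description` holds the full header text for FASTA input, so comparing it with the query and taking `len(record.seq)` should give the same result.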

1 answer

How to remove duplicates from fasta file but keep at least one per group based on header

I have a multifasta file that looks like this: (all sequences are >100bp, more than one line, and same length…

python fasta

asked Jul 25 '20 at 19:52

Xela Vi

1 answer

Pairwise alignment of multi-FASTA file sequences

I have a multi-FASTA file containing more than 10 000 FASTA sequences resulting from next-generation sequencing, and I want to do a pairwise alignment of each sequence against every other sequence in the file and store all the results in the same new file in…

python bioinformatics biopython fasta pairwise

asked Aug 05 '19 at 16:07

Aurora
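
Before choosing a tool for this question, it helps to estimate the size of the job. Aligning every sequence against every other one, skipping self-comparisons and treating A-vs-B the same as B-vs-A, takes one alignment per unordered pair. For the lower bound of n = 10 000 sequences given in the question:

```latex
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
n = 10\,000 \;\Rightarrow\; \frac{10\,000 \times 9\,999}{2} = 49\,995\,000
```

So the all-against-all run involves roughly 50 million alignments, and writing every result to a single file produces correspondingly large output.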

1 answer

Is there a way to collect many multiline strings delineated by a specific character into an Arraylist using the data stream in Java 8?

I have a fasta file that I want to parse into an ArrayList, each position having an entire sequence. The sequences are multiline strings, and I don't want to include the identification line in the string that I store. My current code splits each…

arraylist collections java-8 fasta multilinestring

asked Apr 27 '19 at 16:35

Sam

8 answers

Remove multiple sequences from fasta file

I have a text file of character sequences that consist of two lines: a header, and the sequence itself in the following line. The structure of the file is as…

bash awk sed fasta

asked Apr 11 '19 at 15:24

Loïs Rancilhac

2 answers

Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In Python, this code, where I call the function SeqIO.parse() directly, runs fine:
from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)
for asq in SeqIO.parse("a.fasta", "fasta"):
    print("Q")
But this, where I first…

python bioinformatics biopython fasta

asked Feb 21 '19 at 02:32

Abraham Ahmad
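
Assuming the truncated second version loops over `a` after `records = list(a)` has already run, the likely cause is that SeqIO.parse() returns an iterator, which can be consumed only once. The generator below is a hypothetical stand-in for SeqIO.parse() (no Biopython needed) that reproduces the behaviour.

```python
from typing import Iterator


def parse_records(text: str) -> Iterator[str]:
    """Hypothetical stand-in for SeqIO.parse(): a generator yielding header IDs."""
    for line in text.splitlines():
        if line.startswith(">"):
            yield line[1:].strip()


text = ">a\nACGT\n>b\nGGCC\n"

a = parse_records(text)
records = list(a)              # this consumes the iterator completely
second_pass = [r for r in a]   # nothing left: the loop body never runs

print(records)      # ['a', 'b']
print(second_pass)  # []

# Either iterate over the saved list or call the parser again for a fresh iterator.
for r in records:
    print("Q", r)
```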

5 answers

Extract sequence header for a given sequence in fasta file

I have a FASTA file (myfasta.fasta) like this:
>aat.2.2344.a
ATTGCCGGTTTAATATTA
>aat.2.d2344.acc
ATTGCCGGTTTAATAAA
>aat.2.2bb344.a
ATTGCCGGTTTAATAGGAGAGAATT
>aat.2.2ccc344.a
ATTGCCGGTTTAATAGGGAG
>aat.2.2344.acc
ATTGCCGGTTTAATAAA
I also have a text…

unix awk sed bioinformatics fasta

asked Oct 18 '18 at 20:31

MAPK

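
Assuming the truncated text file lists query sequences and the goal is an exact match, one approach is to index the FASTA file by sequence. A sequence can occur under several headers (ATTGCCGGTTTAATAAA does in the example above), so each sequence maps to a list. This is a minimal Python sketch; headers_by_sequence is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def headers_by_sequence(text: str) -> Dict[str, List[str]]:
    """Map each sequence to the headers of every record that carries it."""
    records: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])  # [header, sequence]
        elif line and records:
            records[-1][1] += line  # join wrapped sequence lines
    index: Dict[str, List[str]] = defaultdict(list)
    for header, seq in records:
        index[seq].append(header)  # identical sequences collect several headers
    return dict(index)


myfasta = """>aat.2.2344.a
ATTGCCGGTTTAATATTA
>aat.2.d2344.acc
ATTGCCGGTTTAATAAA
>aat.2.2bb344.a
ATTGCCGGTTTAATAGGAGAGAATT
>aat.2.2ccc344.a
ATTGCCGGTTTAATAGGGAG
>aat.2.2344.acc
ATTGCCGGTTTAATAAA
"""
index = headers_by_sequence(myfasta)
print(index.get("ATTGCCGGTTTAATAAA", []))  # ['aat.2.d2344.acc', 'aat.2.2344.acc']
```

Each line of the query file can then be looked up with `index.get(query.strip(), [])`.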
