Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.

The FASTA format (read as "fast A format") is a text-based format, originating with the FASTA software, for representing nucleic acid and protein sequences. Each nucleotide or amino acid is represented by a single letter, and each sequence can be given a name.

A record in FASTA format consists of a header (description) line followed by one or more lines of sequence data (one letter per nucleotide or amino acid). The header line begins with a ">" character. The sequence that follows is usually wrapped at a fixed width (often 60 characters, and generally no more than 80).

> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
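
The record layout described above can be read with a few lines of code. The following is a minimal Python sketch, not a reference implementation: the names parse_fasta and format_fasta are hypothetical helpers introduced here for illustration, and the sketch assumes that every record starts with a ">" header line and that blank lines can be ignored. For real work, an established parser such as Biopython's Bio.SeqIO is usually the safer choice.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper: assumes each record starts with a '>' line.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines (an assumption of this sketch)
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated, undoing the fixed-width wrap.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one record as text, wrapping the sequence at `width` letters."""
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped)


# The two sample records shown above.
sample = """> Sample nucleotide sequence
AGCACTGAGTAACGTATAAGCAGTCCCCGGACGCGTA
> Nucleotide sequence #2
GCCACGGGAGTTGAAGAACATCGAGAATGCCACTAGTTTTCACCCTTCATAGATATCCTA
GCGCCGTACATGTATACGAGATCTTTGTCACGCAGTATGGAGGATTGTGGCCAGCAATAC
GTCGTGTCCCGCAATGCTTCATTAGATCCCCGTATATCCATCCTGAGTCATTGTCTGTTG
TCCGTTTTGAAGGAGTCTAGCAGCTTGATA
"""

records = list(parse_fasta(sample.splitlines()))
for name, seq in records:
    print(name, len(seq))
```

Because parse_fasta is a generator, it can also be fed an open file handle directly, so a large file does not have to be loaded into memory at once.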
921 questions
4
votes
1 answer

Find all repeated 4-mers in a DNA Sequence - Perl

Hello, I am trying to write a program that reads in a FASTA-formatted file containing multiple DNA sequences, identifies all repeated 4-mers (i.e., all 4-mers that occur more than once) in a sequence, and prints out the repeated 4-mer and the header of…
ic23oluk
4
votes
2 answers

Searching FASTA file for motif and returning title line for each sequence containing the motif

Below is the code I have for searching a FASTA file entered at the command line for a user-provided motif. When I run it and enter a motif that I know is in the file, it returns 'Motif not found'. I'm only a beginner in Perl, and I can't figure out…
Kevin Egan
4
votes
2 answers

"NotImplementedError: SeqRecord" when using sorted on a fasta file parsed using SeqIO

I'm trying to sort a FASTA file by alphabetical order of the sequences in the file (not the ID of the sequences). The FASTA file contains over 200 sequences and I'm trying to find duplicates (by duplicates I mean almost the same protein sequence, but…
4
votes
1 answer

Estimate Alphabet in Biopython from fasta file

I am looking for a way to read a .fasta file in Biopython and have the package estimate if we are dealing with DNA, RNA or proteins. So far, I read data like this: with open('file.fasta', 'r') as f: for seq in sio.parse(f, 'fasta'): # do…
romeasy
4
votes
2 answers

Why does my grep command output "--" between some lines?

I have a fasta file like the test one here: >HWI-D00196:168:C66U5ANXX:3:1106:16404:19663 1:N:0:GCCAAT CCTAGCACCATGATTTAATGTTTCTTTTGTACGTTCTTTCTTTGGAAACTGCACTTGTTGCAACCTTGCAAGCCATATAAACACATTTCAGATATAAGGCT >HWI-D00196:168:C66U5ANXX:3:1106:16404:19663…
michberr
4
votes
3 answers

Adding each item in list to end of specific lines in FASTA file

I solved this in the comments below. So essentially what I am trying to do is add each element of a list of strings to the end of specific lines in a different file. Hard to explain but essentially I want to parse a FASTA file, and every time it…
bburc
4
votes
3 answers

Memory limit in converting FASTA file string to list

I am using Python 2.7. I am working with a FASTA file containing the DNA sequence of the modern human Y chromosome. It is a long string of about 20000000 characters, like ATCGACGATCACACG.... I want to convert this very long string to a list of triad…
user4374379
4
votes
2 answers

Collapse a list of DNAStringSets into a single DNAStringSet in order to apply writeXStringSet() and turn it into a FASTA file in R

Using R for bioinformatics here: I have a list of DNAStringSets (seen below) and want to use the writeXStringSet() function, which takes a DNAStringSet object as an argument, in order to save it as a FASTA file. Does anyone know how it is possible to collapse…
NEWSCIENT
4
votes
3 answers

Using Bio.SeqIO to write single-line FASTA

QIIME requests this (here) regarding the fasta files it receives as input: The file is a FASTA file, with sequences in the single line format. That is, sequences are not broken up into multiple lines of a particular length, but instead the entire…
Korem
4
votes
4 answers

Convert table into fasta in R

I have a table like this: >head(X) column1 column2 sequence1 ATCGATCGATCG sequence2 GCCATGCCATTG I need an output in a fasta file, looking like this: sequence1 ATCGATCGATCG sequence2 GCCATGCCATTG So, basically I need all entries of the 2nd…
user3586764
4
votes
3 answers

append contents from one file to another with newline separation

I'm trying to, I think, replicate the cat functionality of the Linux shell in a platform-agnostic way such that I can take two text files and merge their contents in the following manner: file_1 contains: 42 bottles of beer on the wall file_2…
glarue
4
votes
2 answers

Scala functional way of processing large scale data with lazy collections

I am trying to figure out memory-efficient AND functional ways to process large-scale data using strings in Scala. I have read many things about lazy collections and have seen quite a few code examples. However, I run into "GC overhead…
Wayne Jhukie
3
votes
3 answers

Add multiple sequences from a FASTA file to a list in python

I'm trying to organize a file with multiple sequences. In doing so, I'm trying to add the names to a list and add the sequences to a separate list that is parallel with the name list. I figured out how to add the names to a list but I can't figure…
O.rka
3
votes
1 answer

Making Blast database from FASTA in Python

How can I do this? I use Biopython and have already read the manual. Of course I can make a BLAST database from FASTA using "makeblastdb" in standalone NCBI BLAST+, but I want to do the whole process in one program. It seems there are two possible solutions. Find a function…
3
votes
2 answers

Parsing file in parallel

I am thinking about a way to parse a FASTA file in parallel. For those of you who don't know the FASTA format, here is an example: >SEQUENCE_1 MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG …
peri4n