Questions tagged [sequence-alignment]

A type of problem in which two or more sequences need to be lined up with each other, generally for the purpose of identifying similarities between them. These problems are common in bioinformatics, but the algorithms used to solve them are just as relevant to aligning other types of sequences, such as text strings. A variety of algorithms have been developed for dealing with various subsets of this problem.

Sequence alignment problems are a group of problems in which you have two or more sequences, generally with some potentially similar portions, that you want to line up so that the similar portions of each are associated. This is often an important component of calculating the similarity of the sequences.

Sequence alignment is frequently important in bioinformatics, in which sequences of DNA, RNA, or amino acids must be aligned in order to infer what mutations occurred where and when. However, sequence alignment problems occur in all domains in which there are sequences, such as in text matching.

Dynamic programming (tag: dynamic-programming) is the most commonly used technique for aligning two sequences (for multiple sequence alignment, see below). Starting from the first element of each sequence, each pair of elements is either aligned (if they match) or handled with one of the operations described below (if they don't match). For more detail, see this page.

The exact dynamic programming algorithm used depends on the specific problem at hand. Here are the two most common:

Needleman-Wunsch - Global alignment of two sequences (i.e. every element of both sequences must appear in the alignment)
Smith-Waterman - Local alignment of two sequences (i.e. only a contiguous substring of each sequence needs to be used)
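
As a rough illustration of the difference between the two, here is a minimal Python sketch that computes only the alignment score (no traceback). The function names and the default scores (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example, not values prescribed by either algorithm.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) score with a constant cost per gap position."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Local (Smith-Waterman) score: the best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)  # skipping a prefix costs nothing
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart at any cell.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best
```

With these scores, the local score can never be lower than the global one, because a global alignment is one of the candidates a local alignment may pick (and an empty local alignment scores 0).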

Three main operations are generally allowed in sequence alignment. It's easiest to think of these operations as things that might have happened to one of the sequences to turn it into the other sequence:

Insertion: An element is inserted into one of the sequences. This is generally represented by adding a gap to the opposite sequence.
Deletion: An element is removed from one of the sequences. This is generally represented by adding a gap to that sequence.
Mutation/Substitution: An element is replaced with a different element.
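
To make the three operations concrete, the following sketch fills a global alignment table and then traces back through it, labelling each column of the result. It is only an illustration: the function name, the scores, and the convention that operations are named from the point of view of turning the first sequence into the second are all assumptions made for this example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (row_a, row_b, ops) for one optimal global alignment of a and b."""
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    row_a, row_b, ops = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            ops.append("match" if a[i - 1] == b[j - 1] else "substitution")
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")  # element of a has no partner in b
            ops.append("deletion")
            i -= 1
        else:
            row_a.append("-")  # element of b has no partner in a
            row_b.append(b[j - 1])
            ops.append("insertion")
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), ops[::-1]
```

For example, `global_align("ACGT", "AGT")` reports a deletion in the second column, while swapping the arguments reports an insertion at the same place, which is why the two operations are often treated together as "indels".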

Each of these operations has a cost associated with it that reflects how likely it is to have happened to the original sequence. Mutation/Substitution generally has a different cost for different substitutions (often taken from a BLOSUM or PAM matrix in bioinformatics). Insertion and deletion are accounted for with some sort of gap penalty. In simple implementations, this penalty is often a constant cost per gap position, but in bioinformatics an affine gap penalty, which charges more to open a gap than to extend one, is often more appropriate.
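
The effect of the gap model can be seen by scoring two ready-made alignments, as in the sketch below. Everything here is an assumption for illustration: `simple_score` is a toy stand-in for a real substitution matrix, and a gap of length k is charged `gap_open + (k - 1) * gap_extend`, which is one common convention (some tools charge `gap_open + k * gap_extend` instead).

```python
def simple_score(x, y):
    """Toy substitution score; a real pipeline would look this up in a matrix."""
    return 1 if x == y else -1


def alignment_score(row_a, row_b, substitution, gap_open, gap_extend):
    """Score an existing gapped pairwise alignment, given as two equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    total = 0
    gap_row = None  # which row, if any, had a gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            # Continuing a gap in the same row is an extension, otherwise an opening.
            total -= gap_extend if row == gap_row else gap_open
            gap_row = row
        else:
            total += substitution(x, y)
            gap_row = None
    return total


one_long_gap = ("ACGTTTACGT", "ACG---ACGT")
three_short_gaps = ("ACGTTTACGT", "A-G-T-ACGT")

# Constant cost per gap position (gap_open == gap_extend): both alignments tie.
linear = [alignment_score(a, b, simple_score, 2, 2)
          for a, b in (one_long_gap, three_short_gaps)]

# Affine cost: opening is expensive, extending is cheap, so one long gap wins.
affine = [alignment_score(a, b, simple_score, 4, 1)
          for a, b in (one_long_gap, three_short_gaps)]
```

Both alignments contain seven matches and three gap positions, so they tie under the constant model, while the affine model separates them by preferring the single contiguous gap.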

Multiple Sequence Alignment: Dynamic programming quickly becomes computationally intractable as the number of sequences being aligned increases, because the size of the table grows exponentially with the number of sequences. For this reason, multiple sequence alignment algorithms generally do not guarantee optimality. A variety of techniques are used:

Progressive alignment: In this technique, a series of pairwise alignments is used to build up an overall multi-way alignment. Often the order in which these alignments are performed is determined by a hierarchical clustering algorithm like neighbor-joining or UPGMA. A number of tools exist for performing such alignments in bioinformatics, such as the Clustal family and T-Coffee.
Heuristic approaches: A wide variety of heuristics can be used for very large scale alignment. BLAST is by far the most popular heuristic tool in bioinformatics, although it searches a database for local pairwise matches rather than building a full multiple alignment.
Hidden-Markov Models: HMMs can be used to find the most likely alignments for a set of sequences. HMMER is a popular bioinformatics tool for this approach.
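
As a sketch of the first step of progressive alignment only, the code below derives a merge order from pairwise distances using average-linkage (UPGMA-style) clustering. Using plain edit distance as the distance measure is an assumption made to keep the example short, the function names are hypothetical, and the actual profile-to-profile alignment that tools like Clustal perform at each merge is omitted.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, used here as a simple stand-in for a real distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j - 1] + (x != y), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def guide_order(seqs):
    """Return the list of cluster merges, closest clusters first.

    Each cluster is a tuple of indices into seqs.
    """
    dist = {}
    for i, j in combinations(range(len(seqs)), 2):
        dist[i, j] = dist[j, i] = edit_distance(seqs[i], seqs[j])

    def average_distance(c1, c2):
        return sum(dist[i, j] for i in c1 for j in c2) / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average pairwise distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


order = guide_order(["ACGT", "ACGA", "TTTT", "TTAA"])
```

Here the two most similar sequences (indices 0 and 1) are paired first, then 2 and 3, and finally the two resulting groups are joined; a progressive aligner would perform one alignment per merge in that order.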

131 questions

0 answers

Find maximum gap rate in Smith-Waterman algorithm

I am now working on the Smith-Waterman algorithm. I understand that increasing the gap penalty yields fewer gaps in my final alignment, but I need advice on how to control the maximum gap rate (ratio of gapped character in the detected char)?…

sequence sequence-alignment

asked Sep 06 '18 at 14:00

bill

1 answer

Aligning curves along the horizontal direction

I have some 'n' experimental curves for the same experimental conditions. Due to the inherent thermal drift in the system, the data sets are not exactly aligned with each other. I am looking for a robust algorithm that would align the data-curves…

matlab alignment overlap curves sequence-alignment

asked Jul 11 '18 at 10:04

Backspace

1 answer

MuscleCommandline not working in Biopython

I need to integrate my Python script with the muscle tool for multiple sequence alignment. I followed the tutorial on Biopython; here is my code: from Bio.Align.Applications import MuscleCommandline muscle_exe = "muscle.exe" in_file =…

python biopython sequence-alignment

asked Nov 22 '17 at 00:54

Guido Muscioni

1,203
3
15
37

2 answers

Draw lines connecting points between two separate one-D plots

As the title says, I am working on time-series alignment, and a visualization of the alignment result is desired. To this end, I want to draw lines connecting "anchor points" generated by the alignment algorithm. np.random.seed(5) x = np.random.rand(10) …

python matplotlib line sequence-alignment

asked May 18 '17 at 10:32

Francis

6,416
5
24
32

1 answer

Algorithm to align numerical sequences

Hi, I have two sequences of numerical data, let's say: S1: 1,6,4,9,8,7,5 and S2: 6,9,7,5, and I'd like to find a sequence alignment in both directions, left-to-right and right-to-left. I tried two techniques before asking; I actually used the Hungarian algorithm…

algorithm sequence-alignment

asked May 14 '17 at 22:31

Chakib Mataoui

0 answers

Extract aligned sections of FASTA to new file

I've already looked here and in other forums, but couldn't find the answer to my question. I want to design baits for a target enrichment Sequencing approach and have the output of a MarkerMiner search for orthologous loci from four different…

extract bioinformatics fasta sequence-alignment

asked Oct 31 '16 at 14:00

sci_cloudy

1 answer

Calculate (mean) sequence divergence for many sequences

I have ~13K sequences of 120 bases each and I want to compare them to find things like conserved regions, a mean divergence between them or very diverging outliers. The problem is, with this number of sequences the things I tried aren't doable. So has…

bioinformatics biopython dna-sequence sequence-alignment

asked Sep 20 '16 at 09:38

voiDnyx

1 answer

How to order multiple Fasta alignment files

I'm sure this is an easy thing to do, but I have very limited bioinformatics experience. I have many (100,000) FASTA files that contain alignments of different genes of the same 12 species. Each file looks something like…

bioinformatics fasta dna-sequence sequence-alignment

asked Sep 01 '16 at 13:12

NKGon

1 answer

MiPS ASM Recursion understanding problems?

Please help me understand this formula (in case anybody is wondering, that is the Needleman-Wunsch algorithm). I am supposed to write code that uses recursion but I don't understand how to do so. I already have the full dynamic version written, so…

recursion assembly mips qtspim sequence-alignment

asked Mar 23 '16 at 23:10

Schero David

1 answer

coloring part of a sequence in format_alignment in biopython

I am using format_alignment to look for pairwise alignment between two sequences. I want to highlight part of the sequence with a different color (say between base number 40 and base number 54) in the full alignment, so that it is clear to which…

biopython sequence-alignment

asked Sep 28 '15 at 14:00

Ssank

3,367
7
28
34

1 answer

How does Biopython determine the root of a phylogenetic tree?

There are other packages, particularly ape for R, that build an unrooted tree then allow you to root it by explicitly specifying an outgroup. In contrast, in BioPython I can directly create a rooted tree without specifying the root, so I'm…

bioinformatics biopython dna-sequence phylogeny sequence-alignment

asked May 14 '15 at 21:08

nshaas

2 answers

Multiple sequence alignment. Convert multi-line format to single-line format?

I have a multiple sequence alignment file in which the lines from the different sequences are interspersed, as in the format output by Clustal and other popular multiple sequence alignment tools. It looks like this: TGFb3_human_used_for_docking …

bioinformatics sequence-alignment

asked May 08 '15 at 13:32

a06e

18,594
33
93
169

0 answers

BioPerl: Annotate mismatches in an alignment

I'm reasonably new to perl and very new to BioPerl, so my apologies if this seems like a trivial question. I'm using Bio::AlignIO and Bio::SimpleAlign to generate pairwise alignments of sequences of interest to a reference sequence - in this case…

bioperl sequence-alignment

asked Oct 03 '14 at 13:54

user2460253

1 answer

Multiple sequence alignment of 12 species

I need to perform MSA (multiple sequence alignment) on nucleotide sequences of 12 wheat varieties. All these varieties have different lengths in bp (base pairs). I followed this documentation of MATLAB…

matlab sequence-alignment

asked Aug 12 '14 at 06:21

user3395676

1 answer

Prove that L >= G for Local and Global alignments of a specific function

I'm taking a bioinformatics class this semester and I'm having trouble with a specific question from the book. *Given two DNA sequences, S and T, of the same length n and let the scoring function be defined as follows: match = 1, mismatch = -1,…

bioinformatics sequence-alignment

asked Feb 18 '14 at 06:20

Auxiliary Soap
