Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-strand DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R - A or G, Y - C or T; see the complete table).
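As an illustration of how ambiguity codes can be handled in practice, here is a minimal Python sketch that translates a pattern containing IUPAC codes into a regular expression and searches a sequence with it. The names `IUPAC`, `iupac_to_regex` and `find_matches` are made up for this example and do not come from any particular library; the sketch also assumes uppercase or lowercase single-strand input with no gaps.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Translate a pattern with ambiguity codes into a regular expression."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError on an unknown symbol
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead matches without consuming characters, so overlaps are kept.
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(iupac_to_regex("ARY"))            # A[AG][CT]
print(find_matches("ARY", "AACAGTTT"))  # [0, 3]
```

A regular expression is only one possible approach; for very long sequences or many patterns, a dedicated bioinformatics library is likely a better fit.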

A great deal of work in bioinformatics involves analyzing and comparing these strings. DNA sequences can be very long, and sets of them can grow very large (gigabytes).

Related tags:

levenshtein-distance

475 questions

1 answer

Get position of subsequence using Levenshtein-Distance

I have a huge number of records containing sequences ('ATCGTGTGCATCAGTTTCGA...'), up to 500 characters. I also have a list of smaller sequences, usually 10-20 characters. I would like to use the Levenshtein distance in order to find these smaller…

asked Nov 01 '13 at 10:40

biojl

1 answer

Getting all string combinations by given maximal Hamming distance (number of mismatches) in Java

Is there an algorithm to generate all possible string combinations of a string (DNA sequence) for a given maximal number of positions that are allowed to vary (maximal mismatches, maximal Hamming distance)? The alphabet is {A,C,T,G}. Example for a…

java algorithm permutation dna-sequence hamming-distance

asked Oct 09 '13 at 10:36

Maximilian Wiens

10 answers

Are there any existing solutions for creating a generic DNA sequence database with a website front end?

I'd like to create an rRNA sequence database with a web front end for the lab I work in. It seems common in biology to want to search a large number of sequences using alignment algorithms such as BLAST and HMMER, so I wondered if there is any…

database search web bioinformatics dna-sequence

asked Dec 11 '09 at 19:19

Michael Barton

2 answers

cluster short, homogeneous strings (DNA) according to common sub-patterns and extract consensus of classes

Task: to cluster a large pool of short DNA fragments in classes that share common sub-sequence-patterns and find the consensus sequence of each class. Pool: ca. 300 sequence fragments 8 - 20 letters per fragment 4 possible letters: a,g,t,c …

string cluster-analysis bioinformatics dna-sequence

asked Oct 02 '09 at 12:50

SimonSalman

1 answer

Unexpected output in randomized motif search in DNA strings

I have the following t=5 DNA strings: DNA = '''CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG TAGTACCGAGACCGAAAGAAGTATACAGGCGT TAGATCAAGTTTCAGGTGCACGTCGGTGAACC AATCCACCAGCTCCACGTGCAATGTTGGCCTA''' k = 8 t = 5 I'm trying to find…

python search random bioinformatics dna-sequence

asked Feb 20 '20 at 00:39

slap-a-da-bias

5 answers

How can we compress DNA string efficiently

DNA strings can be of any length, comprising any combination of the 5 letters (A, T, G, C, N). What would be an efficient way of compressing a DNA string over this 5-letter alphabet? Instead of considering 3 bits per letter,…

c++ algorithm compression dna-sequence lossless-compression

asked Aug 15 '18 at 13:02

ksn

1 answer

Get span and match values from SRE_Match objects(Python)

I am trying to find a match of certain length (range 4 to 12) in a DNA sequence Below is the code: import re positions =[] for i in range(4,12): for j in range(len(dna)- i+1): positions.append(re.search(dna[j:j+i],comp_dna)) #Remove…

python regex object bioinformatics dna-sequence

asked Jun 30 '18 at 10:46

Kian

2 answers

Find length of overlap in strings

Do you know any ready-to-use method to obtain the length and also the overlap of two strings? However, only with R, maybe something from stringr? I was looking here, unfortunately without success. str1 <- 'ABCDE' str2 <- 'CDEFG' str_overlap(str1,…

r string bioinformatics overlap dna-sequence

asked Feb 09 '18 at 07:54

Adamm

1 answer

Fastest way to add Ns to variable length sequences such that they all equal 150bp

Say I have a fasta containing 3 sequences... ATTTTTGGA AT A I want my sequence data to look like this: ATTTTTGGA ATTNNNNNN ANNNNNNNN Are there any programs or scripts that could accomplish this in a reasonable timeframe. I have thousands of…

bioinformatics biopython dna-sequence

asked Feb 24 '17 at 01:38

user3105519

3 answers

Use Python to extract Branch Lengths from Newick Format

I have a list in python consisting of one item which is a tree written in Newick Format, as…

python regex dna-sequence phylogeny

asked Apr 19 '14 at 16:06

PaulBarr

5 answers

Reverse complement DNA

I have this equation for reverse complementing DNA in python: def complement(s): basecomplement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} letters = list(s) letters = [basecomplement[base] for base in letters] return…

python bioinformatics dna-sequence

asked Oct 24 '13 at 15:55

Michael Ridley

2 answers

Quantifying frequency of codons in a transmembrane sequence - apply function?

I am trying to look at the codon usage within the transmembrane domains of certain proteins. To do this, I have the sequences for the TM domain, and I want to search these sequences for how often certain codons appear (the frequency). Ideally I…

r apply bioinformatics lapply dna-sequence

asked May 17 '22 at 12:51

cambio

3 answers

How to catch the longest sequence of a group

The task is to find the longest sequence of a group for instance, given DNA sequence: "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC" and it has 7 occurrences of AGATC. (AGATC) matches all occurrences. Is it possible to write a…

python python-3.x regex cs50 dna-sequence

asked May 29 '20 at 04:36

hamvee

3 answers

How to find indices of identical sub-sequences in two strings in Ruby?

Here each instance of the class DNA corresponds to a string such as 'GCCCAC'. Arrays of substrings containing k-mers can be constructed from these strings. For this string there are 1-mers, 2-mers, 3-mers, 4-mers, 5-mers and one 6-mer: 6 1-mers:…

ruby dna-sequence

asked Nov 03 '19 at 21:34

kujak al

2 answers

Making a list of all mutations of a sequence (DNA)

I have a DNA sequence, and I want to find all instances of it, or any of its possible mutations in a list of DNA sequence reads. I am using grepl to do this, since it is faster than matchPattern in the instance I am using it. I use parLapply to feed…

r permutation dna-sequence

asked Jul 26 '19 at 19:35

Joe Kowzun
