Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid, the molecule that carries the genes of an organism.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-chain DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R - A or G, Y - C or T; see the complete table).
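
As a minimal sketch of how ambiguity codes relate to plain A/C/G/T strings, the snippet below expands an ambiguous sequence into every concrete sequence it can stand for. The mapping follows the standard IUPAC nucleotide codes; the names `IUPAC` and `expand_ambiguous` are hypothetical, chosen only for this illustration, and the approach is practical only for short sequences because the number of results grows multiplicatively.

```python
from itertools import product

# Standard IUPAC nucleotide codes: each letter maps to the bases it may stand for.
# The dictionary name is an assumption made for this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_ambiguous(seq):
    """Return every concrete A/C/G/T sequence matched by an ambiguous one."""
    # One string of candidate bases per position, e.g. "ARY" -> ["A", "AG", "CT"].
    choices = [IUPAC[base] for base in seq.upper()]
    # The Cartesian product picks one candidate base at each position.
    return ["".join(combo) for combo in product(*choices)]


print(expand_ambiguous("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

A letter outside the table raises a `KeyError`, which is a deliberate choice here so that malformed input is not silently accepted.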

A great deal of work in bioinformatics is related to the analysis and comparison of these strings. DNA sequences may be very long, and sets of them may get very large (gigabytes).

475 questions
2
votes
3 answers

minimum length window in string1 where string2 is subsequence

The main DNA sequence (a string) is given (let's say string1), along with another string to search for (let's say string2). You have to find the minimum-length window in string1 in which string2 is a subsequence. string1 = "abcdefababaef" string2 = "abf" Approaches that i…
Shweta
  • 1,111
  • 3
  • 15
  • 30
2
votes
4 answers

How can I reverse complement a multiple sequence fasta file with python?

I am new to python and I am trying to figure out how to read a fasta file with multiple sequences and then create a new fasta file containing the reverse complement of the sequences. The file will look something…
scooterdude32
  • 65
  • 1
  • 6
2
votes
2 answers

Increase string overlap matrix building efficiency

I have a huge list (N = ~1million) of strings 100 characters long that I'm trying to find the overlaps between. For instance, one string might be XXXXXXXXXXXXXXXXXXAACTGCXAACTGGAAXA (and so on) I need to build an N by N matrix that contains the…
Dustin
  • 6,783
  • 4
  • 36
  • 53
2
votes
2 answers

Commercial databases adept in storing biological sequences

Which commercial databases are adept in storing biological sequences like Protein/DNA sequence? Are there any which were designed specifically to store such sequences? cheers
Arnkrishn
  • 29,828
  • 40
  • 114
  • 128
2
votes
5 answers

Translating a cDNA to amino acids using Perl

So I am trying to translate a complementary strand of DNA into its respective amino acids. So far I have this code: #!/usr/bin/perl open (INFILE, "sumaira2.out"); open (OUTFILE3, ">>sumaira3.out"); %aacode = ( TTT => "F", TTC => "F", TTA => "L",…
user3268152
  • 31
  • 1
  • 4
2
votes
1 answer

How to use as.DNAbin{ape} with DNA sequences stored in a dataframe?

I have a dataframe with loci names in one column and DNA sequences in the other. I'm trying to use as.DNAbin{ape} or similar to create a DNAbin object. Here some example data: x <- structure(c("55548", "43297", "35309", "34468",…
A.Mstt
  • 301
  • 1
  • 3
  • 15
2
votes
4 answers

Codon alignment via Python?

I have pairs of coding DNA sequences on which I wish to perform pairwise codon alignments via Python; I have "half completed" the process. So far: I retrieve pairs of orthologous DNA sequences from GenBank using the Biopython package. I translate the…
2
votes
1 answer

Generate all possible dna sequences from a few given sets

I have been trying to wrap my head around this for a while now but have not been able to come up with a good solution. Here goes: Given a number of sets: set1: A, T set2: C set3: A, C, G set4: T set5: G I want to generate all possible sequences…
reprazent74
  • 95
  • 1
  • 8
2
votes
2 answers

Compute transitive closure

I have my data of pairwise DNA sequences showing similarity in the following way.. AATGCTA|1 AATCGTA|2 AATCGTA|2 AATGGTA|3 AATGGTA|3 AATGGTT|8 TTTGGTA|4 ATTGGTA|5 ATTGGTA|5 CCTGGTA|9 CCCGGTA|6 GCCGGTA|7 GGCGGTA|10 AATCGTA|2 GGCGGTA|10 …
bala
  • 553
  • 4
  • 7
2
votes
1 answer

Concatenation in C with 2D char array

I am reading in a textfile line by line into a 2D array. I want to concatenate the char arrays so I have one long char array. I am having trouble with this, I can get it to work with two char arrays but when I try to do a lot of them I go…
Ben Fossen
  • 997
  • 6
  • 22
  • 48
2
votes
5 answers

Looking for elegant glob-like DNA string expansion

I'm trying to make a glob-like expansion of a set of DNA strings that have multiple possible bases. The base of my DNA strings contains the letters A, C, G, and T. However, I can have special characters like M which could be an A or a C. For…
Rich
  • 12,068
  • 9
  • 62
  • 94
1
vote
4 answers

Using Perl to iterate through a string 3 positions at a time

I have written the following code in Perl. I want to iterate through a string 3 positions (characters) at a time. If TAA, TAG, or TGA (stop codons) appear, I want to print till the stop codons and remove the rest of the…
zock
  • 223
  • 4
  • 13
1
vote
1 answer

Multiple mismatches in DNA search sequence regex

I have written this barbaric script to create permutations of a string of characters that contain n (up to n=4) $'s in all possible combinations of positions within the string. I will eventually .replace('$','(\\w)') to use for mismatches in a dna…
jhjudd
  • 11
  • 4
1
vote
2 answers

Regex: extracting DNA info between 2 markers

I'm trying to extract some DNA info from a file. Before the DNA data consisting of bases GCAT there is the word ORIGIN, and after there is a //. How do I write a regular expression to get these bases between these markers? I have tried the following…
user1044585
  • 493
  • 2
  • 5
  • 19
1
vote
0 answers

Using msa package in R and it is crashing

I am running the msa package to create a DNA alignment for the phangorn package, and it crashes with this error. I am running RStudio with R v4.3.1 on an M1 MacBook Pro: mult <- msa(seqs, method="Muscle", type="dna", order="input") That results…