Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid, the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

The sequence of a single DNA strand is commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less often, ambiguity codes are also used to indicate that several alternative nucleotides are possible at a given position (for example, R for A or G and Y for C or T; the IUPAC nucleotide code table lists them all).
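As a minimal sketch of how ambiguity codes can be handled in practice, the Python snippet below turns a pattern written with IUPAC codes into a regular expression and searches a sequence with it. The helper names `iupac_to_regex` and `find_sites` are hypothetical and chosen only for this illustration. The sketch assumes uppercase or lowercase input made up solely of the standard IUPAC nucleotide symbols.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Compile a pattern containing ambiguity codes into a regex (hypothetical helper)."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for a symbol outside the table
        # A single base is matched literally; several bases become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def find_sites(pattern, sequence):
    """Return 0-based start positions of non-overlapping matches (hypothetical helper)."""
    regex = iupac_to_regex(pattern)
    return [match.start() for match in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("GRY").pattern)
    print(find_sites("GRY", "GACTTGGT"))
```

Note that `finditer` reports non-overlapping matches only; finding overlapping sites would need a different search strategy, such as a lookahead.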

A great deal of work in bioinformatics involves the analysis and comparison of these strings. Individual DNA sequences may be very long, and sets of them may grow very large (gigabytes).
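Because such data sets can be too large to load at once, one common approach is to process them record by record. The following is a rough sketch in Python, assuming the sequences are stored in FASTA format (a `>` header line followed by one or more sequence lines). The names `read_fasta` and `gc_content` are hypothetical helpers written for this example, not part of any library.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs one record at a time from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequence):
    """Return the fraction of G and C bases, or 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


if __name__ == "__main__":
    # An in-memory stand-in for a file opened with open(path).
    demo = io.StringIO(">seq1\nACGT\nacgt\n>seq2\nGGCC\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq), gc_content(seq))
```

Only one record is held in memory at a time, so the same loop works on a handle returned by `open()` for a multi-gigabyte file, provided no single record is itself too large.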

475 questions
2
votes
1 answer

How to track the position of a start codon (ATG) in a nucleotide sequence after using the translate function of Biopython?

I have a FASTA file with a bunch of sequences with the following format: BMRat|XM_008846946.1 ATGAAGAACATCACAGAAGCCACCACCTTCATTCTCAAGGGACTCACAGACAATGTGGAACTACAGGTCA TCCTCTTTTTTCTCTTTCTAGCGATTTATCTCTTCACTCTCATAGGAAATTTAGGACTTATTATTTTAGT …
2
votes
1 answer

Writing and Reading a big file for analytical purposes

I'm trying to make a DNA Analytical tool, but I'm facing a big problem here. Here's a screenshot of how the application looks. The problem I'm facing is handling large data. I've used streams and memory mapped files, but I'm not really sure if…
Hades
  • 865
  • 2
  • 11
  • 28
2
votes
1 answer

DNA conditional frequency in R

I'm trying to find out whether there is any conditional dependence between 2 different DNA sequences in R. This is my code; however, I'm getting an error: Error in `[.data.frame`(data, i) : undefined columns selected. I'm not sure where the issue is, if I…
daenwaels
  • 85
  • 1
  • 7
2
votes
1 answer

Translate function: "error:sequence is not a vector of chars"

I'm struggling with the translate function: I have a matrix of sequences and I can't manage to figure out why the translate function would not work. Here is my script : head(myseq) [,1] …
2
votes
1 answer

uppered dna transcription with for loop

This is the code I have to achieve my objective as stated in the title. The problem I seem to have right now is the second line of code. The moment I added it the program stopped working without giving me an error. Thanks in advance for the…
forsale
  • 21
  • 1
2
votes
1 answer

Perl: Return Highest Percent Match for Strings

I have a DNA sequence, like ATCGATCG for example. I also have a database of DNA sequences formatted as follows: >Name of sequence1 SEQUENCEONEEXAMPLEGATCGATC >Name of sequence2 SEQUENCETWOEXAMPLEGATCGATC (So the odd numbered lines contain a name,…
Aditya J.
  • 131
  • 2
  • 11
2
votes
3 answers

Compare Multiple Substrings

I'm attempting to write a basic dna sequencer. In that, given two sequences of the same length, it will output the strings which are the same, with a minimal length of 3. So input of abcdef dfeabc will return 1 abc I am not sure how to go about…
Blackbinary
  • 3,936
  • 18
  • 49
  • 62
2
votes
2 answers

Finding Specific Vector Entries in a Sliding Window

I am trying to create a function that will return counts of specific adjacent nucleotides (CG beside each other) within a specific window that I have formatted in a vector. I would like the windows to be 100 nucleotides long and shift every…
2
votes
1 answer

Biopython: Local alignment between DNA sequences doesn't find optimal alignment

I'm writing code to find local alignments between two sequences. Here is a minimal, working example I've been working on: from Bio import pairwise2 from Bio.pairwise2 import format_alignment seq1 = "GTGGTCCTAGGC" seq2 = "GCCTAGGACCAC" # scores for…
2
votes
0 answers

cannot write.XStringsViews output

For the following program: library(Biostrings) library(IRanges) #added this when I saw it mentioned in docs s1 <- "DNA here" #insert desired sequence between quotes matchPattern("ATG", s1) # Find all ATGs in the sequence s1 I get output that looks…
tortoiseshell
  • 33
  • 1
  • 7
2
votes
1 answer

Rotating subgraphs with dot

I'm trying to represent a bidirectional graph structure using graphviz. Say I have 3 nodes A, B, C, corresponding to DNA fragments. I want to represent some structure inside each node, but also the relationship among the nodes, which let's say is A+…
Matei David
  • 2,322
  • 3
  • 23
  • 36
2
votes
2 answers

Get a complete view of a long DNA sequence of a Biostrings object in R

I tried to get the reverse complement of a DNA sequence in R using the Biostrings package. The length of the sequence is around 900 and I want to see it completely, but R shows an abbreviated version with some dots between the codes. Is there any way to get it…
arado1
  • 51
  • 7
2
votes
2 answers

Python regex module fuzzy match: substitution count not as expected

Background The Python module regex allows fuzzy matching. You can specify the allowable number of substitutions (s), insertions (i), deletions (d), and total errors (e). The fuzzy_counts property of a match result returns a tuple (0,0,0), where:…
Colin Anthony
  • 1,141
  • 12
  • 21
2
votes
2 answers

Finding inverted repeats in DNA sequence

I have a long string of DNA sequence, and I need to find regions that consist of two palindromic sequences flanking a spacer sequence. The input…
zebra
  • 83
  • 1
  • 1
  • 5
2
votes
1 answer

How to join certain items in list

My list looks like this : ['', 'CCCTTTCGCGACTAGCTAATCTGGCATTGTCAATACAGCGACGTTTCCGTTACCCGGGTGCTGACTTCATACTT CGAAGA', 'ACCGGGCCGCGGCTACTGGACCCATATCATGAACCGCAGGTG', '', '',…
Tomek Sztuk
  • 69
  • 1
  • 1
  • 9