Questions tagged [dna-sequence]

A string representing the nucleotide sequence of the deoxyribonucleic acid, the molecule that holds the genes that constitute the genetic code.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

DNA single chain sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). More seldom, ambiquity codes are also used to specify that several alternative nucleotides are possible in the given position (R - A or G, Y - C or T, see complete table.

A great amount of work in bioinformatics is related with the analysis and comparison of these strings. DNA sequences may be very long or they sets may get very large (gigabytes).

Related tags:

475 questions
2
votes
1 answer

Calculating physico-chemical properties of amino acids in Biojava

I need to calculate the number and percentages of polar/non-polar, aliphatic/aromatic/heterocyclic amino acids in this protein sequence that I got from UNIPROT, using BioJava. I have found in the BioJava tutorial how to read the Fasta files and…
2
votes
2 answers

Use result of forloop to create new list in python

I have created a mutate_v1 function that generates random mutations in a DNA sequence. def mutate_v1(sequence, mutation_rate): dna_list = list(sequence) for i in range(len(sequence)): r = random.random() if r <…
user8769986
2
votes
2 answers

random mutation in a DNA sequence - mutation rate in python

EDITED I created a function that introduces random mutations with a determined mutation rate (example 0.05). input='ATCTAGGAT' def mutate_v2(sequence, mutation_rate): dna_list = list(sequence) for i in range(len(sequence)): r =…
user8769986
2
votes
1 answer

Find subsequence in matrix

I was practicing in an online judge and I get this challenge, "Given a six-character string array (DNA string) find if has a mutation, you know if a mutation exists if you find a subsequence of 4 equals consecutive characters, you can find it…
2
votes
3 answers

Finding how many times a nucleotide appear in the same position

I'm new to python and im trying to solve a question which I am given a few dna sequences, for example: sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"] I want to know how many times…
2
votes
2 answers

Looking for a python function to find the longest sequentially repeated substring in a string

I'm doing some coding with DNA sequences and I'm interested in a function to find sequential repeats (which could represent where primers could 'slip' AKA do bad stuff). An example of what I'm interested in would be as…
Ryguy
  • 96
  • 8
2
votes
2 answers

CS50 Problem Set 6 (DNA) "Python", I can't count Intermittent DNA sequence, my code succeeds in a small database, fail in the large one

I am a beginner in programming, so I decided to take a CS50 course. In Problem Set6 (Python) I wrote the code and it worked for the small database but it failed for the big one so I only asked for help with the idea. Here is the course page, and you…
2
votes
2 answers

Map start position in a vector to stop position in another vector

I have derived all the start and stop positions within a DNA string and now I would like to map each start position with each stop position, both of which are vectors and then use these positions to extract corresponding sub strings from the DNA…
B_bunny
  • 35
  • 1
  • 6
2
votes
2 answers

Applying a function element-wise on a vector or list would fail using sapply or lapply

I have the following vector v: c("tactagcaatacgcttgcgttcggtggttaagtatgtataatgcgcgggcttgtcgt", "tgctatcctgacagttgtcacgctgattggtgtcgttacaatctaacgcatcgccaa", "gtactagagaactagtgcattagcttatttttttgttatcatgctaaccacccggcg") i'm facing a very upsetting…
Miguel 2488
  • 1,410
  • 1
  • 20
  • 41
2
votes
1 answer

Keras Conv1D on DNA sequence

I would like to apply a 1D convolution on fixed size DNA sequence using keras. Dna sequence is 45 bases long. Each sequence has been one-hot encoded. There is one filter with kernel size = 3. See picture below : I have 1000 sequences for my…
dridk
  • 180
  • 2
  • 13
2
votes
1 answer

Loop for iterating trhough variable URL (Python)

Python newby here. I am developing a simple sequence search program for DNA sequences. The main idea is to get from the NCBI database the different sequences from an specific genome and start-end points. So far, I am able to do a simple search for…
Voronin_10
  • 21
  • 1
2
votes
1 answer

Fast way to find a substring with some mismatches allowed

I am looking for help in making an efficient way to process some high throughput DNA sequencing data. The data are in 5 files with a few hundred thousand sequences each, within which each sequence is formatted as…
2
votes
3 answers

How to find specific frequency of a codon?

I am trying to make a function in R which could calculate the frequency of each codon. We know that methionine is an amino acid which could be formed by only one set of codon ATG so its percentage in every set of sequence is 1. Where as Glycine…
2
votes
1 answer

Read fasta sequence

I have been trying to read the fasta sequence, but somehow, it is always skipping the last sequence. Could you please help me, what am I missing here. Here is the code below. import sys fasta = [] test = [] with open("tests.fasta") as file_one: …
Nitin
  • 396
  • 7
  • 18
2
votes
2 answers

Fixing an error with assigning of empty list to value "DNA_Sequence"

So, this question is the continuation of previous post Python-script, which should translate 1000 DNA-Sequences to proteins by 1152 different codontables, don't work. From that time I have edited script in PyCharm and now this script has such…