Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that holds the genes that constitute the genetic code.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-stranded DNA sequences are commonly represented as a string of uppercase letters corresponding to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are used to indicate that several alternative nucleotides are possible at a given position (R = A or G, Y = C or T; see the complete table).
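As a minimal sketch of how ambiguity codes might be handled in Python, the snippet below expands a sequence containing IUPAC codes into every concrete A/C/G/T sequence it could stand for. The table is the standard IUPAC nucleotide code set; the function name `expand_ambiguous` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def expand_ambiguous(seq):
    """Return every unambiguous sequence that `seq` may represent."""
    # Each position contributes its set of possible bases; the Cartesian
    # product enumerates all combinations. The result size is the product
    # of the options per position, so keep this to short sequences.
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

print(expand_ambiguous("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```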

A great deal of work in bioinformatics involves analyzing and comparing these strings. Individual DNA sequences may be very long, and collections of them can become very large (gigabytes).

Related tags:

475 questions
-1 votes, 2 answers

Count GC content of FASTA using Python without errors

The human genome is made up of 24 different chromosomes (23 pairs, i.e. 46 chromosomes in total). These chromosomes are called 1, 2, 3, ..., 22, X and Y. Each chromosome is a very long string of 'G', 'C', 'A' and 'T' characters (for example chromosome 1 is made…
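A minimal sketch of per-chromosome GC content without Biopython, assuming a standard FASTA file in which each header line starts with ">". The function names are hypothetical, and excluding N (and other ambiguity codes) from the denominator is an assumption; some tools divide by the full length instead.

```python
import io
from collections import Counter

def _gc_fraction(counts):
    """G+C over A+C+G+T; ambiguity codes such as N are left out (assumption)."""
    acgt = sum(counts[b] for b in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0

def gc_content_per_record(handle):
    """Yield (header, gc_fraction) for each record in a FASTA text stream.

    Lines are read one at a time, so a whole chromosome never has to be
    held in memory as a single string.
    """
    header, counts = None, Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, _gc_fraction(counts)
            header, counts = line[1:], Counter()
        else:
            # Upper-case first: soft-masked regions are stored in lower case.
            counts.update(line.upper())
    if header is not None:
        yield header, _gc_fraction(counts)

example = io.StringIO(">chr_test\nGGCCAT\nNNat\n>chr_b\nAAAA\n")
for name, gc in gc_content_per_record(example):
    print(name, round(gc, 3))
```

With a real file, the same generator can be fed an open file handle, e.g. `with open("chr1.fa") as fh:` (the file name here is only illustrative).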
-1 votes, 2 answers

Mixing the content of DNA sequences (list) in R

I have a list of DNA sequences and I want to mix (shuffle) the contents of each one. Let's say dna_lst: [1] AATTAATTCC [2] ATCGATCG [3] TTTAACCCCCGG. I want to generate mixed DNA content like: dna_mix: [1] TACAATTACT [2] CATGCTAG [3] CCTGATCTCGAC. How can I do this in…
Cina (9,759 reputation)
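Each desired output in the excerpt is a permutation of the corresponding input (same count of each base, new order), so a per-sequence shuffle produces that kind of result. The question asks about R, where `paste(sample(strsplit(x, "")[[1]]), collapse = "")` permutes one string; the sketch below shows the same idea in Python. `shuffle_sequences` is a hypothetical name.

```python
import random
from collections import Counter

def shuffle_sequences(seqs, seed=None):
    """Return a new list in which each sequence's bases are randomly permuted.

    Base composition (the count of each letter) is preserved per sequence.
    """
    rng = random.Random(seed)  # a dedicated RNG, reproducible when a seed is given
    return ["".join(rng.sample(s, len(s))) for s in seqs]

dna_lst = ["AATTAATTCC", "ATCGATCG", "TTTAACCCCCGG"]
dna_mix = shuffle_sequences(dna_lst, seed=1)
for original, mixed in zip(dna_lst, dna_mix):
    assert Counter(original) == Counter(mixed)  # same composition, new order
print(dna_mix)
```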
-1 votes, 1 answer

Perl Program error

I wrote a Perl program which takes an Excel sheet (converted to a text file by changing the extension from .xls to .txt) and a sequence file as its input. The Excel sheet contains the start point and the end point of an area in the sequence file…
The Last Word (203 reputation)
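One point worth checking first: an .xls workbook is a binary file, so renaming it to .txt does not turn it into text; exporting it from Excel as tab-delimited text does. Assuming such an export with the start in the first column and the end in the second, and 1-based inclusive coordinates (both assumptions, since the excerpt is cut off), region extraction might look like this Python sketch. `extract_regions` is a hypothetical name.

```python
import csv
import io

def extract_regions(coord_handle, sequence):
    """Return the substrings of `sequence` named by each (start, end) row.

    Assumptions: tab-delimited input, start in column 1, end in column 2,
    coordinates 1-based and inclusive.
    """
    regions = []
    for row in csv.reader(coord_handle, delimiter="\t"):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue  # skip blank lines and a possible header row
        start, end = int(row[0]), int(row[1])
        # Convert 1-based inclusive [start, end] to Python's 0-based slice.
        regions.append(sequence[start - 1:end])
    return regions

coords = io.StringIO("start\tend\n1\t4\n3\t8\n")
print(extract_regions(coords, "ACGTACGTAC"))  # ['ACGT', 'GTACGT']
```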
-2 votes, 3 answers

I receive too many DNA objects in my read DNA function

class DnaSeq: def __init__(self, accession, seq): self.accession = accession self.seq = seq def __len__(self): if self.seq == None: raise ValueError elif self.seq =='': …
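The excerpt is cut off, so the real cause cannot be confirmed from here. One common way to end up with too many objects is creating a DnaSeq for every line of a FASTA-style file instead of for every ">" header. The sketch below assumes FASTA input where a record's sequence may span several lines; `read_dna` is a hypothetical name, and the handling of an empty sequence is an assumption because the excerpt's `''` branch is truncated.

```python
import io

class DnaSeq:
    def __init__(self, accession, seq):
        self.accession = accession
        self.seq = seq

    def __len__(self):
        # `is None` is the idiomatic check; the excerpt's '' branch is cut off,
        # so this sketch simply lets an empty sequence have length 0.
        if self.seq is None:
            raise ValueError("sequence is missing")
        return len(self.seq)

def read_dna(handle):
    """Build one DnaSeq per '>' header, joining that record's sequence lines."""
    records, accession, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if accession is not None:
                records.append(DnaSeq(accession, "".join(chunks)))
            accession, chunks = line[1:], []
        elif line:
            chunks.append(line)  # a continuation line, not a new record
    if accession is not None:
        records.append(DnaSeq(accession, "".join(chunks)))
    return records

recs = read_dna(io.StringIO(">s1\nACGT\nTTGA\n>s2\nGGG\n"))
print(len(recs), [len(r) for r in recs])  # 2 [8, 3]
```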
-2 votes, 1 answer

Transcribe sequences and remove gap characters (-) in a function

Write a function make_vax, which takes a list of an arbitrary number of DNA sequences, and both removes gaps ("-" characters) and transcribes each sequence to mRNA. It returns a list of gap-less mRNA sequences of the same length as the input list. def…
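A minimal sketch of `make_vax` as the task describes it, assuming the input sequences are the coding strand written in uppercase, so that transcription amounts to replacing T with U (a template-strand input would also need complementing).

```python
def make_vax(dna_seqs):
    """Remove '-' gaps from each DNA sequence and transcribe it to mRNA (T -> U).

    Assumes uppercase coding-strand input; the output list has one entry
    per input sequence, in the same order.
    """
    return [seq.replace("-", "").replace("T", "U") for seq in dna_seqs]

print(make_vax(["AT-GC", "TT--A"]))  # ['AUGC', 'UUA']
```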
-2 votes, 1 answer

Calculate the mean length of amino acid sequences with Biopython

I have this code for calculating the lengths of sequences in FASTA format using Biopython. I got the lengths: NP_418305.1 349 NP_418306.1 469 NP_418308.1 236. However, now I'd like to calculate the mean length over all the sequences, just like an interesting…
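Once the lengths are collected, the mean is a single call to `statistics.mean`. The sketch below uses the three lengths quoted in the excerpt; with Biopython the same value would come from `statistics.mean(len(rec) for rec in SeqIO.parse(path, "fasta"))`, where `path` is whatever FASTA file the question reads.

```python
import statistics

# Lengths reported in the question, keyed by accession.
lengths = {"NP_418305.1": 349, "NP_418306.1": 469, "NP_418308.1": 236}

mean_length = statistics.mean(lengths.values())
print(f"mean length: {mean_length:.2f}")  # mean length: 351.33
```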
-2 votes, 2 answers

Using a for loop to replace bad nucleotides in a DNA sequence

I have a list of sequences (for simplicity, like the following one) seqList=["ACCTGCCSSSTTTCCT","ACCTGCCFFFTTTCCT"] and I want to use a for loop to replace every instance of a nucleotide other than ["A","C","G","T"] with "N". My code so…
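Python strings are immutable, so `seq[i] = "N"` fails; one way to keep an explicit for loop is to build a new string from a list of characters, as in this sketch (`mask_invalid` is a hypothetical name). Note that S is itself an IUPAC ambiguity code (G or C), so masking it with N discards that partial information.

```python
VALID = set("ACGT")

def mask_invalid(seq, replacement="N"):
    """Replace every character that is not A, C, G or T with `replacement`."""
    out = []
    for base in seq:  # an explicit for loop, as the question asks
        out.append(base if base in VALID else replacement)
    return "".join(out)

seqList = ["ACCTGCCSSSTTTCCT", "ACCTGCCFFFTTTCCT"]
cleaned = [mask_invalid(s) for s in seqList]
print(cleaned)  # ['ACCTGCCNNNTTTCCT', 'ACCTGCCNNNTTTCCT']
```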
-2 votes, 1 answer

CS50 pset 6 DNA works with small.csv but not large.csv

This is my code for the problem set week 6 DNA. When I test with the small.csv it works correctly but when testing with the large.csv it seems to incorrectly count the repeating sequence. Can anyone help me find the error in my code? I am very new…
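Without the code it is impossible to say for sure, but a frequent cause of this symptom is counting every occurrence of a short tandem repeat (for example with `str.count`) rather than the longest run of consecutive repeats; the longer sequences in large.csv make the difference visible. A sketch of the consecutive-run count, with the hypothetical name `longest_run`:

```python
def longest_run(sequence, str_unit):
    """Return the maximum number of back-to-back repeats of str_unit."""
    if not str_unit:
        return 0  # guard: an empty unit would otherwise loop forever
    k = len(str_unit)
    best = 0
    for start in range(len(sequence)):
        count = 0
        # Walk forward one unit at a time while the unit keeps matching.
        while sequence[start + count * k:start + (count + 1) * k] == str_unit:
            count += 1
        best = max(best, count)
    return best

seq = "AGATCAGATCTTTTAGATCAGATCAGATC"
print(seq.count("AGATC"), longest_run(seq, "AGATC"))  # 5 3
```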
-2 votes, 1 answer

Python finding the longest ORF

Can someone show me a straightforward solution for how to calculate the longest open reading frame (ORF) > 30bp in length in a DNA sequence? ATG is the start codon (i.e., the beginning of an ORF) and TAG, TGA, and TAA are stop codons (i.e., the end…
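A straightforward sketch, under stated assumptions: only the three forward reading frames are scanned (the reverse strand would need the reverse complement as well), the stop codon counts toward the length, "> 30 bp" is read as strictly longer than 30, and an ATG with no in-frame stop is ignored. Taking the first ATG in each open stretch gives the longest ORF for that stretch. `longest_orf` is a hypothetical name.

```python
STOPS = {"TAA", "TAG", "TGA"}

def longest_orf(seq, min_len=31):
    """Return the longest ATG...stop ORF on the forward strand, or None."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best if len(best) >= min_len else None

test_seq = "ATGTAA" + "G" + "ATG" + "AAA" * 10 + "TAA"
print(len(longest_orf(test_seq)))  # 36
```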
-2 votes, 1 answer

Pairwise sequence analysis - finding indexes of unique combinations

I have a large list of DNA sequences {A,C,T,G} (total of 100,000 lists, each with 3000 characters). I need to analyse these lists in pairs, starting with the 1st list and comparing it with the 2nd, 3rd, 4th, ..., 100,000th. Then move on to the 2nd…
Sudaraka (125 reputation)
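Comparing every sequence with every later one gives n(n − 1)/2 pairs; for n = 100,000 that is 4,999,950,000 pairs, each involving up to 3000 character comparisons, so a plain Python double loop will be very slow at that scale. The excerpt is truncated, so reading "indexes" as the positions where two sequences differ is an assumption; `pairwise_differences` is a hypothetical name, and `itertools.combinations` handles the "1st with 2nd, 3rd, …, then 2nd with 3rd, …" ordering.

```python
from itertools import combinations

def pairwise_differences(seqs):
    """Yield (i, j, positions) for every pair i < j of equal-length sequences.

    `positions` lists the 0-based indices at which seqs[i] and seqs[j] differ.
    """
    for i, j in combinations(range(len(seqs)), 2):
        a, b = seqs[i], seqs[j]
        yield i, j, [k for k, (x, y) in enumerate(zip(a, b)) if x != y]

n = 100_000
print(n * (n - 1) // 2)  # 4999950000 pairs for the sizes in the question

for i, j, diffs in pairwise_differences(["ACGT", "ACGA", "TCGA"]):
    print(i, j, diffs)
```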
-2 votes, 1 answer

sorting and counting codons using bash and grep -c

I have a text file which has several lines of codons; each line holds a set of three nucleotides, each of which can be A, T, G or C, with only three per line (e.g. ATC). Now, I want to write a while loop that can read these lines and count…
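In the shell, `sort codons.txt | uniq -c` already prints a count per distinct codon without a while loop (the file name is illustrative). The same tally in Python, assuming one codon per line, is a `collections.Counter`; `count_codons` is a hypothetical name.

```python
import io
from collections import Counter

def count_codons(handle):
    """Count how many times each codon appears, assuming one codon per line."""
    return Counter(line.strip().upper() for line in handle if line.strip())

codons = io.StringIO("ATC\nGGA\nATC\n\nTTT\natc\n")
for codon, n in sorted(count_codons(codons).items()):
    print(codon, n)
# ATC 3
# GGA 1
# TTT 1
```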
-2 votes, 1 answer

Python to convert an RNA sequence into a single-letter amino acid sequence

I need some assistance in writing code that will convert a given RNA nucleotide sequence into an amino acid sequence. I've currently been given 2 dictionaries to use: one of amino acid codons and their respective 3-letter codes, and one of the…
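The two dictionaries can be chained: codon to 3-letter code, then 3-letter code to 1-letter code. The excerpt does not show the actual dictionaries, so the tables below are hypothetical and deliberately partial (a full table has 64 codons), and stopping at the first stop codon is an assumption.

```python
# Hypothetical, partial stand-ins for the two dictionaries in the question.
codon_to_three = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}
three_to_one = {"Met": "M", "Phe": "F", "Gly": "G", "Trp": "W"}

def rna_to_protein(rna):
    """Translate RNA codon by codon, chaining the two lookups, until a stop codon."""
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):  # ignore a trailing partial codon
        three = codon_to_three[rna[i:i + 3]]
        if three == "Stop":
            break
        protein.append(three_to_one[three])
    return "".join(protein)

print(rna_to_protein("AUGUUUGGCUGGUAAGGC"))  # MFGW
```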
-2 votes, 1 answer

Find sequences that do not match a target sequence

An interesting question by Rnaer from Biostar: I want to find unique DNA/protein sequences of a given length (30 nt, for example) that do not match any region of the C. elegans genome. Is there any tool to do that?
hello_there_andy (2,039 reputation)
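For exact matches, one approach is to collect every k-mer present on either strand and then sample random candidates that are not in that set. For k = 30 there are 4^30 (about 1.2 × 10^18) possible sequences, while a genome of roughly 100 Mb such as C. elegans contains only a few hundred million distinct 30-mers, so random candidates are almost always absent; note, though, that a Python set of that many strings needs a lot of memory. This sketch handles DNA only and ignores approximate matches; the function names are hypothetical.

```python
import random

COMP = str.maketrans("ACGT", "TGCA")

def kmers_both_strands(genome, k):
    """Set of every k-mer on the forward strand and on its reverse complement."""
    genome = genome.upper()
    rc = genome.translate(COMP)[::-1]
    return {s[i:i + k] for s in (genome, rc) for i in range(len(s) - k + 1)}

def absent_kmers(genome, k, how_many, seed=None):
    """Randomly sample distinct k-mers that occur nowhere in `genome` (either strand)."""
    present = kmers_both_strands(genome, k)
    if how_many > 4 ** k - len(present):
        raise ValueError("not enough absent k-mers of this length")
    rng = random.Random(seed)
    found = set()
    while len(found) < how_many:
        cand = "".join(rng.choice("ACGT") for _ in range(k))
        if cand not in present:
            found.add(cand)
    return sorted(found)

print(absent_kmers("ACGTTGCAAC", k=3, how_many=5, seed=0))
```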
-3 votes, 1 answer

DNA Sequence Python not printing

import random def pair(): base = random.choice('AGCT') if base == 'A': base = base + 'G' elif base == 'G': base = 'A' + base elif base == 'C': base = base + 'T' else: base = 'C' + base return…
Nicole (1 reputation)
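A function that builds and returns a string prints nothing on its own; its result has to be passed to `print()` by the caller. This sketch keeps the pairing the excerpt's if/elif chain produces (A and G give "AG", C and T give "CT"), written as a dictionary lookup. If Watson-Crick pairing (A with T, G with C) was intended instead, only the table would change.

```python
import random

# The same pairing the excerpt's if/elif chain produces.
PAIRS = {"A": "AG", "G": "AG", "C": "CT", "T": "CT"}

def pair(rng=random):
    """Pick a random base and return it together with its partner."""
    return PAIRS[rng.choice("AGCT")]

# Defining or calling a function does not print; pass its result to print().
print(pair())
print([pair() for _ in range(5)])
```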
-3 votes, 1 answer

KeyError in DNA complement

import string import os,sys file=open("C:\Python27\\New Text Document.txt",'r')\ seq =file.readlines() basecomplement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} def translate(seq): aaseq = [] for str in seq: …
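A likely cause (the excerpt is truncated) is that `readlines()` keeps the trailing "\n" on every line, so iterating over a line eventually looks up "\n" (or a lowercase base) in `basecomplement`, which has no such key. Naming the loop variable `str` also shadows the built-in, and a raw string such as `r"C:\Python27\New Text Document.txt"` avoids backslash-escape surprises in Windows paths. A sketch that strips and upper-cases each line first; mapping unknown characters to "N" is an assumption, and `complement_lines` is a hypothetical name.

```python
import io

BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def complement_lines(handle):
    """Complement each line of DNA, ignoring the trailing newline."""
    result = []
    for line in handle:
        bases = line.strip().upper()  # drop '\n', the usual KeyError culprit
        # .get(..., "N") maps any unexpected character to N instead of raising.
        result.append("".join(BASE_COMPLEMENT.get(b, "N") for b in bases))
    return result

print(complement_lines(io.StringIO("ACGT\nttag\n")))  # ['TGCA', 'AATC']
```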