Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-strand DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R: A or G; Y: C or T; see the complete table).
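
As a rough illustration of how ambiguity codes can be handled in practice, the sketch below expands each code into a regular-expression character class and searches a sequence with it. The table is the standard IUPAC nucleotide code set; the function names `ambiguity_to_regex` and `find_sites` are hypothetical helpers made up for this example, not part of any library.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def ambiguity_to_regex(pattern):
    """Turn a pattern with ambiguity codes into a regex string (hypothetical helper)."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]  # raises KeyError for letters outside the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_sites(pattern, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead lets the search report matches that overlap each other.
    regex = re.compile("(?=(" + ambiguity_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]

print(ambiguity_to_regex("GRY"))       # G[AG][CT]
print(find_sites("RY", "ATTTTACGTA"))  # [0, 5, 7]
```

The lookahead form is one possible design choice; a plain `re.finditer` without it would skip overlapping matches.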

A great deal of work in bioinformatics is devoted to the analysis and comparison of these strings. Individual DNA sequences may be very long, and sets of sequences may get very large (gigabytes).

475 questions
0
votes
2 answers

Split a list item in python at user defined index

I have a list called FirstSequenceToSplit that contains one item, a DNA sequence, say 'ATTTTACGTA'. I can return the length of this item easily, so the user knows that it is 10 characters long, and what I then want to do is for the user…
PaulBarr
0
votes
1 answer

What is the typical size of the sequence files while conducting pairwise sequence alignments?

What is the typical size of the sequence files while conducting pairwise sequence alignments? Can we align the whole genome of organisms?
0
votes
1 answer

DNA pairwise distances from R matrix

When working with DNA, we often need the triangular p-distance matrix, which contains the proportion of non-identical sites between pairs of sequences. Thus:

    AGGTT
    AGCTA
    AGGTA

Yields:

        1    2
    2   0.4
    3   0.2  0.2

The p-distance calculation…
evozoa
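
The p-distance described in this question can be sketched in a few lines. The question itself is about R; the version below is in Python and only illustrates the definition (proportion of non-identical sites between two aligned sequences). The function names `p_distance` and `pairwise_p_distances` are hypothetical, and the sketch assumes the sequences are already aligned to equal, non-zero length.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def pairwise_p_distances(seqs):
    """Map each pair of 1-based sequence indices (i, j), i < j, to its p-distance."""
    return {
        (i + 1, j + 1): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

distances = pairwise_p_distances(["AGGTT", "AGCTA", "AGGTA"])
print(distances)  # {(1, 2): 0.4, (1, 3): 0.2, (2, 3): 0.2}
```

The values agree with the triangular matrix quoted in the question: 0.4 between sequences 1 and 2, and 0.2 for the other two pairs.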
0
votes
1 answer

Python regex : overlapping sequences position

I use Python 2.7 and the regex module. I use this expression to find a short sequence in a longer DNA sequence: output = regex.findall(r'(?:'+probe+'){s<'+str(int(mismatches)+1)+'}', sequence, regex.BESTMATCH) The parameters are : probe : a short…
WhyOhWhy
0
votes
2 answers

How to get the low and high counts of characters from a string?

So I am having trouble with the second part of this project. I have the below code which gives counts for each entry, but I do not know how to get the highs and lows...Thanks in advance! A1Adept This program should process the input as A1Novice…
user3212766
0
votes
4 answers

Using next() in the parameter of if/else statement

So I am pretty sure I am using next and hasNext incorrectly... I am attempting to input a string of ACGT characters and then count the individual resulting letters. Thanks in advance. import java.util.Scanner; public class A1Novice { public…
user3212766
0
votes
0 answers

Visualize DNA sequence on JSP Struct Web Page

We have to visualize the DNA sequence alignment, similar to the BLAST visualizer, like below. Our project is web-based, with a Java back-end using JSP, Struct. Currently needed way to…
0
votes
3 answers

Comparing strings to a dictionary in groups of multiples of 3

I am writing a program which reads in a number of DNA characters (which is always divisible by 3) and checks if they correspond to the same amino acid. For example AAT and AAC both correspond to N so my program should print "It's the same". It does…
NoviceProgrammer
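
One way to approach the comparison described in this question is to translate each sequence codon by codon and compare the results. The sketch below assumes the standard genetic code and builds the codon table from the conventional compact string in T, C, A, G order; `translate` and `same_protein` are hypothetical names chosen for this example, not the asker's code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string whose length is a multiple of 3 ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be divisible by 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def same_protein(a, b):
    """True if both sequences encode the same amino-acid string."""
    return translate(a) == translate(b)

print(translate("AATAAC"))         # NN
print(same_protein("AAT", "AAC"))  # True
print(same_protein("AAT", "AAA"))  # False
```

As in the question's example, AAT and AAC both map to N, so the first comparison reports a match.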
0
votes
1 answer

Repeatedly Accessing LARGE fasta files. Most mem efficient method?

I'm using Biopython to open a large single-entry fasta file (514 megabases) so I can pull out the DNA sequence from specific coordinates. It's reasonably slow to return the sequence and I'm just wondering if there's a faster way to perform this…
user1995839
0
votes
1 answer

ORF and amino identification using BioPython's translate() method-- incorrect translations?

I am trying to teach myself bioinformatics, arriving to the party by way of computer science and high performance computing. (Essentially, I'm trying to learn the biology.) I've recently discovered BioPython and so far think it's great, but I am…
HodorTheCoder
0
votes
3 answers

find a DNA barcode with mismatches in sequence

I have 36-nt reads like this: atcttgttcaatggccgatcXXXXgtcgacaatcaa in the fastq file, with XXXX being the different barcodes. I want to search for a barcode in the file at the exact position (21 to 24) and print the sequences with up to 3 mismatches in…
abh
0
votes
1 answer

Python counting nucleotides with a for loop

I'm trying to take DNA sequences from an input file and count the number of individual A's T's C's and G's using a loop to count them and if there is a non "ATCG" letter I need to print "error" For example my input file is: Seq1 AAAGCGT Seq2 …
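
A minimal sketch of the loop-based counting this question describes might look like the following. It assumes uppercase-insensitive input and signals a non-ATCG letter by returning `None` (the caller can then print "error"); the function name `count_nucleotides` is hypothetical, and reading the input file is left out because the excerpt does not show its full format.

```python
def count_nucleotides(seq):
    """Count A, T, C and G with a loop; return None if any other letter appears."""
    counts = {"A": 0, "T": 0, "C": 0, "G": 0}
    for base in seq.upper():
        if base not in counts:
            return None  # non-ATCG letter found
        counts[base] += 1
    return counts

for name, seq in [("Seq1", "AAAGCGT"), ("Bad", "AAXG")]:
    result = count_nucleotides(seq)
    print(name, result if result is not None else "error")
# Seq1 {'A': 3, 'T': 1, 'C': 1, 'G': 2}
# Bad error
```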
0
votes
3 answers

regex python Fasta

Thank you for your previous advice. I have another regex problem: now I have a list with this pattern: *7 3 279 0 *33 2 254 0.0233918128654971 *39 2 276 0.027431421446384 and a file with DNA sequencing in Fasta format: EDIT reformatted lines…
0
votes
2 answers

Repeating a function over multiple elements in a list

I have written this code:

    import sys
    file = open(sys.argv[1], 'r')
    string = ''
    for line in file:
        if line.startswith(">"):
            pass
        else:
            string = string + line.strip()
    #print (list(string))
    w = input("Please enter window…
begin.py
0
votes
7 answers

Determining the ratio of matches to non-matches of 2 primary strands?

Possible Duplicate: How to plot a gene graph for a DNA sequence say ATGCCGCTGCGC? I'm trying to write a Perl script that compares two DNA sequences (60 characters in length each, let's say) in alignment, and then shows the ratio of matches to…
Conor C