Questions tagged [dna-sequence]

A string representing the nucleotide sequence of the deoxyribonucleic acid, the molecule that holds the genes that constitute the genetic code.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

DNA single chain sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). More seldom, ambiquity codes are also used to specify that several alternative nucleotides are possible in the given position (R - A or G, Y - C or T, see complete table.

A great amount of work in bioinformatics is related with the analysis and comparison of these strings. DNA sequences may be very long or they sets may get very large (gigabytes).

Related tags:

475 questions
3
votes
1 answer

Need help reading in from information from a file

So I have chains of DNA letters (A,G,T,C) in linked list, and am supposed to read in from a file that looks like this: I[tab] ATT\n I[tab] ATC\n (etc) L CTA L CTG V GTA V GTG F TTT F TTC .. where the single letters is what you get…
3
votes
2 answers

Extracting multiple sequences from a 7 GB fasta file using IDs from a separate text file

After searching this website for hours and trying many different things that didn't work, I have decided to post my own question. I currently have a text file (id.txt) that contains about 100 lines of the following IDS in this…
Emily W
  • 31
  • 2
3
votes
1 answer

Solution needed for identifying partially matching strings (DNA sequences) in a data.frame with many rows

I am looking for a solution for the following problem: I do have a dataframe with over 6 million rows which contains sequencing information (DNA sequence) in one row. Based on the way the dataset was reported, there will duplicated rows in the…
Ben
  • 99
  • 6
3
votes
1 answer

Find longest palindrome substring of a piece of DNA

I have to make a function that prints the longest palindrome substring of a piece of DNA. I already wrote a function that checks whether a piece of DNA is a palindrome itself. See the function below. def make_complement_strand(DNA): …
Lynn
  • 35
  • 6
3
votes
1 answer

How to count longest sequence of recurring characters in very long string inside text file in Python

Hello my question is how to count the longest sequence of AGATC (meaning AGATCAGATCis sequence of two) in the very long string. I need to use python. My code for now looks like this: pattern = re.compile(r'AGATC') matches =…
3
votes
2 answers

How to print the key of a dictionary, searching for a value in the list of values

I know this is a duplicate. However, the answer to that particular question does not work for me. Link to the question here. How do you print the key of a dictionary from the dictionary's value? My code: xdict = { "Phenylalanine": ["UUU", "UUC"],…
GuyOverThere
  • 89
  • 2
  • 8
3
votes
2 answers

Univocal hash function for a string 76 chars long

Here's my problem (I'm programming in C): I have some huge text files containing DNA sequences (each file has something like 65 million rows and a size of about 4~5 GB). In these files there are many duplicates (don't know how many yet, but there…
Alex
  • 301
  • 5
  • 13
3
votes
1 answer

DNA to RNA and Getting Proteins with Perl

I am working on a project(I have to implement it in Perl but I am not good at it) that reads DNA and finds its RNA. Divide that RNA's into triplets to get the equivalent protein name of it. I will explain the steps: 1) Transcribe the following DNA…
kamaci
  • 72,915
  • 69
  • 228
  • 366
3
votes
2 answers

Generating DNA sequence excluding specific sequence

I've just started to learn programming with python. In class we were asked to generate a random DNA sequence, that does NOT contain a specific 6-letter sequence (AACGTT). The point is to make a funtion that always return a legal sequence. Currently…
3
votes
1 answer

RNA Splicing Python

I have a gene sequence…
Ibtihaj Tahir
  • 636
  • 5
  • 17
3
votes
2 answers

Develop nucleotide sequence

I would like to develop these expressions that are in this form: a <- "[AGAT]5GAT[AGAT]7[AGAC]6AGAT" I would like to convert the expression like this: b <- "AGATAGATAGATAGATAGATGATAGATAGATAGATAGATAGATAGATAGATAGACAGACAGACAGACAGACAGACAGAT" As you…
yach
  • 43
  • 6
3
votes
2 answers

Python; DNA Sequence to AscII Text

My aim is to discover a piece of text hidden through AscII 8bits in a very long (>115,000) sequence of DNA. I've written code to open the file with the DNA in, convert all C's and A's to 0 and all T's and G's to 1. I've then converted this string…
daenwaels
  • 85
  • 1
  • 7
3
votes
3 answers

How do I parallelize this code in python (GC calculator from FASTA)?

System Monitor during process I am a novice when it comes to programming. I've worked through the book Practical Computing for Biologists and am playing around with some slightly more advanced concepts. I've written a Python (2.7) script which reads…
canfiese
  • 61
  • 6
3
votes
2 answers

How to convert a set of DNA sequences into protein sequences using python programming?

I am using python to create a program that converts a set of DNA sequences into amino acid (protein) sequences. I then need to find a specific subsequence, and count the number of sequences in which this specific subsequence is present. This is the…
3
votes
0 answers

C# Needleman-Wunsch Algorithm w/ InDel Extension Debugging

I need some help finding the flaw in my Needleman-Wunsch algorithm implementation. Right now, I'm focusing on one problem at a time, and the problem I face is when calculating my optimal alignment score it's off and I can't seem to pinpoint why. I…
csharpdev
  • 31
  • 3