Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-strand DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R - A or G, Y - C or T; see the complete table).
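As a rough illustration of how ambiguity codes can be handled in practice, the sketch below expands a pattern into a regular expression and searches a sequence with it. The table is deliberately partial (only R, Y and N beyond the four plain bases), and the function names `ambiguity_to_regex` and `find_matches` are made up for this example, not taken from any library.

```python
import re

# Partial ambiguity table (assumption: only the codes needed for this sketch).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine: A or G
    "Y": "[CT]",    # pyrimidine: C or T
    "N": "[ACGT]",  # any base
}

def ambiguity_to_regex(pattern):
    """Translate a DNA pattern with ambiguity codes into a regex string."""
    return "".join(IUPAC[base] for base in pattern.upper())

def find_matches(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping hits.
    regex = re.compile("(?=" + ambiguity_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]

print(find_matches("RY", "ACGT"))
```

A pattern containing a code missing from the table raises `KeyError`, which is probably the safer behaviour for a sketch like this than silently skipping it.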

A great deal of work in bioinformatics is concerned with the analysis and comparison of these strings. DNA sequences may be very long, and data sets of them may get very large (gigabytes).

Related tags:

levenshtein-distance
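One common way to compare two sequence strings is the edit distance named by the related tag above. The following is a minimal sketch of the textbook dynamic-programming approach, keeping only two rows of the table; the function name `levenshtein` is chosen here for illustration. It takes time proportional to the product of the two lengths, so it is not meant for the gigabyte-scale data mentioned above.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

print(levenshtein("ACGT", "AGT"))
```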

475 questions

2 answers

Python GC Counter - Rosalind

I'm attempting to write a programme that will calculate the GC content in each of a series of sequences (input in fasta format) and then return the name of the sequence with the highest percentage and its GC percentage. As per this Rosalind problem.…

python bioinformatics dna-sequence

asked Jan 31 '16 at 18:58

Carey

2 answers

Python code to find coding DNA with start and stop codons

So recently I've been trying to write a program that detects and cuts out the coding part of a DNA sequence based on start and stop codons. The eventual goal is to compare two sequences, both 240 nucleotides long; however, one causes sickle-cell…

python python-3.x dna-sequence

asked Nov 30 '15 at 22:17

KRAD

1 answer

Plink Error while converting to binary: Line 1 of .ped file has fewer tokens than expected

Can I get some help here? Has anyone experienced the following error in plink (Whole genome association analysis toolset) while converting from 'ped','map' format to the binary counterpart 'bed','bim','fam'? I am using Linux and plink…

python dna-sequence genome

asked Jul 06 '15 at 15:02

Daniel Fernandes

1 answer

R - How to plot correct pie charts in haploNet haplotype networks {pegas} {ape} {adegenet}

When using the haploNet package to make some plots on a haplotype network, I used a script available on the internet to do so. However I think there is something wrong. The script is available in form of the woodmouse example. The code I used is: x…

r dna-sequence phylogeny genetics

asked Jul 04 '15 at 12:24

Arn De Grauwe

2 answers

Search a sequence in a string. DNA

I need to do a program that separate from 3 to the size of a string and compare to the others sequences of 3 in the same string given. I'm going to explain it. User introduce this DNA string = "ACTGCGACGGTACGCTTCGACGTAG" For example. We start with…

c++ string algorithm search dna-sequence

asked Mar 25 '15 at 13:08

thecatbehindthemask

1 answer

How to plot Pie charts in haploNet Haplotype Networks {pegas}

I'm trying to use the haploNet function of {pegas} to plot a haplotype network, but I'm having trouble putting equal haplotypes from different populations in the same pie chart. I can build a haplotype net with the following script: x <-…

r dna-sequence phylogeny genetics ape-phylo

asked Sep 10 '14 at 01:20

Guilherme De Rezende Dias

1 answer

Counting DNA sequences with python/biopython

My script below is counting the occurrences of the sequences 'CCCCAAAA' and 'GGGGTTTT' from a standard FASTA file: >contig00001 CCCCAAAACCCCAAAACCCCAAAACCCCTAcGAaTCCCcTCATAATTGAAAGACTTAAACTTTAAAACCCTAGAAT The script counts the CCCCAAAA sequence…

python bioinformatics biopython dna-sequence

asked Apr 01 '14 at 15:47

sheaph

3 answers

Read a text file into python by splitting the file into list items according to a set of characters

I have a plain text file with the following contents: @M00964: XXXXX YYY + ZZZZ @M00964: XXXXX YYY + ZZZZ @M00964: XXXXX YYY + ZZZZ and I would like to read this into a list split into items according to the ID code @M00964, i.e. : ['@M00964:…

python list readfile splice dna-sequence

asked Mar 25 '14 at 15:17

PaulBarr

1 answer

Bio.Phylo.PAML.codeml's results parser quietly fails to read all the data

Biopython comes with methods to interface with the PAML package for phylogenetic analysis. In particular I am using Bio.Phylo.PAML to run analyses using PAML's codeml.exe program which in my case does Ka/Ks (dN/dS) ratio analysis on pairs of…

parsing python-2.7 bioinformatics biopython dna-sequence

asked Jan 01 '14 at 19:37

hello_there_andy

1 answer

Complexity of computing the similarity between two sequences

What is the computational complexity of the best known algorithm for computing the similarity between two sequences (as in DNA or Protein alignment/approximate string matching)? The similarity is based on: scoring the alignment using substitution…

algorithm complexity-theory bioinformatics dna-sequence

asked Feb 09 '13 at 03:01

alex

3 answers

Obtain DNA substrings wrt their original orders

I'd like to get the substrings of long DNA sequences For example, given: 1/ATXGAAATTXXGGAAGGGGTGG 2/AATXGAAGGAAGGAAGGGGATATTX 3/AAAAAATTXXGGAAGGGGXTTTA 4/AAAATTXXATAXXGGAAGGGGXTXG 5/ATTATTGTTXAXTATTT the output is to be: 1/TXG - TTXX 2/TXG …

c# regex perl dna-sequence

asked Nov 29 '11 at 12:21

Baby Dolphin

1 answer

How to create a barcode from a DNA sequence using Python and PIL library?

I am a beginner and want to make a barcode out of this DNA sequence using Python code. It's supposed to read each 1024 nucleotides and check for mers (a combination of 4 nucleotides, e.g. AAAA, AAAC, AAAG, AAAT ..... TTTT). Each mer holds an index…

python python-imaging-library bioinformatics dna-sequence

asked May 23 '23 at 17:09

user21934949

1 answer

Syntax conflict for "{" using Nextflow

New to Nextflow, I attempted to run a loop in a Nextflow chunk to remove the extension from sequence file names and am running into a syntax error. params.rename = "sequences/*.fastq.gz" workflow { rename_ch =…

bioinformatics bioconductor dna-sequence fastq nextflow

asked Nov 18 '22 at 04:54

Michael Yoon

1 answer

Calculate percentage divergence between two genetic sequences in R

I haven't been able to find this in the questions or an R package, hopefully straightforward. Take two hypothetical genetic sequences: Sequence A: ATG CGC AAC GTG GAG CAT Sequence B: ATG GGC TAC GTG GAT CAA I want to have R code to generate the…

r dna-sequence

asked Sep 03 '11 at 15:41

Nick Crouch

2 answers

How to do multiple sequence alignment of text strings (utf8) in R

Given three strings: seq <- c("abcd", "bcde", "cdef", "af", "cdghi") I would like to do multiple sequence alignment so that I get the following result: abcd bcde cdef a f cd ghi Using the msa() function from the msa package I…

r dna-sequence sequence-alignment string-algorithm

asked May 25 '22 at 11:52

WJH
