Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-strand DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R - A or G, Y - C or T; see the complete table).
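The string representation described above could be checked in Python roughly as follows. This is a minimal sketch, not a reference implementation: the names `IUPAC`, `is_valid_dna` and `matches` are hypothetical, and only the R and Y codes come from the text; the other entries are filled in from the standard IUPAC nucleotide table.

```python
# Map each code to the set of plain nucleotides it may stand for.
# R and Y are taken from the tag description; the remaining ambiguity
# codes are an assumption based on the standard IUPAC table.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_valid_dna(seq, allow_ambiguity=False):
    """Return True if every character is an allowed nucleotide code."""
    allowed = set(IUPAC) if allow_ambiguity else set("ACGT")
    return all(base in allowed for base in seq.upper())


def matches(pattern, seq):
    """Return True if the plain sequence `seq` is compatible with
    `pattern`, which may contain ambiguity codes."""
    if len(pattern) != len(seq):
        return False
    return all(
        s in IUPAC.get(p, "")  # unknown codes match nothing
        for p, s in zip(pattern.upper(), seq.upper())
    )
```

For example, `matches("ARY", "AGC")` holds because R allows G and Y allows C, while `matches("ARY", "ACC")` does not.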

A great deal of work in bioinformatics is devoted to the analysis and comparison of these strings. Individual DNA sequences may be very long, and sets of them may become very large (gigabytes).
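As one small illustration of comparing such strings, the sketch below counts mismatching positions between two equal-length sequences (the Hamming distance) and derives a percent identity from it. The function names are hypothetical, and this assumes the sequences are already aligned; real comparisons usually involve an alignment step that is not shown here.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a, b):
    """Percentage of positions at which the two sequences agree."""
    if not a:
        raise ValueError("sequences must not be empty")
    return 100.0 * (len(a) - hamming_distance(a, b)) / len(a)
```

Because `zip` walks both strings position by position, the comparison needs no extra copies of the sequences, which matters when they are long.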

475 questions
3
votes
2 answers

Python GC Counter - Rosalind

I'm attempting to write a programme that will calculate the GC content in each of a series of sequences (input in fasta format) and then return the name of the sequence with the highest percentage and its GC percentage. As per this Rosalind problem.…
Carey
  • 13
  • 1
  • 3
3
votes
2 answers

Python code to find coding DNA with start and stop codons

So recently I've been trying to write a program that detects and cuts out the coding part of a DNA sequence based on start and stop codons. The eventual goal is to compare 2 sequences, both 240 nucleotides long, however one causes sickle-cell…
KRAD
  • 51
  • 2
  • 10
3
votes
1 answer

Plink Error while converting to binary: Line 1 of .ped file has fewer tokens than expected

Can I get some help here? Has anyone experienced the following error in plink (Whole genome association analysis toolset) while converting from 'ped','map' format to the binary counterpart 'bed','bim','fam'? I am using Linux and plink…
3
votes
1 answer

R - How to plot correct pie charts in haploNet haplotype networks {pegas} {ape} {adegenet}

When using the haploNet package to make some plots on a haplotype network, I used a script available on the internet to do so. However I think there is something wrong. The script is available in form of the woodmouse example. The code I used is: x…
3
votes
2 answers

Search a sequence in a string. DNA

I need to do a program that separate from 3 to the size of a string and compare to the others sequences of 3 in the same string given. I'm going to explain it. User introduce this DNA string = "ACTGCGACGGTACGCTTCGACGTAG" For example. We start with…
thecatbehindthemask
  • 413
  • 1
  • 6
  • 15
3
votes
1 answer

How to plot Pie charts in haploNet Haplotype Networks {pegas}

I'm trying to use the haploNet function of {pegas} to plot a haplotype network, but I'm having trouble putting equal haplotypes from different populations in the same pie chart. I can build a haplotype net with the following script: x <-…
3
votes
1 answer

Counting DNA sequences with python/biopython

My script below is counting the occurrences of the sequences 'CCCCAAAA' and 'GGGGTTTT' from a standard FASTA file: >contig00001 CCCCAAAACCCCAAAACCCCAAAACCCCTAcGAaTCCCcTCATAATTGAAAGACTTAAACTTTAAAACCCTAGAAT The script counts the CCCCAAAA sequence…
sheaph
  • 199
  • 1
  • 2
  • 10
3
votes
3 answers

Read a text file into python by splitting the file into list items according to a set of characters

I have a plain text file with the following contents: @M00964: XXXXX YYY + ZZZZ @M00964: XXXXX YYY + ZZZZ @M00964: XXXXX YYY + ZZZZ and I would like to read this into a list split into items according to the ID code @M00964, i.e. : ['@M00964:…
PaulBarr
  • 919
  • 6
  • 19
  • 33
3
votes
1 answer

Bio.Phylo.PAML.codeml's results parser quietly fails to read all the data

Biopython comes with methods to interface with the PAML package for phylogenetic analysis. In particular I am using Bio.Phylo.PAML to run analyses using PAML's codeml.exe program which in my case does Ka/Ks (dN/dS) ratio analysis on pairs of…
hello_there_andy
  • 2,039
  • 2
  • 21
  • 51
3
votes
1 answer

Complexity of computing the similarity between two sequences

What is the computational complexity of the best known algorithm for computing the similarity between two sequences (as in DNA or Protein alignment/approximate string matching)? The similarity is based on: scoring the alignment using substitution…
alex
  • 1,757
  • 4
  • 21
  • 32
2
votes
3 answers

Obtain DNA substrings wrt their original orders

I'd like to get the substrings of long DNA sequences. For example, given: 1/ATXGAAATTXXGGAAGGGGTGG 2/AATXGAAGGAAGGAAGGGGATATTX 3/AAAAAATTXXGGAAGGGGXTTTA 4/AAAATTXXATAXXGGAAGGGGXTXG 5/ATTATTGTTXAXTATTT the output is to be: 1/TXG - TTXX 2/TXG …
Baby Dolphin
  • 489
  • 4
  • 13
2
votes
1 answer

How to create a barcode from a DNA sequence using Python and PIL library?

I am a beginner and want to make a barcode out of this DNA sequence using Python code. It's supposed to read every 1024 nucleotides and check for mers (a combination of 4 nucleotides, e.g. AAAA, AAAC, AAAG, AAAT ..... TTTT). Each mer holds an index…
user21934949
2
votes
1 answer

Syntax conflict for "{" using Nextflow

New to nextflow, attempted to run a loop in nextflow chunk to remove extension from sequence file names and am running into a syntax error. params.rename = "sequences/*.fastq.gz" workflow { rename_ch =…
2
votes
1 answer

Calculate percentage divergence between two genetic sequences in R

I haven't been able to find this in the questions or an R package, hopefully straightforward. Take two hypothetical genetic sequences: Sequence A: ATG CGC AAC GTG GAG CAT Sequence B: ATG GGC TAC GTG GAT CAA I want to have R code to generate the…
Nick Crouch
  • 301
  • 3
  • 14
2
votes
2 answers

How to do multiple sequence alignment of text strings (utf8) in R

Given three strings: seq <- c("abcd", "bcde", "cdef", "af", "cdghi") I would like to do multiple sequence alignment so that I get the following result: abcd bcde cdef a f cd ghi Using the msa() function from the msa package I…
WJH
  • 539
  • 5
  • 14