Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-strand DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R - A or G, Y - C or T; see the complete table).
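As a minimal sketch of the representation described above, the following Python snippet checks whether a string uses only the four plain bases and whether a concrete sequence is compatible with a pattern written in ambiguity codes. The function names `is_plain_dna` and `matches` are hypothetical, and the table is the standard IUPAC nucleotide code set; only R and Y are spelled out in the text above.

```python
# Standard IUPAC nucleotide codes: each code maps to the bases it may stand for.
# Only R and Y are listed in the tag description; the rest are the usual IUPAC set.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_plain_dna(seq):
    """Return True if seq is non-empty and contains only A, C, G, T."""
    return bool(seq) and set(seq) <= set("ACGT")


def matches(pattern, seq):
    """Return True if every base in seq is allowed by the code at the
    same position in pattern. Unknown codes match nothing."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC.get(code, "") for code, base in zip(pattern, seq))
```

For example, `matches("RY", "AC")` holds because R allows A and Y allows C, while `matches("RY", "CA")` does not.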

A great deal of work in bioinformatics is related to the analysis and comparison of these strings. DNA sequences may be very long, and sets of them may grow very large (gigabytes).

475 questions
0
votes
1 answer

Picard SamToFastq only extracts one read, then throws an error

I'm trying to extract some FastQ files from BAM files. Picard can do this with SamToFastq; the documentation for this tool says it accepts either a BAM or a SAM file. But when I run it, it only extracts one read, and then exits. Here is the…
Davy Kavanagh
-1
votes
3 answers

Finding the complement of a DNA sequence

I have to translate the complement of a DNA sequence into amino acids TTTCAATACTAGCATGACCAAAGTGGGAACCCCCTTACGTAGCATGACCCATATATATATATATA TATATATATATATATGGGTCATGCTACGTAAGGGGGTTCCCACTTTGGTCATGCTAGTATTGAAA +1…
-1
votes
1 answer

Trim first and last bases in fasta file

I have a circular mtDNA reference. After the alignment I would like to cut the first and last 15 bases in my FASTA files. How can I do that? For example, this is my sequence and I need to take out the first and last 15 letters. The first 15 characters will…
Anna
-1
votes
1 answer

Trying to create a sliding window that checks for repeats in a DNA sequence

I'm trying to write a bioinformatics program that will check for certain repeats in a given string of nucleotides. The user inputs a certain pattern, and the program outputs how many times it is repeated, or even highlights where the repeats are. I've…
-1
votes
2 answers

How to save the swalign library output (Local Alignment - Smith-Waterman Algorithm)?

I have used the code below to get the local alignment score between two strings using the Smith-Waterman algorithm. Although I'm getting the required output, I'm finding it difficult to save the result into a variable for further analysis. import…
-1
votes
4 answers

Need some help on a function

Write a function named one_frame that takes one argument seq and performs the tasks specified below. The argument seq is to be a string that contains information for the bases of a DNA sequence. a → The function searches given DNA string from left…
-1
votes
1 answer

BWA-mem and sambamba read group line error

This is a two-part question: help interpreting an error; help with coding. I'm trying to run bwa-mem and sambamba to align raw reads to a reference genome and to sort by position. These are the commands I'm using: bwa mem \ -K 100000000 -v 3…
-1
votes
3 answers

Easy way to extract uppercase letters from a string in R

I am a beginner programmer in R. I have "cCt/cGt" and I want to extract C and G and write it like C>G. test ="cCt/cGt" str_extract(test, "[A-Z]+$")
jean simon
-1
votes
1 answer

Counting bases in a sequence - Nonetype

I am trying to create a function where the user is able to input a file name (containing a DNA sequence), and the respective number of bases present in the selected file are counted and output onto the screen in the order: #A, #G, #C, #T. I then…
heather_l
-1
votes
2 answers

Remove lines from .txt in Python

I have the following .txt: TITLE Genetic variation in the complete MgPa operon and its repetitive chromosomal elements in clinical strains of Mycoplasma genitalium JOURNAL PLoS ONE 5 (12), E15660 (2010) PUBMED 21187921 REMARK Publication…
-1
votes
1 answer

Finding substrings in a DNA sequence; script returns higher values than expected

I'm struggling with a really frustrating problem. I've spent the past 2.5 hours trying to find the bug, but I can't manage. The problem is this: I have to find the number of occurrences of each combination of 4 DNA nucleotides (AAAA-TTTT) in a…
-1
votes
1 answer

How to find mutations for a reverse-oriented gene (like pncA) from a TB sequencing FASTA file using the Biopython library in Python 3?

To find a mutation such as S104R (from 2288681 to 2289241, for pyrazinamide), we first have to remove '-' (to strip any insertions/deletions present in the FASTA file), then take the reverse complement of the sequence, and then look for the particular mutation…
-1
votes
1 answer

Implementing Smith-Waterman algorithm for local alignment in python

I have created a sequence alignment tool to compare two strands of DNA (X and Y) to find the best alignment of substrings from X and Y. The algorithm is summarized here (https://en.wikipedia.org/wiki/Smith–Waterman_algorithm). I have been able to…
-1
votes
1 answer

Complementary DNA (C++)

Task: Write code that returns the complementary strand of a given DNA string. In DNA strings, the symbols "A" and "T" are complements of each other, as are "C" and "G". For example: DNA_strand("ATTGC") // returns "TAACG" or DNA_strand("GTAT")…
-1
votes
2 answers

Need to count how many times "AGAT", "AATG" and "TATC" repeat in a .txt file that has a DNA sequence

This is my first coding class and I'm having trouble getting the counter to increase every time one of the given sequences appears in the DNA sequence. My code so far: agat_Counter = 0 aatg_Counter= 0 tatc_Counter= 0 DNAsample = open('DNA SEQUENCE FILE.txt',…