Questions tagged [dna-sequence]

A string representing the nucleotide sequence of deoxyribonucleic acid (DNA), the molecule that carries an organism's genes.

Deoxyribonucleic acid (DNA) contains the genetic instructions specifying the biological development of all cellular life. DNA consists of two long polymers of simple units called nucleotides.

Single-strand DNA sequences are commonly represented as a string of uppercase letters that correspond to the nucleotide units in the sequence (A, G, C, T). Less commonly, ambiguity codes are also used to specify that several alternative nucleotides are possible at a given position (R: A or G; Y: C or T; see complete table).
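As a rough illustration of how ambiguity codes can be handled in code, here is a minimal Python sketch. The mapping follows the standard IUPAC nucleotide codes; the names `IUPAC_CODES` and `matches` are hypothetical and chosen only for this example, and the function assumes both strings contain valid codes only.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one allows.
# The names IUPAC_CODES and matches are illustrative, not from any library.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if a concrete sequence (A, C, G, T only) is compatible
    with a pattern that may contain ambiguity codes."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_CODES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


print(matches("RY", "AC"))  # R allows A, Y allows C
print(matches("RY", "CA"))  # R does not allow C
```

A dictionary lookup per position keeps the check linear in the sequence length; for very large data sets a dedicated library would likely be a better fit than this sketch.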

A great deal of work in bioinformatics involves the analysis and comparison of these strings. Individual DNA sequences may be very long, and sets of them may get very large (gigabytes).

Related tags:

levenshtein-distance

475 questions

0 answers

How do I download a large number of GenBank sequences using entrez_fetch in R?

I am trying to download sequence data from 1283 records in GenBank using rentrez. I'm using the following code, first to search for records fitting my criteria, then linking across databases, and finally fetching the sequence data: # Search for…

r bioinformatics dna-sequence ncbi rentrez

asked Jan 27 '23 at 17:49

notasfarwest

1 answer

Rentrez is pulling the wrong data from NCBI in R?

I am trying to download sequence data from E. coli samples within the state of Washington - it's about 1283 sequences, which I know is a lot. The problem that I am running into is that entrez_search and/or entrez_fetch seem to be pulling the wrong…

r bioinformatics dna-sequence rentrez

asked Jan 27 '23 at 04:12

notasfarwest

2 answers

Mark positions of a string in a list

I have two lists, one holds nucleotide values nucleotides = ['A', 'G', 'C', 'T', 'A', 'G', 'G', 'A', 'G', 'C'] second one holds true(1) or false(0) values for every letter to indicate that they are covered or not. flag_map = [0, 0, 0, 0, 0, 0, 0,…

python string list dna-sequence

asked Jan 03 '23 at 23:31

neodeep

2 answers

Read Clustal file in Python

I have a multiple sequence alignment (MSA) file derived from mafft in clustal format which I want to import into Python and save into a PDF file. I need to import the file and then highlight some specific words. I've tried to simply import the pdf…

python alignment sequence biopython dna-sequence

asked Dec 19 '22 at 15:00

Denise Lavezzari

3 answers

Create a new variable instance each time I split a string in Python

I have a string in a variable x that includes ">" symbols. I would like to create a new variable each time the string is split at the ">" symbol. The string I have in the variable x is as such (imported from a simple .txt…

python split bioinformatics fasta dna-sequence

asked Dec 19 '22 at 14:45

d.cio

2 answers

How do I write an algorithm to find genes in a large string?

I'm writing a program to find genes in a large string of DNA. My output is correct on small input strings of DNA, but when I test it on their example DNA string (which is very large—too large to check manually if my output is correct) it says that…

java algorithm dna-sequence

asked Dec 01 '22 at 22:04

isaiah paget

1 answer

Automated introduction of mutation at a specific base in a DNA sequence

I am looking for a way to change A ->T and G ->C and vice versa at the 11th base in a 30-base DNA sequence. I have tried to use the Replace function in Excel but I couldn't work out how to make it conditional i.e. if it is A change it to T and so…

excel vba dna-sequence

asked Nov 11 '22 at 12:12

Saps

3 answers

Simplify and Improve for Multi-If-Statement

I am trying to randomly generate multiple short 5 base-pair DNA sequences. Among them, I want to pick the sequences that meet the following conditions: If the first letter is A then the last letter cannot be T If the first letter is T then the last…

python performance dna-sequence

asked Jul 19 '22 at 21:00

indigo

0 answers

Faster algorithm for lexicographic comparison of DNA strings

I'm trying to find a faster way to do the following: Given a list of DNA strings x = ([s1, s2, s3, s4...]) (where the strings can only consist of the letters 'A', 'T', 'C', and 'G') and a list of index pairs y = ([[i, j], [i, j], [i, j]....]) find a…

python algorithm time-complexity dna-sequence lexicographic

asked Mar 27 '22 at 07:25

nunya

0 answers

Which machine learning methods can I use to predict DNA Sequences?

I have a dataset of DNA sequences related to Covid-19 and I simply want to predict possible future sequences based on the existing sequences. DNA sequences consist of 4 letters and 4 letters only: A, G, T and C. So a chunk of a sequence would look…

machine-learning prediction predict dna-sequence

asked Mar 11 '22 at 06:54

D3WYAN

1 answer

Find the position of SNPs in the gene list

I have SNP data and gene list data. I am looking for the positions of the SNPs contained in the gene list data when I compare the two. For example: The SNP data: Pos_start pos_end 14185 14185 .... ..... The gene list data:…

perl sequence bioinformatics dna-sequence bioperl

asked Aug 19 '11 at 06:22

Phan

1 answer

A PWM with gapped alignments in Biopython

I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize…

bioinformatics biopython alphabet dna-sequence

asked Aug 09 '11 at 15:28

RossCampbell

1 answer

How to use lists and loops to count the occurrences of dinucleotide pairs?

I have a DNA text file and I need to specifically use lists and loops to count the occurrences of dinucleotide pairs (ex: AA, AC, AT, AG, CA, CC... etc) then use lists and loops again to print the counts to a new text file as a table with two…

python dna-sequence

asked Sep 27 '21 at 02:22

Banana nana

1 answer

Transformation of a CSV file with DNA sequences to FASTA format with RStudio and Biostrings

I have a CSV file with DNA sequences. The file has 4 columns: the name of the chromosome, the start and end of the sequence, and the strand (missing or +). I want to transform this file into FASTA format with RStudio and with the tool of…

r csv transform fasta dna-sequence

asked Aug 23 '21 at 18:38

HELEN BARBA

1 answer

Python: build consensus sequence

I want to build a consensus sequence from several sequences in python and I'm looking for the most efficient / most pythonic way to achieve this. I have a list of strings like this: sequences = ["ACTAG", "-TTCG", "CTTAG"] I furthermore have an…

python sequence dna-sequence

asked Jun 04 '21 at 13:13

Thomas Müller
