Questions tagged [bioinformatics]

For programming questions related to bioinformatics. Other questions do not belong here, but might be on-topic at https://bioinformatics.stackexchange.com/.

Bioinformatics is an interdisciplinary scientific field that develops methods and software tools for understanding biological data. Bioinformatics combines computer science, statistics, mathematics, and engineering to study and process various types of biological data.

There is a former Stack Exchange site specific to bioinformatics at Biostars, and a new Stack Exchange site dedicated to bioinformatics.

4320 questions
108 votes · 11 answers

How much storage would be required to store a human genome?

I'm looking for the amount of storage in bytes (MB, GB, TB, etc.) required to store a single human genome. I read a few articles on Wikipedia about DNA, chromosomes, base pairs, genes, and have some rough guess, but before disclosing anything I'd…
Milan Babuškov
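
The arithmetic behind this question can be sketched as follows. This is only a rough estimate under stated assumptions: a haploid genome of about 3.1 billion base pairs (an approximate figure, not taken from the question) and a plain 2-bit encoding of the four bases, with no ambiguity codes, quality scores, or metadata. The function name `genome_storage_bytes` is made up for illustration.

```python
def genome_storage_bytes(base_pairs: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store `base_pairs` bases at `bits_per_base` bits each."""
    total_bits = base_pairs * bits_per_base
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


# Assumption: roughly 3.1 billion base pairs in one haploid human genome.
ASSUMED_HAPLOID_BP = 3_100_000_000

size = genome_storage_bytes(ASSUMED_HAPLOID_BP)
print(f"{size / 1e6:.0f} MB")  # prints "775 MB"
```

Real file formats store more than the bare sequence, so actual sizes on disk will usually differ from this lower-bound style estimate.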
58 votes · 11 answers

How to call module written with argparse in iPython notebook

I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have never had to deal directly with argparse before. How…
Niels
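
One common way around this kind of problem is to hand `parse_args` an explicit list of strings instead of letting it read `sys.argv`, which in a notebook holds the kernel's own arguments. The sketch below uses a hypothetical parser; the options `--sequence` and `--min-length` are invented for illustration and are not the actual arguments of the suffix tree module mentioned in the question.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command-line interface, for illustration only.
    parser = argparse.ArgumentParser(description="hypothetical suffix-tree CLI")
    parser.add_argument("--sequence", required=True)
    parser.add_argument("--min-length", type=int, default=1)
    return parser


# Pass the arguments as a list so argparse never looks at sys.argv.
args = build_parser().parse_args(["--sequence", "ATGCCGCTGCGC", "--min-length", "3"])
print(args.sequence, args.min_length)
```

If the module exposes a `main()` that calls `parse_args()` internally, it would need to accept such a list, or be refactored so the parsing and the work are separate functions.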
37 votes · 6 answers

How to plot a gene graph for a DNA sequence say ATGCCGCTGCGC?

I need to generate a random walk based on the DNA sequence of a virus, given its base pair sequence of 2k base pairs. The sequence looks like "ATGCGTCGTAACGT". The path should turn right for an A, left for a T, go upwards for a G and downwards for a…
35 votes · 5 answers

WinError 2 The system cannot find the file specified (Python)

I have a Fortran program and want to execute it in python for multiple files. I have 2000 input files but in my Fortran code I am able to run only one file at a time. How should I call the Fortran program in python? My Script: import…
Jone
33 votes · 11 answers

Finding matching keys in two large dictionaries and doing it fast

I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries. Say for example: myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' } myNames = { 'Actinobacter': '8924342' } I want…
Austin Richardson
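
A minimal sketch of one way to do this in Python 3 is shown below: dictionary key views support set operations, so the shared keys can be taken with `&` without looping over both dictionaries. The variable names are adapted from the excerpt, the values are the truncated placeholders shown there, and the helper `shared_keys` is hypothetical.

```python
# Sample data taken from the question excerpt (sequence strings are truncated there).
my_rdp = {"Actinobacter": "GATCGA...TCA", "subtilus sp.": "ATCGATT...ACT"}
my_names = {"Actinobacter": "8924342"}


def shared_keys(a: dict, b: dict) -> set:
    """Return the keys present in both dictionaries."""
    # dict.keys() returns a set-like view, so & yields their intersection.
    return a.keys() & b.keys()


common = shared_keys(my_rdp, my_names)
# Pair up the values from both dictionaries for every shared key.
merged = {key: (my_names[key], my_rdp[key]) for key in common}
print(merged)
```

Each membership test on a dictionary takes roughly constant time on average, so this should scale to dictionaries of the size mentioned in the question.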
28 votes · 9 answers

Clojure or Scala for bioinformatics/biostatistics/medical research

I am not a professional programmer (my area is medical research), but I am quite capable in C/C++, and various scripting languages. A while back I got intrigued by Lisp, but I never got the time to seriously learn it. After a brief exposure to R I…
kliron
28 votes · 12 answers

Why is Perl used so extensively in biology research?

I work as support staff in a biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk. Why is Perl…
Kevin
27 votes · 4 answers

Save complete web page (incl css, images) using python/selenium

I am using Python/Selenium to submit genetic sequences to an online database, and want to save the full page of results I get back. Below is the code that gets me to the results I want: from selenium import webdriver URL =…
Max Power
27 votes · 0 answers

Differential gene expression analysis in Python

It seems that most differential gene expression packages for RNA-Seq are written in R. Examples include edgeR, limma, and DESeq. Are any similar (and easy to use) packages available for Python, or have any of the R packages been ported? The best I…
ljc
26 votes · 3 answers

How can I convert Ensembl ID to gene symbol in R?

I have a data.frame containing Ensembl IDs in one column; I would like to find corresponding gene symbols for the values of that column and add them to a new column in my data frame. I used biomaRt but it couldn't find any of the Ensembl IDs! Here…
user3576287
26 votes · 15 answers

Encouraging good development practices for non-professional programmers?

In my copious free time, I collaborate with a number of scientists (mostly biologists) who develop software, databases, and other tools related to the work they do. Generally these projects are built on a one-off basis, used in-house, and eventually…
Meredith L. Patterson
23 votes · 2 answers

How to remove rows with 0 values using R

Hi, I am using a matrix of gene expression frag counts to calculate differentially expressed genes. I would like to know how to remove the rows whose values are 0. Then my data set will be more compact and fewer spurious results will be given for the…
ivivek_ngs
22 votes · 6 answers

Inverse of Hamming Distance

This is a brief introduction; the specific question is in bold in the last paragraph. I'm trying to generate all strings with a given Hamming distance to efficiently solve a bioinformatics assignment. The idea is, given a string (i.e.…
JackS
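
The generation step described here can be sketched as below. This is an illustrative approach, not necessarily the one the asker ended up with: it assumes the DNA alphabet `ACGT` and interprets "a given Hamming distance" as exactly `d` mismatches. The function name `hamming_neighbors` is made up for this example.

```python
from itertools import combinations, product


def hamming_neighbors(s: str, d: int, alphabet: str = "ACGT"):
    """Yield every string over `alphabet` at Hamming distance exactly `d` from `s`."""
    # Choose which d positions will differ from the original string.
    for positions in combinations(range(len(s)), d):
        # At each chosen position, any letter except the original one is allowed.
        choices = [[c for c in alphabet if c != s[i]] for i in positions]
        for replacement in product(*choices):
            chars = list(s)
            for i, c in zip(positions, replacement):
                chars[i] = c
            yield "".join(chars)


neighbors = list(hamming_neighbors("ACG", 1))
print(len(neighbors))  # 3 positions * 3 alternative letters = 9
```

For a string of length n over a 4-letter alphabet this produces C(n, d) * 3^d strings, so the output grows quickly with `d`; yielding lazily avoids holding them all in memory unless the caller asks for a list.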
22 votes · 1 answer

Why is some code Deterministic in Python2 and Non-Deterministic in Python 3?

I'm trying to write a script to calculate all of the possible fuzzy string matches for a short string, or 'kmer', and the same code that works in Python 2.7.X gives me a non-deterministic answer with Python 3.3.X, and I can't figure out…
21 votes · 4 answers

Using the reserved word "class" as field name in Django and Django REST Framework

Description of the problem: Taxonomy is the science of defining and naming groups of biological organisms on the basis of shared characteristics. Organisms are grouped together into taxa (singular: taxon) and these groups are given a taxonomic rank.…
cezar