Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by the Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW, and EMBOSS, among others)

  • Classes for dealing with DNA, RNA, and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription, and molecular weight calculations

among many others.
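As a rough illustration of two of the items listed above (parsing a sequence file into Python data structures, and transcription), here is a minimal plain-Python sketch. It is not Biopython's implementation and does not import Biopython. The helper names `parse_fasta`, `transcribe`, and `reverse_complement` and the example records are hypothetical, chosen only for this sketch. In Biopython itself the corresponding entry points are `Bio.SeqIO.parse` and the `transcribe()` and `reverse_complement()` methods of `Bio.Seq.Seq`.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def transcribe(dna):
    """Return the RNA string for a coding-strand DNA string (T becomes U)."""
    return dna.upper().replace("T", "U")


# Translation table for complementing unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical two-record FASTA input, held in memory for the example.
example = StringIO(">seq1 hypothetical example record\nATGGCC\nATTGTA\n>seq2\nGGGTTT\n")
records = list(parse_fasta(example))

for header, sequence in records:
    print(header, sequence, transcribe(sequence), reverse_complement(sequence))
```

The sketch ignores ambiguity codes, quality scores, and annotations, which is the kind of detail the Biopython classes handle.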

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The website http://www.biopython.org is an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also hosts a useful wiki.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
-1
votes
1 answer

Python: How to get rid of the sequences according to the sequence bases rather than their header name?

I would like to deduct two files based on the sequence constituents rather than using the header name to get rid of the sequences. Is there any other way I can deduct the sequences? Can anyone help me? If the FASTA header below is replaced with…
Xiong89
  • 767
  • 2
  • 13
  • 24
-1
votes
1 answer

How to copy the Species name from a .fasta file header and add it to the same file name?

I have more than 5000 protein FASTA files from different species. The name of each file contains a UniProt ID (e.g., UP000000212_1234679.fasta). The first line of each file contains the species name (e.g., >tr|K8E169|K8E169_CARML S4 domain protein YaaA…
Ebi
  • 31
  • 4
-1
votes
1 answer

error in installing python module

I am trying to install a Python module called biopython using pip install biopython and setup.py install, but I am getting the following error: error: Unable to find vcvarsall.bat
Alph
  • 391
  • 2
  • 7
  • 18
-1
votes
2 answers

How to create a list holding multiple fasta sequences and ids in Python

I am new to Python. I am trying to take a genome FASTA file containing 8 chromosome sequences as input, BLAST it against a query sequence, and extract the top 50 hits. Here's my code: from Bio import SeqIO from Bio.Seq import Seq from Bio.Blast import…
RRN
  • 3
  • 3
-1
votes
2 answers

remove uncommon string words in two files

I have two files: file 1 contains 2 columns, file 2 contains 5 columns. I want to remove the lines from file 2 that don't contain common strings with file 1: -file 1, if this is a list, each line contains [0] and [1] gene-3 + gene-2 - gene-1 …
-1
votes
1 answer

Biopython Retrieving protein transcripts for a protein coding gene

I am using Biopython's wrapper API for NCBI eutils to retrieve related proteins, identical proteins, and variant proteins (transcripts, splice variants, etc.) for a certain protein-coding gene. This information is displayed for a protein-coding gene…
user2764
  • 3
  • 2
-1
votes
1 answer

Can't write to file with print statements

Here I have made a simple program to go through a text file containing a bunch of genes in a bacterial genome, including the amino acids that code for those genes (explicit use is better right?) I am relying heavily on modules in Biopython. This…
Jackie
  • 1
  • 1
-1
votes
1 answer

Making annotated chromosome using biopython

I am trying to recreate the annotated chromosome using biopython (http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec345). I have a test code that would create one chromosome and one annotated feature (5, 10, "1", "Gm18_5133882_G_A", "blue"). …
user690462
  • 129
  • 1
  • 1
  • 9
-1
votes
2 answers

How to make a summation under a condition in Biopython

I have a FASTA file with three defined elements in the "description" line. The first element, defined as dato[0], is the one that has to satisfy the condition, and the third element, defined as dato[2], is the one that I want to sum. The FASTA…
Ma_fermar
  • 33
  • 2
  • 10
-1
votes
1 answer

Retrieve EMBL-Bank ID through corresponding Ensembl Gene ID in batch

I got a list of around 5000 genes as a search result from Gene Expression Atlas. From the results page I can download all the results in a file. That file contains a gene identifier (Ensembl Gene ID) for each gene. So now I want the corresponding EMBL-Bank…
user1144004
  • 183
  • 3
  • 4
  • 21
-1
votes
2 answers

Installing Biopython in new installation of IPython/Python2.7

I have Python2.7 installed and was having some issues with installing scipy. Through some Googling, I figured from a thread here (installing scipy on mac 10.6.8) that it is better to install scipy using MacPorts and IPython. IPython looked cool…
user1938965
  • 163
  • 2
  • 10
-1
votes
1 answer

Get protein monomers in an automated way

For a given set of protein structures in PDB format from the PDB database, I would like to find some automated way of checking whether each structure is a monomer, dimer, trimer, etc., so that in each case I only get the unit structure or monomer. I head…
Open the way
  • 26,225
  • 51
  • 142
  • 196
-1
votes
1 answer

inputting and aligning protein sequence

I have a script for finding mutated positions in a protein sequence. The following script will do this. import pandas as pd #data analysis python module data = …
-1
votes
1 answer

Counting di-Amino Acid frequencies (Bigram frequencies) from FASTA files

Given a large number of FASTA files (the peptidome for various organisms for secreted peptides), how can I read the FASTA files (from UniProt) with Python (or MATLAB), and count the frequencies of each amino acid, and of amino-acid "double"…
GrimSqueaker
  • 412
  • 5
  • 17
-2
votes
2 answers

Where does BioPython store information related to various chemical molecules?

If we reconstruct a protein from a PDB file, is it enough to have a PDB file, or do we need more info external to the PDB? Take, for example, the BioPython framework. If any info is needed external to the PDB files, where does this framework store…
user366312
  • 16,949
  • 65
  • 235
  • 452