Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by the Biopython Project, an international association of developers of Python tools for computational molecular biology. It covers a range of bioinformatics functionality, including:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW, and EMBOSS, among others)

  • Classes for working with DNA, RNA, and protein sequences, including feature annotations

  • Tools for common sequence operations, such as translation, transcription, and molecular weight calculation

among many others.
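
As a brief illustration of the sequence operations listed above, the sketch below uses `Bio.Seq.Seq` to transcribe, translate, and reverse-complement a short DNA string. It assumes Biopython is installed, and the example sequence is made up for illustration.

```python
from Bio.Seq import Seq

# Hypothetical 9-base coding sequence, chosen only for illustration.
dna = Seq("ATGGACGAA")

rna = dna.transcribe()              # DNA -> RNA (T replaced by U)
protein = dna.translate()           # uses the standard genetic code by default
revcomp = dna.reverse_complement()  # complement each base, then reverse

print(str(rna))      # AUGGACGAA
print(str(protein))  # MDE
print(str(revcomp))  # TTCGTCCAT
```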

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
4
votes
3 answers

Sort options for Pubmed eutils esearch?

I am using BioPython to query the Pubmed database through the eutils API. The esearch endpoint has a sort option, but the API documentation doesn't list all of the options for this…
adamc
  • 800
  • 6
  • 9
4
votes
3 answers

Aligning DNA sequences inside python

I have thousands of DNA sequences ranging from 100 to 5000 bp, and I need to align them and calculate the identity score for specified pairs. Biopython pairwise2 does a nice job but only for short sequences, and when the sequence size gets bigger than…
Masih
  • 920
  • 2
  • 19
  • 36
4
votes
1 answer

Biopython retrieving particular CDS from a whole genome

I am new to Stack Overflow. I am trying to automate a search process using Biopython. I have two lists, one with protein GI numbers and one with corresponding nucleotide GI numbers. For…
anoviks
  • 43
  • 4
4
votes
1 answer

Multiple Sequence Alignment with Unequal String Length

I need a methodology for creating a consensus sequence out of 3 - 1000 short (10-20bp) nucleotide ("ATCG") reads of varying lengths. A simplified example: "AGGGGC" "AGGGC" "AGGGGGC" "AGGAGC" "AGGGGG" Should result in a consensus sequence of…
4
votes
3 answers

how the multiple pdbs can be written in single pdb file using biopython libraries

I wonder how multiple PDB structures can be written to a single PDB file using the Biopython libraries. The documentation covers reading files with multiple structures, such as NMR structures, but I cannot find anything about writing them. Does anybody have an idea?
Exchhattu
  • 197
  • 3
  • 15
4
votes
3 answers

Using Bio.SeqIO to write single-line FASTA

QIIME states the following requirement for the FASTA files it receives as input: The file is a FASTA file, with sequences in the single line format. That is, sequences are not broken up into multiple lines of a particular length, but instead the entire…
Korem
  • 11,383
  • 7
  • 55
  • 72
4
votes
1 answer

Biopython NCBIWWW.qblast test file -hangs on

When I try to run a test file provided by Biopython for an NCBIWWW.qblast online search, it hangs and never responds. The same happens when I run any script of my own that includes NCBIWWW.qblast: it just gets to this…
4
votes
1 answer

Conversion of distance matrix to Newick format

My ultimate aim is to make a plot which merges a heatmap and a phylogenetic tree. I have accomplished the heatmap and I have also found ETE2 package in BioPython which could help me merge the two kinds of plots, however ETE2 requires Newick…
user2998764
  • 445
  • 1
  • 6
  • 22
4
votes
1 answer

BioPython: How to convert the amino acid alphabet to

When discussing how to import sequence data using Bio.SeqIO.parse(), the BioPython cookbook states that: There is an optional argument alphabet to specify the alphabet to be used. This is useful for file formats like FASTA where otherwise Bio.SeqIO…
Kevin
  • 1,112
  • 2
  • 15
  • 29
4
votes
2 answers

Downloading Protein Sequences of multiple Organisms

I am attempting to use Biopython to download all of the proteins of a list of organisms sequenced by a specific institution. I have the organism names and BioProjects associated with each organism; specifically, I am looking to analyze the proteins…
redvyper
  • 137
  • 2
  • 11
4
votes
2 answers

How to identify what feature(s) are at a specific location in a genome

I am interested in identifying what feature (i.e. gene/cds) is at a particular location of a genome. For instance, what gene (if any) encompasses position 2,000,000. I know how to do this with a for loop and looping through each feature in the…
ded
  • 420
  • 2
  • 13
4
votes
1 answer

How to retrieve all possible combinations given a sequence of keys from a dictionary with list values

I have for instance this dictionary d={'M':['ATG'],'D':['GAC','GAT'],'E':['GAA','GAG']} What I'd like to have as an output given a sequence of keys is a list with all possible sequences. (could be a string as well, in which all the possible…
Àngel Ba
  • 371
  • 2
  • 9
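
One possible approach to the question above is to take the Cartesian product of the per-key lists with `itertools.product`. This is a minimal sketch, assuming every key in the input sequence appears in the dictionary; the function name `expand` is hypothetical.

```python
from itertools import product

d = {'M': ['ATG'], 'D': ['GAC', 'GAT'], 'E': ['GAA', 'GAG']}

def expand(keys, table):
    """Return every string formed by picking one value per key, in order."""
    choices = [table[k] for k in keys]  # raises KeyError for unknown keys
    return [''.join(combo) for combo in product(*choices)]

print(expand('MDE', d))
# ['ATGGACGAA', 'ATGGACGAG', 'ATGGATGAA', 'ATGGATGAG']
```

The number of results is the product of the list lengths, so long key sequences with many alternatives per key can grow quickly; iterating over `product(*choices)` lazily avoids building the full list in that case.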
4
votes
4 answers

Where to deposit a Python script that performs bioinformatics analyses?

I've written an analytical pipeline in Python that I think will be useful to other people. I'm wondering whether it is customary to publish such scripts in GitHub, whether there's a specific place to do this for Python scripts, or even if there's a…
Atticus29
  • 4,190
  • 18
  • 47
  • 84
4
votes
4 answers

How to extract chains from a PDB file?

I would like to extract chains from PDB files. I have a file named pdb.txt which contains PDB IDs as shown below. The first four characters represent the PDB ID and the last character is the chain ID. 1B68A 1BZ4B 4FUTA I would like to 1) read the file…
user1545114
  • 75
  • 1
  • 1
  • 4
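
The identifier format described in the question above (a four-character PDB ID followed by a one-character chain ID) can be split with plain string slicing. This is a minimal sketch of the parsing step only; the helper name `split_entry` is hypothetical, and the code does not download or read any structure files.

```python
def split_entry(entry):
    """Split an entry such as '1B68A' into (pdb_id, chain_id)."""
    entry = entry.strip()
    if len(entry) != 5:
        raise ValueError(f"expected 5 characters, got {entry!r}")
    return entry[:4], entry[4]

# Sample entries taken from the question text.
entries = "1B68A 1BZ4B 4FUTA".split()
pairs = [split_entry(e) for e in entries]
print(pairs)
# [('1B68', 'A'), ('1BZ4', 'B'), ('4FUT', 'A')]
```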
4
votes
2 answers

connecting to Ensembl via biopython

I have just started working with Python and Biopython, and I would like to connect to Ensembl and fetch some sequences and other data such as TSSs and lists of genes. My problem is that I cannot seem to find any method or module in Biopython to do so. I…
Mehsah Yhook
  • 83
  • 1
  • 4