Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by the Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW, and EMBOSS, among others)

  • Classes for working with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

among many others.
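The kinds of operations listed above can be illustrated without any dependencies. The following is a minimal plain-Python sketch and not Biopython's implementation: `parse_fasta`, `transcribe`, `reverse_complement`, and the sample records are hypothetical names and data chosen only for illustration. Biopython itself offers equivalents through `Bio.SeqIO` and the methods of its `Seq` class.

```python
from io import StringIO

# Hypothetical two-record FASTA text, used only for illustration.
FASTA_TEXT = """>seq1 example record
ATGGCC
ATTTAA
>seq2 another record
ATGAAATAG
"""


def parse_fasta(handle):
    """Return a dict mapping each record ID to its sequence string."""
    records = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            current_id = line[1:].split()[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap, so join them together.
            records[current_id] += line.upper()
    return records


def transcribe(dna):
    """Transcribe a coding-strand DNA string into RNA (T becomes U)."""
    return dna.replace("T", "U")


def reverse_complement(dna):
    """Complement each base (A<->T, C<->G) and reverse the result."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


records = parse_fasta(StringIO(FASTA_TEXT))
for name, seq in records.items():
    print(name, transcribe(seq), reverse_complement(seq))
```

The sketch handles only unambiguous upper-case DNA; real-world files with ambiguity codes, quality scores, or annotations are where a dedicated library becomes worthwhile.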

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The website http://www.biopython.org provides modules, scripts, and web links for developers of Python-based software for life science research. It also hosts a useful wiki.

The Biopython Cookbook provides many examples of Biopython in use, as well as installation instructions and a FAQ section.

1345 questions
3
votes
1 answer

Write SeqIO Dictionary as Fasta File

I originally converted my FASTA sequences into a dictionary with a Bio.SeqIO.to_dict statement. I would like to write a subsetted dictionary back to a FASTA file. Test is a Python dictionary with FASTA headers as keys and the sequences as values.…
Cody Glickman
3
votes
2 answers

Opening and editing multiple files in a folder with python

I am trying to modify my .fasta files from this: >YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3] MSNVLLKQ... >YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1] MRTPSKSE... >YP_009226430.1 DNA…
tahunami
3
votes
0 answers

What methods should I use from PythonCyc API to query metabolites in BioCyc database?

I am using the PythonCyc API to write a query for metabolites in BioCyc. The purpose of this API is to communicate with BioCyc's database software, Pathway Tools. Pathway Tools is written in Lisp; therefore, PythonCyc creates a bridge between…
3
votes
1 answer

Bioformats-Python error: 'ascii' codec can't encode character u'\xb5' when using OMEXML()

I am trying to use bioformats in Python to read in a microscopy image (.lsm, .czi, .lif, you name it), print out the metadata, and display the image. ome = bf.OMEXML(md) gives me an error (below). I think it's talking about the information stored…
puifais
3
votes
1 answer

Getting residue number and residue name in Biopython PDB module

I'm currently using PyMOL's iterate to get all the residue numbers, and then I use those to retrieve the residue name. I don't think that's the best way to do it. I tried to look for a way in Biopython, to no avail. I would appreciate your input and…
Python Noob
3
votes
1 answer

passing python variable (string) to bash command through echo pipe

I am having trouble passing a string in python (python variable) as input to a sequence alignment program (muscle) on the command line (bash). muscle can take stdin from the command line, e.g.; ~# echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" |…
LP_640
3
votes
1 answer

Biopython unable to declare new SeqRecord

from Bio import SeqIO import re, os import pandas as pd from Bio.Seq import Seq from Bio.Alphabet import generic_dna from Bio.SeqRecord import SeqRecord os.chdir('c:\Users\Workspace\Desktop') filename =…
TJA
3
votes
1 answer

overlap score matrix biopython

I have a FASTA file with DNA sequences and the names of the sequences and I need to make a matrix of the overlap scores. I found the module pairwise2 in Biopython which seems to do this quite well. Except my sequences are already aligned and when I…
JDh
3
votes
1 answer

How to download pubmed articles and read them?

I'm having trouble saving PubMed articles and reading them. I've seen on this page that there are some special file types, but none of them worked for me. I want to save them in a way that lets me keep using the keys to get the data. I…
user2535338
3
votes
1 answer

Python: How to compare multiple sequences from a fasta file with each other?

I'm quite new to programming in Python, and I am trying to write a script that, given a FASTA file, will compare the sequences with each other and score them (if the position of the nucleotide in sequence A matches with the nucleotide in the…
D.Teeki
3
votes
2 answers

How to convert a set of DNA sequences into protein sequences using python programming?

I am using python to create a program that converts a set of DNA sequences into amino acid (protein) sequences. I then need to find a specific subsequence, and count the number of sequences in which this specific subsequence is present. This is the…
3
votes
2 answers

Transform dna alignment into numpy array using biopython

I have several DNA sequences that have been aligned, and I would like to keep only the bases that are variable at a specific position. This could perhaps be done if we first transform the alignment into an array. I tried using the code in the…
newa123
3
votes
1 answer

Biopython installation on MacOSX El Capitan, gcc error -Qunused-arguments

I am trying to install Biopython, but get this error: > gdr$ python setup.py build running build running build_py running > build_ext building 'Bio.cpairwise2' extension gcc -DNDEBUG -g -fwrapv > -O3 -Wall -Wstrict-prototypes -Qunused-arguments…
grd
3
votes
1 answer

Entrez.esummary ('gene' db): how to retrieve uid from DictElement?

I'm trying to retrieve and save gene summaries from the NCBI Entrez Gene database, and would like to keep the uid too, but, though it's there, I can't find the right way to retrieve it from the results. See below (NB: obviously not my valid email…
3
votes
1 answer

change specific parts of a string in python (update bootstrap values in phylogenetic trees)

So basically I have a string: string_1 = '(((A,B)123,C)456,(D,E)789)135' containing a phylogenetic tree with bootstrap values in parenthetical notation (not really important to the question, but in case anyone was wondering). This example tree…
Andrew WM