Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionalities such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, Clustalw, EMBOSS among others)

  • Classes for dealing with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

among many others.

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The website http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
5 votes, 1 answer

How to calculate the average structure of a protein with multiple models/conformations

I have a PDB file '1abz' (https://files.rcsb.org/view/1ABZ.pdb), containing the coordinates of a protein structure with 23 different models (numbered MODEL 1-23). Please ignore the header remarks; the interesting information starts at line 276, which…
Cave
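A minimal sketch of the averaging step, assuming the 23 models are already superposed (NMR ensembles deposited in the PDB normally are) and that every model lists the same atoms. It reads the fixed-column x, y, z fields of ATOM/HETATM records directly instead of going through Bio.PDB; `average_model_coords` is a hypothetical name, and alternate locations and insertion codes are ignored.

```python
from collections import defaultdict


def average_model_coords(pdb_text):
    """Return {atom_key: (x, y, z)} averaged over every MODEL block.

    Hypothetical helper. Atoms are keyed by (chain, residue number,
    residue name, atom name), so the same atom in different models is
    summed together. Assumes every model contains the same atoms.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, MODEL/ENDMDL, TER, etc.
        # PDB fixed columns: name 13-16, resName 18-20, chain 22, resSeq 23-26
        key = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
        # x 31-38, y 39-46, z 47-54
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        acc = sums[key]
        for i in range(3):
            acc[i] += xyz[i]
        counts[key] += 1
    return {key: tuple(v / counts[key] for v in acc) for key, acc in sums.items()}
```

Writing the averaged coordinates back out as a single model gives the average structure. A plain coordinate average can have distorted bond geometry, which is why averaged NMR structures are often energy-minimized afterwards.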
5 votes, 2 answers

Is there any way to get abstracts for a given list of pubmed ids?

I have a list of PMIDs and want to get the abstracts for both of them in a single URL request: pmids=[17284678,9997] abstract_dict={} url = https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi? …
pat
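The E-utilities efetch endpoint accepts a comma-separated list in its `id` parameter, so all PMIDs can go into one request. A minimal sketch that only builds the URL; `efetch_abstracts_url` is a hypothetical helper, and `rettype=abstract` with `retmode=text` asks for plain-text abstracts.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_abstracts_url(pmids):
    """Build one efetch URL requesting the abstracts of all PMIDs at once (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),  # comma-separated ID list
        "rettype": "abstract",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)  # urlencode escapes the commas as %2C
```

The URL can then be read with `urllib.request.urlopen`. Biopython's `Entrez.efetch` takes the same `db`, `id`, `rettype` and `retmode` parameters (set `Entrez.email` first). To build a per-PMID dictionary, requesting `retmode=xml` and parsing the records may be easier than splitting the plain-text response.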
5 votes, 6 answers

How can I merge overlapping strings in python?

I have some strings, ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']. These strings partially overlap each other. If you manually overlapped them you would get: SGALWDVPSPV. I want a way to go from the list of overlapping strings to the…
Adam Price
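A greedy sketch, assuming the list is already in assembly order (as in the example): for each next string, find the longest suffix of the merged result that is also its prefix, and append only the remainder. `merge_overlapping` and `min_overlap` are hypothetical names; unordered fragments would need a shortest-common-superstring style approach instead.

```python
def merge_overlapping(fragments, min_overlap=1):
    """Merge an ordered list of strings, each overlapping the next (hypothetical helper)."""
    if not fragments:
        return ""
    merged = fragments[0]
    for frag in fragments[1:]:
        # Try the longest possible overlap first, down to min_overlap.
        for k in range(min(len(merged), len(frag)), min_overlap - 1, -1):
            if merged.endswith(frag[:k]):
                merged += frag[k:]  # append only the non-overlapping tail
                break
        else:
            merged += frag  # no overlap found: plain concatenation
    return merged


print(merge_overlapping(['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']))
```

With `min_overlap=1`, a single coincidentally matching character counts as an overlap; raising it is safer when the fragments are known to share longer stretches.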
5 votes, 1 answer

Fastest way to add Ns to variable length sequences such that they all equal 150bp

Say I have a FASTA file containing 3 sequences: ATTTTTGGA, AT, A. I want my sequence data to look like this: ATTTTTGGA, ATTNNNNNN, ANNNNNNNN. Are there any programs or scripts that could accomplish this in a reasonable timeframe? I have thousands of…
user3105519
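Python's `str.ljust` pads on the right with a chosen character and leaves longer strings untouched, so a single pass over the records is enough. A minimal sketch with a tiny FASTA reader and writer; `read_fasta`, `pad_sequences` and `write_fasta` are hypothetical names, and Biopython's `SeqIO` could replace the reading and writing.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def pad_sequences(records, length=150, pad_char="N"):
    """Right-pad every sequence with pad_char to `length`; longer ones are left as-is."""
    return [(name, seq.ljust(length, pad_char)) for name, seq in records]


def write_fasta(records):
    """Format (name, sequence) pairs back into FASTA text, one sequence line each."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The work is linear in the total sequence length, so thousands of sequences are not a problem; for very large files, streaming records instead of building a list keeps memory flat.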
5 votes, 1 answer

Biopython record extra 2 nucleotides either side of gene for +2,-2 reading frames

I'm looking for ambush stop codons. I've gotten my code to the point where I'm extracting the sequences I need from my EMBL files. However, I'm a bit stumped on how to add two upstream and two downstream nucleotides so I end up having -2, -1, 0, 1, 2…
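Biopython feature locations use 0-based, end-exclusive coordinates, so `int(feature.location.start)` and `int(feature.location.end)` can be widened by two bases on each side before slicing the record's sequence. A minimal sketch on plain strings; `extract_with_flanks` and its `flank` parameter are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_with_flanks(genome, start, end, strand=1, flank=2):
    """Return genome[start:end] plus `flank` bases on each side (hypothetical helper).

    start/end are 0-based, end-exclusive. The window is clamped at the
    sequence ends, so genes at the very edge get shorter flanks. For
    strand == -1 the region is reverse-complemented, so in both cases the
    upstream flank (in gene orientation) comes first.
    """
    lo = max(0, start - flank)
    hi = min(len(genome), end + flank)
    region = genome[lo:hi]
    return reverse_complement(region) if strand == -1 else region
```

Translating the result in each offset (skipping 0, 1 or 2 leading bases) then gives the shifted reading frames around the gene.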
5 votes, 2 answers

Is it possible to pass a string variable to a BLAST search instead of a file?

I'm writing a Python script and want to pass the query sequence into blastn as a string variable rather than a FASTA file, if possible. I used Biopython's SeqIO to store several transcript names as keys and their sequences as the…
5 votes, 4 answers

Fetching genomic sequence efficiently in Python?

How can I fetch genomic sequence efficiently using Python? For example, from a .fa file or some other easily obtained format? I basically want an interface fetch_seq(chrom, strand, start, end) which will return the sequence [start, end] on the…
user248237
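The usual trick (used by `samtools faidx`) is to record each sequence's byte offset and line width once, then seek straight to the requested bases instead of reading the whole chromosome. A minimal pure-Python sketch of that idea; `IndexedFasta` is hypothetical, it assumes a fixed line width within each record and no blank lines, and it uses 0-based, end-exclusive coordinates, so adjust if your [start, end] is 1-based inclusive.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


class IndexedFasta:
    """Random-access FASTA reader in the spirit of `samtools faidx` (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path
        # name -> [sequence length, byte offset of first base, bases per line, bytes per line]
        self.index = {}
        with open(path, "rb") as fh:
            name = None
            while True:
                line = fh.readline()
                if not line:
                    break
                if line.startswith(b">"):
                    name = line[1:].split()[0].decode()
                    self.index[name] = [0, fh.tell(), None, None]
                elif name is not None:
                    entry = self.index[name]
                    bases = len(line.rstrip(b"\r\n"))
                    if entry[2] is None:  # the first line fixes the width for this record
                        entry[2], entry[3] = bases, len(line)
                    entry[0] += bases

    def fetch_seq(self, chrom, strand, start, end):
        """Return bases [start, end) of chrom; strand '-' gives the reverse complement."""
        length, offset, line_bases, line_bytes = self.index[chrom]
        start, end = max(0, start), min(length, end)
        if start >= end:
            return ""

        def byte_pos(i):
            # record offset + whole lines before base i + column within its line
            return offset + (i // line_bases) * line_bytes + i % line_bases

        with open(self.path, "rb") as fh:
            fh.seek(byte_pos(start))
            raw = fh.read(byte_pos(end - 1) - byte_pos(start) + 1)
        seq = raw.replace(b"\r", b"").replace(b"\n", b"").decode()
        return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq
```

Building the index costs one pass over the file; every later `fetch_seq` call reads only the bytes it needs. In practice, pysam's `FastaFile` or the pyfaidx package provide this kind of indexed access backed by a `.fai` file.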
5 votes, 2 answers

Installing Biopython: ImportError: No module named Bio

Trying to install Biopython on Fedora 21, Python 2.7. I've done the following: [mike@localhost Downloads](17:32)$ sudo pip2.7 install biopython You are using pip version 6.1.1, however version 7.1.0 is available. You should consider upgrading via the…
Mike
5 votes, 1 answer

Deleting residue from PDB using Biopython library

Using the Biopython library, I want to remove the residues listed below. This thread (http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython) provides an example of removing a residue. I have the following…
Exchhattu
5 votes, 1 answer

Get all neighbors of a set of Residues

I have a list of residue numbers saved in centerResidueList = [100, 140, 170, 53] and I am trying to get all the neighboring residues from this set of residues. Currently I am using the script below, where I process the whole PDB file and generate a…
Reyhaneh
5 votes, 1 answer

Attempting to Obtain Taxonomic Information from Biopython

I am attempting to alter a previous script that uses Biopython to fetch information about a species' phylum. This script was written to retrieve information one species at a time. I would like to modify the script so that I can do this for 100…
user2374216
5 votes, 1 answer

Must NumPy and Biopython be integrated?

For example, I have two scripts that check whether a multiple sequence alignment (MSA) has more than 50 columns with less than 50% gaps. The first, using Biopython, takes 4.2 seconds on an MSA of 16281 sequences with 609 columns (PF00085 of Pfam in fasta…
Diego Javier Zea
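A pure-Python version of this check has to visit all 16281 × 609 characters one at a time. A minimal sketch of the vectorized alternative, assuming NumPy, ASCII sequences of equal length, and `-` as the gap character; `count_low_gap_columns` is a hypothetical name. The sequence strings could come from `Bio.AlignIO`, for example `[str(rec.seq) for rec in alignment]`.

```python
import numpy as np


def count_low_gap_columns(sequences, max_gap_fraction=0.5, gap_char="-"):
    """Count alignment columns whose fraction of gaps is below max_gap_fraction."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # One byte per residue: rows are sequences, columns are alignment positions.
    flat = np.frombuffer("".join(sequences).encode("ascii"), dtype=np.uint8)
    msa = flat.reshape(len(sequences), -1)
    # Boolean gap mask averaged down each column gives the gap fraction per column.
    gap_fraction = (msa == ord(gap_char)).mean(axis=0)
    return int((gap_fraction < max_gap_fraction).sum())
```

`count_low_gap_columns(seqs) > 50` then gives the yes/no answer from the post. In this split, Biopython only parses the file and NumPy does the counting.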
4 votes, 2 answers

My open reading frame (ORF) finding code is not finding the longest ORF in the sequence

I am trying to code a function that finds the longest open reading frame. However, in this one instance it is not locating the longest ORF and I cannot figure out why. This is the…
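The asker's code is cut off above, so this is not a fix to it but an independent sketch for comparison. It scans all three frames on both strands and keeps the first ATG after each stop codon; scanning a single frame or strand, or restarting at every internal ATG, are typical ways to miss the longest ORF. `longest_orf` is a hypothetical name, and ORFs without a stop codon are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf(seq):
    """Return the longest ATG...stop ORF (stop codon included) over all six frames."""
    seq = seq.upper()
    best = ""
    for strand_seq in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop, if any
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = strand_seq[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # wait for the next ATG
    return best
```

Because `start` is only set when it is `None`, an ATG inside an already open frame does not shorten the ORF.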
4 votes, 2 answers

How do I set the PYTHONPATH on Cygwin?

In the Biopython installation instructions, it says that if Biopython doesn't work I'm supposed to do this: export PYTHONPATH = $PYTHONPATH':/directory/where/you/put/Biopython' I tried doing that in Cygwin from the ~ directory using the name of the…
4 votes, 2 answers

Finding CDRs in NGS data

I have millions of sequences in FASTA format and want to extract the CDRs (CDR1, CDR2 and CDR3). I chose only one sequence as an example and tried to extract CDR1, but I am not able to extract…
shivam