Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by the Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW and EMBOSS, among others)

  • Classes for dealing with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

amongst many, many others.
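As a minimal sketch of the features listed above, assuming Biopython is installed (for example with `pip install biopython`), the following parses FASTA text and applies a few common sequence operations. The six-base sequence and the record ID `seq1` are made up for illustration.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# Hypothetical example sequence; any DNA string works here.
dna = Seq("ATGGCC")

rna = dna.transcribe()              # DNA -> RNA (T replaced by U)
protein = dna.translate()           # standard genetic code by default
revcomp = dna.reverse_complement()

# Parse FASTA text into SeqRecord objects; a file path works the same way.
fasta_text = ">seq1 example record\nATGGCC\n"
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

print(rna, protein, revcomp, records[0].id)
```

`SeqIO.parse` accepts either a file name or an open handle, so the `StringIO` wrapper is only there to keep the example self-contained.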

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The website http://www.biopython.org provides modules, scripts, and web links for developers of Python-based software for life science research. It also hosts a useful wiki.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
3 votes, 4 answers

Finding/Replacing substrings with annotations in an ASCII file in Python

I'm having a little coding issue in a bioinformatics project I'm working on. Basically, my task is to extract motif sequences from a database and use the information to annotate a sequence alignment file. The alignment file is plain text, so the…
Spyros
  • 249
  • 1
  • 3
  • 15
3 votes, 1 answer

create alignment in Biopython without input file

I have an alignment of protein sequences in a dictionary (id_prot as key and aligned sequence as value; could be another format), and I would like to use this alignment to build an NJ tree with Biopython. However, according to the documentation, the…
Romain
  • 137
  • 2
  • 9
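One possible way to build the alignment in memory, sketched under the assumption that the dictionary maps IDs to aligned strings of equal length (the IDs and sequences below are made up): wrap each entry in a `SeqRecord`, pass the list to `MultipleSeqAlignment`, and hand that to the distance-based tree constructor in `Bio.Phylo.TreeConstruction`. The `"identity"` distance model is an arbitrary choice for illustration; a protein substitution matrix may suit real data better.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical aligned protein sequences keyed by ID (all the same length).
aligned = {
    "prot_a": "MKT-LLVA",
    "prot_b": "MKTALLVA",
    "prot_c": "MRT-LIVG",
}

# Build the alignment directly from in-memory records, with no input file.
records = [SeqRecord(Seq(seq), id=name) for name, seq in aligned.items()]
alignment = MultipleSeqAlignment(records)

# Compute pairwise distances and build a neighbor-joining tree from them.
calculator = DistanceCalculator("identity")
constructor = DistanceTreeConstructor(calculator, "nj")
tree = constructor.build_tree(alignment)

leaf_names = sorted(leaf.name for leaf in tree.get_terminals())
print(leaf_names)
```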
3 votes, 1 answer

Problem to parse a CIF file using MMCIF2Dict

I wrote a script to retrieve and process information from the Protein Data Bank. I import the MMCIF2Dict class from Bio.PDB.MMCIF2Dict, which parses the CIF data into a dictionary. It works well for almost all structures in my list, but I don't…
Fan
  • 27
  • 3
3 votes, 1 answer

Pairwise alignment of multi-FASTA file sequences

I have a multi-FASTA file containing more than 10,000 FASTA sequences from next-generation sequencing, and I want to do a pairwise alignment of each sequence against every other sequence in the file and store all the results in the same new file in…
Aurora
  • 31
  • 4
3 votes, 1 answer

Python 3.x - How to efficiently split an array of objects into smaller batch files?

I'm fairly new to Python and I'm attempting to split a text file whose entries consist of two lines each into batches of at most 400 objects. The data I'm working with are thousands of sequences in FASTA format (plain text with a header, used in…
Ludolph314
  • 93
  • 8
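A generic batching helper is one way to approach the splitting described in this question. The sketch below uses only the standard library and works on any iterator; the list of placeholder names stands in for parsed records and is purely hypothetical. With Biopython, the input could instead be the iterator returned by `SeqIO.parse`, and each batch could be written out with `SeqIO.write`.

```python
from itertools import islice


def batch_iterator(iterator, batch_size):
    """Yield lists of at most batch_size items from any iterator."""
    iterator = iter(iterator)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for parsed FASTA records.
entries = [f"seq{i}" for i in range(1, 1001)]
batch_sizes = [len(batch) for batch in batch_iterator(entries, 400)]
print(batch_sizes)
```

Because the helper pulls items lazily, only one batch is held in memory at a time, which matters when the input has many thousands of sequences.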
3 votes, 2 answers

Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In Python, this code, where I directly call the function SeqIO.parse(), runs fine: from Bio import SeqIO a = SeqIO.parse("a.fasta", "fasta") records = list(a) for asq in SeqIO.parse("a.fasta", "fasta"): print("Q") But this, where I first…
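The excerpt is truncated, so the following is only a likely explanation, on the assumption that the second version loops over the same parser object after `list()` has already consumed it. `SeqIO.parse` returns an iterator, and an iterator can be walked only once. The toy generator below (a hypothetical stand-in, not Biopython code) shows the effect:

```python
def parse_records(lines):
    """Toy generator standing in for an iterator-style parser."""
    for line in lines:
        yield line.strip()


parser = parse_records(["r1\n", "r2\n"])
first_pass = list(parser)    # consumes every item
second_pass = list(parser)   # nothing left to yield

print(first_pass, second_pass)
```

If that is the cause, looping over the stored list, or calling the parser again to get a fresh iterator, avoids the empty second pass.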
3 votes, 0 answers

How to unfold only protein atoms using Bio.PDB.Selection?

from Bio.PDB import PDBParser from Bio.PDB import Selection structure = PDBParser().get_structure('4GBX', '4GBX.pdb') # load your molecule atom_list = Selection.unfold_entities(structure[0]['E'], 'A') # 'A' is for Atoms in the chain 'E' When I…
3 votes, 1 answer

Renaming interleaved fastq headers with biopython

For ease of use and compatibility with another downstream pipeline, I'm attempting to change the names of fastq sequence ids using biopython. For example... going from headers that look like this: @D00602:32:H3LN7BCXX:1:1101:1205:2112…
Gunther
  • 129
  • 7
3 votes, 3 answers

How do I install biopython in anaconda?

I get SyntaxError: invalid syntax while trying to install biopython using the following command. conda install -c anaconda biopython Could you please help me install biopython in anaconda (3) ?
Jaswant S
  • 31
  • 1
  • 1
  • 2
3 votes, 1 answer

Biopython pairwise alignment results in segmentation fault when run in loop

I am trying to run the pairwise global alignment method in Biopython in a loop for about 10,000 pairs of strings. Each string is, on average, 20 characters long. Running the method for a single pair of sequences works fine. But running this in a loop, for…
Amrith Krishna
  • 2,768
  • 3
  • 31
  • 65
3 votes, 1 answer

Why do I get BioPython HTTPError: HTTP Error 400: Bad Request when I use Esearch and Efetch

I am trying to access a list of organisms from the chordata phylum that have sequenced chromosomes from the "assembly" database in Entrez. I am trying to do this using the E-utilities in biopython. I am able to search for the organisms using esearch…
3 votes, 1 answer

NameError: name 'PROTOCOL_TLS' is not defined

I am trying to import Biopython modules in my Mac terminal, but it's throwing the following error. It would be very helpful if someone could help me fix this issue. >>> from Bio import SeqIO Traceback (most recent call last): File "<stdin>", line 1, in…
Ambuj Kumar
  • 31
  • 1
  • 2
3 votes, 1 answer

Faster way of calculating the percentage of identical sites in alignment using biopython

I developed the following code to calculate the number of identical sites in an alignment. Unfortunately, the code is slow, and I have to run it over hundreds of files; it takes close to 12 hours to process more than 1,000 alignments, meaning that…
omv
  • 33
  • 3
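The asker's code is not shown in the excerpt, so the sketch below is an independent illustration rather than a fix for it. It assumes that an "identical site" means an alignment column in which every sequence has the same character (gap characters included), and it walks the columns in a single pass with `zip`. The three short sequences are made up.

```python
def percent_identical_sites(sequences):
    """Percentage of alignment columns in which all sequences share one character."""
    if not sequences:
        raise ValueError("no sequences given")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) yields one tuple per alignment column.
    identical = sum(1 for column in zip(*sequences) if len(set(column)) == 1)
    return 100.0 * identical / length


# Hypothetical aligned sequences; 4 of the 6 columns are identical.
aligned = ["ACGT-A", "ACGTTA", "ACCTTA"]
identity = percent_identical_sites(aligned)
print(round(identity, 2))
```

With Biopython, the strings could be taken from the records of an alignment read by `AlignIO.read`, for example via `str(record.seq)`.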
3 votes, 2 answers

How to use Biopython to translate a series of DNA sequences in a FASTA file and extract the Protein sequences into a separate field?

I am new to Biopython (and coding in general) and am trying to code a way to translate a series of DNA sequences (more than 80) into protein sequences, in a separate FASTA file. I want to also find the sequence in the correct reading frame. Here's…
macrosage
  • 31
  • 1
  • 2
3 votes, 2 answers

How to separately get the X, Y or Z coordinates from a pdb file

I have a PDB file '1abz' (https://files.rcsb.org/view/1ABZ.pdb), which contains the coordinates of a protein structure. Please ignore the header remark lines; the interesting information starts at line 276, which says 'MODEL 1'. I would…
Cave
  • 201
  • 1
  • 4
  • 14
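As a rough sketch of one approach, the PDB format stores coordinates in fixed columns of each ATOM/HETATM record (x in columns 31-38, y in 39-46, z in 47-54), so they can be sliced out without any library. The sample record below is made up and is built with explicit field widths so that the columns line up. Note the assumption that all models are wanted: for a multi-model file, this collects atoms from every MODEL block.

```python
def atom_coordinates(pdb_lines):
    """Return (x, y, z) tuples from ATOM/HETATM records, using fixed PDB columns."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            x = float(line[30:38])   # columns 31-38
            y = float(line[38:46])   # columns 39-46
            z = float(line[46:54])   # columns 47-54
            coords.append((x, y, z))
    return coords


# Made-up ATOM record, formatted field by field to match the PDB column layout.
sample = "ATOM  {:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614
)
coords = atom_coordinates(["REMARK example", sample])

# Separate lists for each axis.
xs = [c[0] for c in coords]
ys = [c[1] for c in coords]
zs = [c[2] for c in coords]
print(xs, ys, zs)
```

With Biopython, the same values are available per atom from `Bio.PDB` via `atom.get_coord()`, which returns the three coordinates as an array.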