Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW, EMBOSS, among others)

  • Classes for representing DNA, RNA and protein sequences, including their feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and molecular weight calculations

among many others.
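Here is a minimal illustration of the parsing and sequence operations listed above. The sketch reads one FASTA record from an in-memory string, then transcribes, translates and weighs it. The record ID `demo` and the sequence are made up for the example.

```python
from io import StringIO

from Bio import SeqIO
from Bio.SeqUtils import molecular_weight

# A made-up single-record FASTA file held in memory instead of on disk.
fasta_text = ">demo\nATGGCCATTGTAATGGGCCGCTGA\n"

# SeqIO.read expects exactly one record and returns it as a SeqRecord.
record = SeqIO.read(StringIO(fasta_text), "fasta")

mrna = record.seq.transcribe()                 # DNA -> RNA (T becomes U)
protein = record.seq.translate(to_stop=True)   # standard table, stop at first stop codon
weight = molecular_weight(record.seq, seq_type="DNA")  # single-stranded DNA, in daltons

print(record.id, mrna, protein, round(weight, 1))
```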

The biopython tag

Questions with this tag should relate to issues involving the Biopython package.

Learning More

The website http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also hosts a useful wiki.

The Biopython Tutorial and Cookbook provides many examples of Biopython in use, as well as installation instructions and a FAQ section.

1345 questions
3 votes, 1 answer

Making a BLAST database from FASTA in Python

How can I do this? I use Biopython and have already read the manual. I can of course make a BLAST database from FASTA using "makeblastdb" in standalone NCBI BLAST+, but I want the whole process in one program. It seems there are two possible solutions. Find a function…
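One way to keep the whole process in a single Python program is to call BLAST+'s `makeblastdb` through `subprocess`. This sketch assumes that BLAST+ is installed and on `PATH`. The helper names are hypothetical, and only the standard `-in`, `-dbtype` and `-out` options are used.

```python
import shutil
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="nucl"):
    """Build the makeblastdb argument list (hypothetical helper)."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    return ["makeblastdb", "-in", str(fasta_path), "-dbtype", dbtype, "-out", str(db_name)]


def make_blast_db(fasta_path, db_name, dbtype="nucl"):
    """Run makeblastdb and raise if it is missing or exits with an error."""
    if shutil.which("makeblastdb") is None:
        raise FileNotFoundError("makeblastdb not found on PATH; install NCBI BLAST+")
    cmd = makeblastdb_command(fasta_path, db_name, dbtype)
    # check=True turns a non-zero exit status into CalledProcessError.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, `make_blast_db("seqs.fasta", "mydb")` would build a nucleotide database named `mydb` from the placeholder file `seqs.fasta`.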
3 votes, 1 answer

Change which axis is used as radial/angular position with matplotlib polar=True (Creating circular phylogenetic tree with Bio.Phylo)

If I create a phylogenetic tree using the Biopython Phylo module, it is oriented from left to right, e.g.:

    import matplotlib.pyplot as plt
    from Bio import Phylo
    from io import StringIO
    handle = StringIO("(((A,B),(C,D)),(E,F,G));")
    tree =…
Roger Vadim (373 reputation)
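One way to approach a circular layout is to give each leaf an evenly spaced angle and use its depth from the root as the radius, then draw on a polar axis. The sketch below computes those coordinates with `Tree.depths()` and `get_terminals()`. It illustrates the layout idea only and does not replace `Phylo.draw`. It uses unit branch lengths because the example Newick string has none.

```python
import math
from io import StringIO

from Bio import Phylo


def circular_leaf_positions(tree):
    """Map each leaf name to (theta in radians, radius) for a polar plot."""
    # The Newick string has no branch lengths, so count every branch as 1.
    depths = tree.depths(unit_branch_lengths=True)
    leaves = tree.get_terminals()
    n = len(leaves)
    # Spread leaves evenly around the circle in their depth-first order.
    return {
        leaf.name: (2 * math.pi * i / n, depths[leaf])
        for i, leaf in enumerate(leaves)
    }


tree = Phylo.read(StringIO("(((A,B),(C,D)),(E,F,G));"), "newick")
positions = circular_leaf_positions(tree)
```

To draw, create an axis with `plt.subplot(projection="polar")` and pass the angles and radii to `ax.scatter`. The connecting arcs and spokes would still have to be drawn separately.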
3 votes, 5 answers

Separate the abnormal reads of DNA (A, T, C, G) templates

I have millions of DNA clone reads, and a few of them are misreads or errors. I want to separate out only the clean reads. For readers without a biology background: a DNA clone consists of only four characters (A, T, C, G) in various combinations. Any character,…
shivam (596 reputation)
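One way to separate clean reads is to accept only those made up entirely of A, C, G and T. This sketch assumes reads are uppercase strings, so any other character (e.g. `N`) marks a read as abnormal. For millions of reads from a file, the same test could be applied to `str(record.seq)` while streaming records with `SeqIO.parse`.

```python
import re

# Assumption: reads are uppercase; anything other than A, C, G, T is abnormal.
VALID_READ = re.compile(r"[ACGT]+")


def split_reads(reads):
    """Return (clean, abnormal) lists, preserving input order."""
    clean, abnormal = [], []
    for read in reads:
        # fullmatch requires every character of the read to be A, C, G or T.
        if VALID_READ.fullmatch(read):
            clean.append(read)
        else:
            abnormal.append(read)
    return clean, abnormal
```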
3 votes, 1 answer

Extracting multiple abstracts from PubMed using an external list of PubMed IDs with Biopython

I am trying to extract the abstracts of 60K articles from PubMed using their PubMed IDs and export the abstracts into a dictionary. I suspect there is an issue with my code, especially where it parses the PubMed IDs. Please help in…
Thulasi R (331 reputation)
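With tens of thousands of IDs, a common pattern is to fetch them in batches with `Entrez.efetch` and parse the XML with `Entrez.read`. This sketch assumes the IDs are already a list of strings. The email address is a placeholder, and the batch size of 200 is a conservative choice rather than a documented limit.

```python
import time

from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder: NCBI asks for a real contact address


def batched(ids, size=200):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_abstracts(pmids, batch_size=200, delay=0.34):
    """Return {pmid: abstract_text}; articles without an abstract map to ''."""
    abstracts = {}
    for batch in batched(pmids, batch_size):
        handle = Entrez.efetch(db="pubmed", id=",".join(batch), retmode="xml")
        records = Entrez.read(handle)
        handle.close()
        for article in records["PubmedArticle"]:
            citation = article["MedlineCitation"]
            abstract = citation["Article"].get("Abstract", {})
            # AbstractText may be split into several labelled sections.
            parts = abstract.get("AbstractText", [])
            abstracts[str(citation["PMID"])] = " ".join(str(p) for p in parts)
        time.sleep(delay)  # stay under roughly 3 requests per second without an API key
    return abstracts
```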
3 votes, 2 answers

Convert single-letter amino acid code to three-letter code along with the chain numbering

I have a string of single-letter amino acid codes ('GPHMAQGTLI'). I want to convert it into the respective three-letter codes and number the residues in a range (300-309) in the same order. I have written the following, but because of the 'return' statement it is…
shivam (596 reputation)
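One way to avoid returning too early is to build the whole list first and return it once. The lookup table `protein_letters_1to3` from `Bio.Data.IUPACData` maps one-letter codes to three-letter codes such as `"Gly"`. The function name and output shape (a list of `(number, code)` pairs) are choices made for this sketch. `Bio.SeqUtils.seq3` is an alternative that returns the codes as one concatenated string.

```python
from Bio.Data.IUPACData import protein_letters_1to3


def number_residues(sequence, start):
    """Return [(residue_number, three_letter_code), ...] starting at `start`."""
    # A single list comprehension, so nothing returns from inside a loop.
    return [
        (start + offset, protein_letters_1to3[letter])
        for offset, letter in enumerate(sequence.upper())
    ]


numbered = number_residues("GPHMAQGTLI", 300)
```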
3 votes, 0 answers

Finding CDR sequences by their definition

I am struggling to find CDRs from their definition. The definition is matched against the preceding and following sequence patterns (known as the prefix and suffix, respectively), and the CDR lies between them. Moreover, a few point mutations are allowed in the prefix and…
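Here is one way to find a region flanked by a prefix and a suffix while allowing point mutations. It scans for windows whose Hamming distance (number of mismatched positions) to each pattern is within a limit. The sketch assumes substitutions only, with no insertions or deletions, and returns only the first match. The function names and the `max_mismatches` and `min_len` parameters are hypothetical.

```python
def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_flanked_region(seq, prefix, suffix, max_mismatches=1, min_len=1):
    """Return (start, end) of the first region between prefix and suffix, or None."""
    for i in range(len(seq) - len(prefix) + 1):
        if mismatches(seq[i:i + len(prefix)], prefix) > max_mismatches:
            continue
        start = i + len(prefix)
        # Look for the nearest suffix hit that leaves at least min_len residues.
        for j in range(start + min_len, len(seq) - len(suffix) + 1):
            if mismatches(seq[j:j + len(suffix)], suffix) <= max_mismatches:
                return start, j
    return None
```

The region itself is `seq[start:end]`.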
3 votes, 1 answer

Python: How to convert a phylogenetic tree into a circular format?

I have a FASTA file of aligned sequences for which I want to create a phylogenetic tree. I can create a normal branched tree like this... [image: Normal branched phylogenetic tree] but would like to convert it so it looks something like this... [image: Circular…]
fullerpj (31 reputation)
3 votes, 2 answers

Rename a FASTA file according to a dataframe in Python

Hello, I have a huge file such as:

    >Seq1.1
    AAAGGAGAATAGA
    >Seq2.2
    AGGAGCTTCTCAC
    >Seq3.1
    CGTACTACGAGA
    >Seq5.2
    CGAGATATA
    >Seq3.1
    CGTACTACGAGA
    >Seq2
    AGGAGAT

and a dataframe such as:

    tab
    query   New_query
    Seq1.1  Seq1.1
    Seq2.2  Seq2.2
    Seq3.1  Seq3.1_0
    Seq5.2…
chippycentra (3,396 reputation)
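Here is one way to rename records from a lookup table. It turns the dataframe into a plain dict, e.g. `dict(zip(tab["query"], tab["New_query"]))`, assuming a pandas dataframe named `tab` with the columns shown. It then rewrites each record's ID while streaming through `SeqIO`. This sketch gives every occurrence of the same ID the same new name. If repeated IDs need different names per occurrence, a plain dict is not enough.

```python
from io import StringIO

from Bio import SeqIO


def rename_records(records, mapping):
    """Yield records whose IDs are replaced via `mapping`; unmapped IDs are kept."""
    for record in records:
        record.id = mapping.get(record.id, record.id)
        record.description = ""  # drop the old header text so only the new ID is written
        yield record


fasta_in = StringIO(">Seq1.1\nAAAGGAGAATAGA\n>Seq3.1\nCGTACTACGAGA\n")
fasta_out = StringIO()
mapping = {"Seq3.1": "Seq3.1_0"}  # stand-in for the dict built from the dataframe
SeqIO.write(rename_records(SeqIO.parse(fasta_in, "fasta"), mapping), fasta_out, "fasta")
```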
3 votes, 1 answer

Getting protein sequences by accessing UniProt (with Python)

I have a list of protein IDs, and I'm trying to retrieve the protein sequences from UniProt with Python. I came across this post: Protein sequence from uniprot protein id python, but it gives a list of elements and not the actual sequence: Code import…
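One way to get the sequence itself is to download each entry as FASTA from the UniProt REST API, via the `https://rest.uniprot.org/uniprotkb/<accession>.fasta` endpoint. The text is then parsed with `SeqIO.read`. This sketch assumes network access and one accession per request. The helper names are hypothetical.

```python
from io import StringIO
from urllib.request import urlopen

from Bio import SeqIO

UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession):
    """Build the FASTA download URL for one UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def fetch_uniprot_sequence(accession):
    """Download one entry and return its sequence as a plain string."""
    with urlopen(uniprot_fasta_url(accession)) as response:
        text = response.read().decode("utf-8")
    # The response holds exactly one FASTA record.
    record = SeqIO.read(StringIO(text), "fasta")
    return str(record.seq)
```

A list of IDs then becomes `{acc: fetch_uniprot_sequence(acc) for acc in ids}`.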
3 votes, 2 answers

Find the length of a contig in one FASTA file, using the header from another FASTA file as the query, in Python

I'm trying to find a Python solution to extract the length of a specific sequence within a FASTA file, using the full header of the sequence as the query. The full header is stored as a variable earlier in the pipeline (i.e. "CONTIG"). I would like…
Gunther (129 reputation)
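Here is a minimal sketch for matching on the full header. When parsing FASTA, `SeqIO` keeps the whole title line (without the leading `>`) in `record.description`. You could call it as `contig_length("assembly.fasta", CONTIG)`, where the file name is a placeholder and `CONTIG` is the variable from the question. The sketch assumes headers are compared exactly, and it returns the first match.

```python
from io import StringIO

from Bio import SeqIO


def contig_length(fasta_handle, header):
    """Return the length of the first record whose full header equals `header`, else None."""
    # Accept the header with or without its leading '>'.
    wanted = header.lstrip(">").strip()
    for record in SeqIO.parse(fasta_handle, "fasta"):
        if record.description == wanted:
            return len(record.seq)
    return None
```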
3 votes, 0 answers

Finding the number of articles for a disease using PubMed (Python)

I am looking for a way to efficiently ask Entrez (Biopython) to retrieve the number of articles in PubMed associated with a given indication/condition. I only have the list of full indications. I have worked out a way, the only problem being that…
Lusian (629 reputation)
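If only the number of hits is needed, ESearch can be asked for the count without returning any IDs by setting `retmax=0`. This sketch sends one request per indication and pauses between requests to respect NCBI's limit of about 3 requests per second without an API key. The email address is a placeholder, and the `counter` parameter exists only so the loop can be exercised without network access.

```python
import time

from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder: NCBI asks for a real contact address


def pubmed_count(term):
    """Return the number of PubMed records matching `term`."""
    # retmax=0: ESearch reports the hit count without listing any IDs.
    handle = Entrez.esearch(db="pubmed", term=term, retmax=0)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


def count_many(indications, counter=pubmed_count, delay=0.34):
    """Count hits for each indication, sleeping between requests."""
    counts = {}
    for indication in indications:
        counts[indication] = counter(indication)
        time.sleep(delay)
    return counts
```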
3 votes, 1 answer

How to save each ligand from a PDB file separately with Bio.PDB?

I have a list of PDB files. I want to extract the ligands (i.e. heteroatoms) from all of the files and save each one separately into a PDB file, using the Bio.PDB module from Biopython. I tried some solutions, like this one: Remove heteroatoms from PDB…
MathB (49 reputation)
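Here is one way to do this with Bio.PDB. It relies on the residue ID tuple `(hetflag, resseq, icode)`, where the hetero flag is `" "` for standard residues, `"W"` for water and `"H_<resname>"` for other hetero groups. A `Select` subclass then makes `PDBIO.save` write one residue at a time. This sketch treats every non-water hetero residue in the first model as a ligand, so modified residues would also be included. The output naming scheme is hypothetical.

```python
from Bio.PDB import PDBIO, PDBParser, Select


def is_ligand(residue_id):
    """True for hetero residues other than water, based on the hetero flag."""
    return residue_id[0].startswith("H_")


class SingleResidueSelect(Select):
    """Accept only one residue (and its model and chain) when saving with PDBIO."""

    def __init__(self, target):
        self.target = target

    def accept_model(self, model):
        return model is self.target.get_parent().get_parent()

    def accept_chain(self, chain):
        return chain is self.target.get_parent()

    def accept_residue(self, residue):
        return residue is self.target


def save_ligands(pdb_path, out_prefix):
    """Write each ligand of the first model to its own PDB file; return the paths."""
    structure = PDBParser(QUIET=True).get_structure("input", pdb_path)
    io = PDBIO()
    io.set_structure(structure)
    written = []
    for residue in structure[0].get_residues():
        if is_ligand(residue.id):
            chain_id = residue.get_parent().id
            # Hypothetical naming scheme: prefix_chain_resname_resseq.pdb
            out_path = f"{out_prefix}_{chain_id}_{residue.get_resname()}_{residue.id[1]}.pdb"
            io.save(out_path, SingleResidueSelect(residue))
            written.append(out_path)
    return written
```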
3 votes, 1 answer

How can I eliminate duplicated sequences in a FASTA file?

I'm trying to build a database for a bacterial genus from all the published sequences, so that I can calculate the coverage of my reads against it, using bowtie2 for mapping. For that, I merged all the genome sequences I downloaded from NCBI into one…
Reda (449 reputation)
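Here is a minimal sketch for removing exact duplicates. It keeps the first record for each distinct sequence and stores a SHA-256 digest of each sequence rather than the full string, to limit memory with whole genomes. Case is ignored. Reverse complements and near-identical sequences are not treated as duplicates here.

```python
import hashlib
from io import StringIO

from Bio import SeqIO


def unique_records(records):
    """Yield only the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for record in records:
        # A fixed-size digest instead of the full sequence keeps the set small.
        digest = hashlib.sha256(str(record.seq).upper().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield record
```

To write the result, pass `unique_records(SeqIO.parse("merged.fasta", "fasta"))` to `SeqIO.write`, where the file name is a placeholder.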
3 votes, 1 answer

How to retrieve information from Pubmed according to date and term using Python?

Can you tell me how I can obtain the 5 newest articles from PubMed that contain the word 'obesity' and return the authors, title, date, DOI and PMID of each paper using Python? Thank you in advance. EDIT: My attempt so far. I believe this…
EAS (39 reputation)
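One possible two-step approach uses ESearch to get the newest IDs, then ESummary to get the per-article fields. The sketch assumes that `sort="pub_date"` orders PubMed results by publication date, and that the summaries carry `Title`, `AuthorList`, `PubDate` and `DOI` items. It uses `.get` so that a missing item yields an empty value instead of an error. The email address is a placeholder. A term such as `"obesity[Title/Abstract]"` restricts matches to the title and abstract.

```python
from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder: NCBI asks for a real contact address


def newest_pubmed_ids(term, n=5):
    """Return up to n PubMed IDs for `term`, newest publication date first (assumed sort)."""
    handle = Entrez.esearch(db="pubmed", term=term, retmax=n, sort="pub_date")
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])


def summarize(docsum):
    """Pick the requested fields out of one ESummary document summary."""
    return {
        "pmid": str(docsum.get("Id", "")),
        "title": str(docsum.get("Title", "")),
        "authors": [str(a) for a in docsum.get("AuthorList", [])],
        "date": str(docsum.get("PubDate", "")),
        "doi": str(docsum.get("DOI", "")),
    }


def newest_articles(term, n=5):
    """Fetch summaries for the newest matches and reduce them to plain dicts."""
    ids = newest_pubmed_ids(term, n)
    handle = Entrez.esummary(db="pubmed", id=",".join(ids))
    docsums = Entrez.read(handle)
    handle.close()
    return [summarize(d) for d in docsums]
```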
3 votes, 3 answers

'position-aware' aligning of sequences with letter annotations

We have 2 DNA sequences (strings):

    >1
    ATGCAT
    135198
    >2
    ATCAT

Expected output: first, we need to align these 2 strings, then get the relevant annotation by index:

    ATGCAT
    AT-CAT
    13-198

The first part can be done using Biostrings…
zx8754 (52,746 reputation)
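In Python, one option is Biopython's `PairwiseAligner` for the alignment, followed by a walk over the alignment columns to carry the reference annotation along. A column gets `-` when either row has a gap there. The scoring values are assumptions: with them, `ATGCAT` against `ATCAT` has the single best alignment `AT-CAT`. `align_pair` assumes Biopython 1.80 or later, where indexing an alignment returns its gapped rows. With the question's data, `transfer_annotation("ATGCAT", "AT-CAT", "135198")` gives `"13-198"`.

```python
from Bio.Align import PairwiseAligner


def transfer_annotation(gapped_ref, gapped_query, ref_annotation):
    """Map per-position annotation of the reference onto alignment columns."""
    out = []
    ref_pos = 0
    for ref_char, query_char in zip(gapped_ref, gapped_query):
        if ref_char == "-":
            out.append("-")  # insertion in the query: no reference annotation here
            continue
        # Reference residue present: copy its annotation unless the query has a gap.
        out.append("-" if query_char == "-" else ref_annotation[ref_pos])
        ref_pos += 1
    return "".join(out)


def align_pair(ref, query):
    """Return the gapped rows of one best global alignment (Biopython >= 1.80 assumed)."""
    aligner = PairwiseAligner()
    aligner.mode = "global"
    aligner.match_score = 1
    aligner.mismatch_score = -1
    aligner.open_gap_score = -2
    aligner.extend_gap_score = -1
    alignment = aligner.align(ref, query)[0]
    return alignment[0], alignment[1]
```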