Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is a set of freely available tools for biological computation written in Python. It is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionalities such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW and EMBOSS, among others)

  • Classes for dealing with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

amongst many, many others.

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
6 votes, 5 answers

Protein sequence from uniprot protein id python

I was wondering if there is a way to get protein sequences from UniProt protein IDs. I checked a few online tools, but they only return one sequence at a time, and I have 5536 values. Is there any package in Biopython to do this?
AST
6 votes, 1 answer

"invalid sequence" error in seqio.write() of biopython

This question is related to bioinformatics. I did not receive any suggestions in the corresponding forums, so I am writing it here. I need to remove non-ACTG nucleotides from a FASTA file and write the output to a new file using SeqIO from Biopython. My code is…
Hrant
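The cleaning step described in this question can be sketched in plain Python. This is not the asker's code (which is truncated above) and it does not go through SeqIO: it works on FASTA text directly, and the names `strip_non_acgt` and `clean_fasta_text` are hypothetical.

```python
import re

# Assumption: "non-ACTG" means every character other than A, C, G or T,
# in either case. Adjust the character class if other symbols should stay.
_NON_ACGT = re.compile(r"[^ACGTacgt]")


def strip_non_acgt(sequence):
    """Return the sequence with every non-ACGT character removed."""
    return _NON_ACGT.sub("", sequence)


def clean_fasta_text(fasta_text):
    """Clean the sequence lines of FASTA text, leaving header lines untouched."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header lines are copied as they are.
            cleaned.append(line)
        else:
            residue = strip_non_acgt(line)
            # Lines that become empty after cleaning are dropped.
            if residue:
                cleaned.append(residue)
    return "\n".join(cleaned)
```

The cleaned text could then be written to a new file, or parsed into records if the rest of the pipeline needs them.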
6 votes, 1 answer

Issue with parsing publication data from PubMed with Entrez

I am trying to use Entrez to import publication data into a database. The search part works fine, but when I try to parse: from Bio import Entrez def create_publication(pmid): handle = Entrez.efetch("pubmed", id=pmid, retmode="xml") …
apiljic
6 votes, 1 answer

Phylo BioPython building trees

I am trying to build a tree with the Biopython Phylo module. What I've done so far is this image: each name has a four-digit number followed by - and a number: this number refers to the number of times that sequence is represented. That means 1578 - 22,…
psoares
6 votes, 2 answers

Biopython parse from variable instead of file

import gzip import io from Bio import SeqIO infile = "myinfile.fastq.gz" fileout = open("myoutfile.fastq", "w+") with io.TextIOWrapper(gzip.open(infile, "r")) as f: line = f.read() fileout.write(line) fileout.seek(0) count = 0 for rec in…
Stuber
6 votes, 2 answers

How can I extract the abstract from efetch (Biopython, Entrez)?

I am new to Python and would like to extract abstracts from PubMed using the Entrez system from the Bio package. I got the esearch to give me my UIDs (stored in my_list_ges) and I can also download an entry using efetch. Now, however, the result is…
MaxS
6 votes, 1 answer

Can Biopython perform Seq.find() accounting for ambiguity codes

I want to be able to search a Seq object for a subsequence Seq object accounting for ambiguity codes. For example, the following should be true: from Bio.Seq import Seq from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA amb = IUPACAmbiguousDNA() s1 =…
Malonge
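One possible way to search with ambiguity codes, sketched in plain Python rather than with Seq objects, is to turn the query into a regular expression. The function name `ambiguous_find` is hypothetical, and the sketch assumes that only the query contains ambiguity codes while the searched sequence is plain A, C, G and T.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_find(sequence, pattern):
    """Return the index of the first match of pattern in sequence, or -1.

    Each symbol of the pattern becomes a character class, so "R" matches
    either "A" or "G" in the sequence.
    """
    regex = "".join("[" + IUPAC_DNA[base] + "]" for base in pattern.upper())
    match = re.search(regex, sequence.upper())
    return match.start() if match else -1
```

Like `str.find`, the sketch returns -1 when nothing matches, so it can stand in wherever a plain string search was used.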
6 votes, 3 answers

Frequencies not adding up to one

I am writing a function that is supposed to go through a .fasta file of DNA sequences and create a dictionary of nucleotide (nt) and dinucleotide (dnt) frequencies for each sequence in the file. I am then storing each dictionary in a list called…
Bantha
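The asker's function is not shown in the excerpt, so the cause of the problem is unknown. As a general illustration, frequencies sum to one when each count is divided by the number of items actually counted. The function names below are hypothetical, and the sketch assumes overlapping dinucleotides on a plain string.

```python
from collections import Counter


def nucleotide_frequencies(sequence):
    """Map each nucleotide to its share of all counted nucleotides."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {nt: n / total for nt, n in counts.items()}


def dinucleotide_frequencies(sequence):
    """Map each overlapping dinucleotide to its share of all dinucleotides.

    A sequence of length L has L - 1 overlapping dinucleotides, so the
    denominator is the number of pairs, not the sequence length.
    """
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {dnt: n / total for dnt, n in counts.items()}
```

Dividing the dinucleotide counts by the sequence length instead of the number of pairs is one way such totals can fall short of one, though that may not be what happened here.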
6 votes, 3 answers

Convert FASTA to GenBank

Is there a way to use BioPython to convert FASTA files to a Genbank format? There are many answers on how to convert from Genbank to FASTA, but not the other way around.
Ricky Su
6 votes, 1 answer

Trying to parallelize a python algorithm using multithreading and avoiding GIL restrictions

I am implementing an algorithm in Python using Biopython. I have several alignments (sets of sequences of equal length) stored in FASTA files. Each alignment contains between 500 and 30000 seqs and each sequence is about 17000 elements long. Each…
6 votes, 4 answers

how to extend ambiguous dna sequence

Let's say you have a DNA sequence like this: AATCRVTAA, where R and V are ambiguous values of DNA nucleotides, where R represents either A or G and V represents A, C or G. Is there a Biopython method to generate all the different combinations of…
jrjc
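The expansion described in this question can be sketched with `itertools.product`. This is a plain-Python illustration, not a built-in Biopython method. The function name `expand_ambiguous` is hypothetical, and the lookup table is written out by hand here (Biopython ships a comparable table as `Bio.Data.IUPACData.ambiguous_dna_values`).

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_ambiguous(sequence):
    """Return every unambiguous sequence the ambiguous input can stand for."""
    # One string of candidate bases per position, e.g. "AG" for R.
    choices = [IUPAC_DNA[base] for base in sequence.upper()]
    # The Cartesian product picks one candidate per position.
    return ["".join(combo) for combo in product(*choices)]
```

For AATCRVTAA this yields 2 × 3 = 6 sequences. The number of results grows multiplicatively with every ambiguous position, so long sequences with many N symbols can become very large.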
6 votes, 3 answers

Installation of biopython - python 3.3 not found in registry

I am trying to install Biopython to run with Python 3.3 on a Windows 7 computer. I have downloaded the Biopython executable biopython-1.61.win32-py3.3-beta.exe. When I attempt to run the executable, however, I get the message "Python version 3.3 is…
gwilymh
6 votes, 2 answers

Biopython class instance - output from Entrez.read: I don't know how to manipulate the output

I am trying to download some XML from PubMed - no problems there, Biopython is great. The problem is that I do not really know how to manipulate the output. I want to put most of the parsed XML into a SQL database, but I'm not familiar with the…
5 votes, 4 answers

Split a multifasta file into files with the same number of accession numbers

I have a file that has thousands of accession numbers and looks like this: >NC_033829.1 Kallithea virus isolate DrosEU46_Kharkiv_2014, complete genome AGTCAGCAACGTCGATGTGGCGTACAATTTCTTGATTACATTTTTGTTCCTAACAAAATGTTGATATACT >NC_020414.2 Escherichia…
LDT
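Splitting a multi-FASTA file into groups with an equal number of records can be sketched in plain Python as two generators: one that reads records and one that batches them. The names `read_fasta_records` and `chunk_records` are hypothetical, and the sketch assumes every record starts with a `>` header line.

```python
def read_fasta_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(seq)


def chunk_records(records, chunk_size):
    """Yield lists of up to chunk_size records; only the last may be shorter."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

Each chunk can then be written to its own output file. Because both functions are generators, the whole input never has to be held in memory at once.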
5 votes, 1 answer

How do I make more efficient code for a search for multiple strings in column in pandas

I am a newly self-taught (minus 1 class on the very basics) programmer working for a bio lab. I have a script that goes through RNAseq data from two different cell types and runs a t-test if in another dataset. It worked for this application but the…