Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is a set of freely available tools for biological computation written in Python. It is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionalities such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW and EMBOSS, among others)

  • Classes for representing DNA, RNA and protein sequences, including their feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

among many others.
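As a minimal sketch of the parsing and sequence-operation features listed above, the snippet below reads FASTA records from an in-memory string with `SeqIO.parse`, then transcribes and translates a sequence with `Seq`. The FASTA text and record IDs are hypothetical, chosen only so the example is self-contained.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# Hypothetical FASTA content, kept in memory so the example needs no files.
FASTA_TEXT = """>seq1 example record
ATGGCCATTGTAATGGGCCGCTGA
>seq2 another record
ATGAAATAG
"""

# SeqIO.parse yields one SeqRecord per FASTA entry.
records = list(SeqIO.parse(StringIO(FASTA_TEXT), "fasta"))
for record in records:
    print(record.id, len(record.seq))

# Seq objects support common operations such as transcription and translation.
dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")
rna = dna.transcribe()                 # T -> U
protein = dna.translate(to_stop=True)  # stop at the first stop codon
print(rna)
print(protein)
```

For real data, `SeqIO.parse` also accepts a filename in place of the `StringIO` handle.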

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
-2
votes
1 answer

Calculate the mean of amino acids sequences biopython

I have this code for calculating the length of sequences in FASTA format using Biopython. I got the lengths: NP_418305.1 349 NP_418306.1 469 NP_418308.1 236 However, now I'd like to calculate the mean of the whole sequences, just like an intereting…
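Once the lengths are collected, the mean is their sum divided by their count. A minimal sketch using the three lengths quoted in the question:

```python
from statistics import mean

# Lengths quoted in the question, keyed by record ID.
lengths = {
    "NP_418305.1": 349,
    "NP_418306.1": 469,
    "NP_418308.1": 236,
}

# statistics.mean returns a float here because the exact mean is not an integer.
mean_length = mean(lengths.values())
print(f"Mean length: {mean_length:.2f}")
```

With Biopython, the same dictionary could be built directly from a file as `{rec.id: len(rec.seq) for rec in SeqIO.parse(path, "fasta")}`, where `path` is a placeholder for the FASTA filename.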
-2
votes
1 answer

Delete specific letter in a FASTA sequence

I have a FASTA file that has about 300000 sequences but some of the sequences are like these >Spike|hCoV-19/Wuhan/WH02/2019|2019-12-31|EPI_ISL_406799|Original|hCoV-19^^Wuhan|Human|General Hospital of Central Theater Command of People's Liberation…
-2
votes
1 answer

Biopython for Anaconda on OSX "No module named 'Bio'" error

When I import Bio in a Jupyter notebook in Anaconda on a Mac, I get this error: No module named 'Bio'.
Reto
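One common cause of this error is that Biopython was installed into a different environment than the one the notebook kernel runs. The standard-library sketch below prints which interpreter the kernel uses and whether `Bio` can be imported from it. The helper name `module_available` is hypothetical. If the check fails, installing into that interpreter usually resolves it, for example with `%pip install biopython` in a notebook cell or `conda install -c conda-forge biopython` in the active conda environment.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found by the current interpreter's import system."""
    return importlib.util.find_spec(name) is not None


# The interpreter the notebook kernel is actually running.
print("Interpreter:", sys.executable)
print("Bio importable:", module_available("Bio"))
```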
-2
votes
1 answer

Save every fourth item in a list to a key in dictionary in python

I need to save every fourth item of a list as a key in Python, then, starting at 5, every fourth item as a value, and, starting at 6, every fourth item as a value (the value will be a list). I have a text file that looks like this: GeneID NumC NumL …
cam
-2
votes
1 answer

no such file or directory

I keep receiving this error when I try to do anything in Biopython. I am not sure how to change the path of Biopython, since Python and Biopython are in the same path. I'm not sure what else I need to do. Python 3.8.5 (v3.8.5:580fbb018f, Jul 20…
Chris
-2
votes
1 answer

Python finding the longest ORF

Can someone show me a straightforward solution for how to calculate the longest open reading frame (ORF) > 30bp in length in a DNA sequence? ATG is the start codon (i.e., the beginning of an ORF) and TAG, TGA, and TAA are stop codons (i.e., the end…
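A minimal sketch of one straightforward approach, under stated assumptions: it scans only the forward strand in all three reading frames, starts an ORF at the first ATG seen in a frame, ends it at the next in-frame stop codon (TAA, TAG or TGA, included in the ORF), and ignores ATGs nested inside an already open ORF. The function name `longest_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str, min_len: int = 30) -> str:
    """Return the longest forward-strand ORF (ATG ... stop, stop included).

    Only ORFs longer than `min_len` bases count; returns "" if none qualify.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        # Step through complete codons only (i + 3 never exceeds len(seq)).
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > min_len and len(orf) > len(best):
                    best = orf
                start = None  # close the ORF and look for the next ATG
    return best
```

The reverse strand could be covered by also scanning its reverse complement.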
-2
votes
1 answer

editing a file in python and making a new one

I have a big text file ("|" separated) like this small…
user3631908
-2
votes
1 answer

Python to convert RNA seq into single-letter Amino Acid sequence

I need some assistance in writing a code that will convert a given RNA nucleotide sequence into an Amino Acid sequence. I've currently been given 2 dictionaries to use: one of Amino Acid codons and their respective 3-letter codes, and one of the…
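The two dictionaries can be chained: codon to three-letter code, then three-letter code to one-letter code. A minimal sketch, assuming hypothetical, deliberately partial tables (the question's own dictionaries would cover all 64 codons) and assuming stop codons map to the label "Stop":

```python
# Hypothetical partial tables; "Stop" as the stop-codon label is an assumption.
CODON_TO_THREE = {"AUG": "Met", "GCC": "Ala", "UUU": "Phe", "UAA": "Stop"}
THREE_TO_ONE = {"Met": "M", "Ala": "A", "Phe": "F"}


def rna_to_protein(rna: str) -> str:
    """Translate RNA codon by codon, stopping at the first stop codon."""
    rna = rna.upper()
    protein = []
    # Read complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(rna) - 2, 3):
        three = CODON_TO_THREE[rna[i:i + 3]]
        if three == "Stop":
            break
        protein.append(THREE_TO_ONE[three])
    return "".join(protein)


print(rna_to_protein("AUGGCCUUUUAAGCC"))
```

Outside an exercise, Biopython's `Seq.translate()` performs this translation on RNA sequences using the standard genetic code.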
-2
votes
1 answer

Extracting a value from a combination of list of dictionaries

The eutils package from NCBI returns the below object for a specific request. From this I want to extract the value 245540. How can I do that? [{u'LinkSetDb' : [{u'DbTo' : 'sra', u'Link' : [{u'Id': '245540'}],…
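The object is a list of dictionaries whose values are again lists of dictionaries, so the value is reached by indexing one level at a time. The sketch below rebuilds only the part of the structure visible in the question (the elided keys are omitted). In Python 3 the `u''` prefixes are unnecessary.

```python
# Only the visible portion of the returned object; elided keys are left out.
result = [{"LinkSetDb": [{"DbTo": "sra", "Link": [{"Id": "245540"}]}]}]

# list -> dict -> list -> dict -> list -> dict
link_id = result[0]["LinkSetDb"][0]["Link"][0]["Id"]
print(link_id)

# If several link sets or links may be present, collect every Id instead.
all_ids = [link["Id"] for db in result[0]["LinkSetDb"] for link in db["Link"]]
print(all_ids)
```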
-2
votes
1 answer

Unable to run .py file, however all the code in Python is valid

from Bio import SeqIO import re, os import pandas as pd from Bio.Seq import Seq from Bio.Alphabet import generic_dna from Bio.SeqRecord import SeqRecord os.chdir('c:\\Users\Workspace\\Desktop') f_out =…
TJA
-2
votes
1 answer

IOError while retrieving sequences from fasta file using biopython

I have a FASTA file containing papillomavirus sequences (entire genomes, partial CDS, ...) and I'm using Biopython to retrieve entire genomes (around 7 kb) from this file, so here's my code: rec_dict =…
-2
votes
1 answer

How can I extract FASTA from a genome FASTA based on a GFF file, then merge the FASTA under one transcript for output?

Thanks for your help. I want to extract the specific intron FASTA, then merge the intron FASTA with the CDS FASTA to output my specific transcript. How can I do this with Biopython or Python? My GFF file, for example: 1 ensembl intron 7904 9192 . -…
Hailong Yang
-2
votes
1 answer

Remove duplicate sequences from fasta file based on ID

I wrote a tiny Biopython script to extract sequences from a FASTA file based on ID, but it extracts duplicates, so I am looking to filter out sequences in my FASTA files that are duplicates (i.e. have the exact same ID). I tried to modify my script…
user3188922
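A minimal sketch of ID-based de-duplication that keeps the first record seen for each ID: a generator remembers the IDs already yielded, and `SeqIO.write` consumes it directly. The helper name `unique_by_id` and the input records are hypothetical; for real files, pass filenames to `SeqIO.parse` and `SeqIO.write` instead of `StringIO` handles.

```python
from io import StringIO

from Bio import SeqIO


def unique_by_id(records):
    """Yield records whose ID has not been seen before (first occurrence wins)."""
    seen = set()
    for record in records:
        if record.id not in seen:
            seen.add(record.id)
            yield record


# Hypothetical input in which the ID "a" appears twice.
fasta_in = StringIO(">a\nACGT\n>b\nGGCC\n>a\nTTTT\n")
fasta_out = StringIO()

# SeqIO.write returns the number of records written.
count = SeqIO.write(unique_by_id(SeqIO.parse(fasta_in, "fasta")), fasta_out, "fasta")
print(count)
print(fasta_out.getvalue())
```

Because the generator only holds a set of IDs, memory use stays small even for large files.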
-2
votes
1 answer

Structure-structure alignment using combinatorial extensions

I am trying to find a tool that performs structure-structure alignment for two sequences given their residues using combinatorial extensions (CE). I found a tool based on combinatorial extensions provided by the Protein Data Bank:…
-2
votes
1 answer

How do I call a Python function without opening the file beforehand?

I'm using Python 2.7, and have written a few functions for analyzing protein structure files, which I have saved as pdbtools.py. One function, for example, is getprot(), which lets me pull protein structures from a database. After I open and edit the…
Devinity