Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by The Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW and EMBOSS, among others)

  • Classes for dealing with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

among many others.

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
4
votes
2 answers

Generate all possible unique peptides (permutants) in Python/Biopython

I have a scenario in which I have a peptide frame having 9 AA. I want to generate all possible peptides by replacing a maximum of 3 AA on this frame, i.e. by replacing only 1 or 2 or 3 AA. The frame is CKASGFTFS and I want to see all the mutants by…
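
One way to read the peptide question above is "every sequence that differs from the frame at one, two or three positions". A minimal sketch of that reading in plain Python follows. The 20-letter alphabet and the function name `substitution_mutants` are assumptions made for illustration and are not taken from the question.

```python
from itertools import combinations, product

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_mutants(frame, max_subs=3, alphabet=AMINO_ACIDS):
    """Yield every peptide that differs from `frame` at 1..max_subs positions."""
    for k in range(1, max_subs + 1):
        # Choose which k positions are mutated.
        for positions in combinations(range(len(frame)), k):
            # At each chosen position, allow only residues that differ
            # from the original, so each mutant is produced exactly once.
            choices = [[aa for aa in alphabet if aa != frame[p]] for p in positions]
            for replacement in product(*choices):
                peptide = list(frame)
                for p, aa in zip(positions, replacement):
                    peptide[p] = aa
                yield "".join(peptide)


if __name__ == "__main__":
    mutants = list(substitution_mutants("CKASGFTFS"))
    print(len(mutants))
```

Because the function is a generator, the mutants can be streamed to a file instead of being held in memory all at once.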
4
votes
3 answers

Remove duplicated sequences in FASTA with Python

I apologize if the question has been asked before, but I have been searching for days and could not find a solution in Python. I have a large fasta file, containing headers and sequences. >cavPor3_rmsk_tRNA-Leu-TTA(m) range=chrM:2643-2717 5'pad=0…
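
For the duplicate-removal question above, a rough sketch in plain Python is shown here, assuming "duplicate" means an identical sequence (ignoring case) and that the first header seen should be kept. The helper names `read_fasta` and `unique_records` are hypothetical. With Biopython installed, `SeqIO.parse(handle, "fasta")` could take the place of the hand-written reader.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_records(records):
    """Keep only the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            yield header, seq


if __name__ == "__main__":
    example = [">s1 first", "ACGT", "TTAA", ">s2", "GGCC", ">s3 copy of s1", "acgtttaa"]
    for header, seq in unique_records(read_fasta(example)):
        print(f">{header}\n{seq}")
```

For very large files, storing a hash of each sequence in `seen` rather than the sequence itself would reduce memory use.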
4
votes
0 answers

How to use entrezpy and Biopython Entrez libraries to access ClinVar data from genomic position of variant

[Disclaimer: I published this question 3 weeks ago on Biostars, with no answers yet. I would really like to get some ideas/discussion to find a solution, so I am also posting it here. Biostars post link: https://www.biostars.org/p/447413/] For one of my…
4
votes
1 answer

Convert a hmmer --tblout output to a pandas dataframe

Is there a way to convert a hmmer output to a pandas dataframe? I am also unsure how to load a hmmer tblout table into python via the Bio module. I believe you can call a hmmer format with SeqIO.parse or SeqIO.search. The format of the table…
Cody Glickman
  • 514
  • 1
  • 8
  • 30
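
For the hmmer question above, one possible approach that avoids any parser library is to split each non-comment line of the `--tblout` file on whitespace. The sketch below assumes the usual 18 fixed columns followed by a free-text description. The column labels in `TBLOUT_COLUMNS` and the function name `parse_tblout` are my own choices for illustration, not names defined by HMMER or Biopython.

```python
# Assumed layout: 18 whitespace-separated columns, then a description
# that may itself contain spaces. The labels below are illustrative.
TBLOUT_COLUMNS = [
    "target_name", "target_accession", "query_name", "query_accession",
    "full_evalue", "full_score", "full_bias",
    "dom_evalue", "dom_score", "dom_bias",
    "exp", "reg", "clu", "ov", "env", "dom", "rep", "inc",
    "description",
]


def parse_tblout(lines):
    """Return one dict per hit; all values are left as strings."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # Limit the number of splits so the description stays in one field.
        fields = line.split(maxsplit=len(TBLOUT_COLUMNS) - 1)
        rows.append(dict(zip(TBLOUT_COLUMNS, fields)))
    return rows


if __name__ == "__main__":
    example = [
        "# target name  accession  query name ...",
        "seq1 - PF00001 PF00001.1 1e-10 45.3 0.1 2e-10 44.0 0.1 1.2 1 0 0 1 1 1 1 hypothetical protein X",
    ]
    for row in parse_tblout(example):
        print(row["target_name"], row["full_evalue"], row["description"])
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)`; the numeric columns would still need converting from strings.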
4
votes
2 answers

Biopython: export the protein fragment from PDB to a FASTA file

I am writing the PDB protein sequence fragment to fasta format as below. from Bio.SeqIO import PdbIO, FastaIO def get_fasta(pdb_file, fasta_file, transfer_ids=None): fasta_writer = FastaIO.FastaWriter(fasta_file) …
Dr. Abrar
  • 327
  • 2
  • 5
  • 17
4
votes
2 answers

'module' object is not callable - Bio.IUPAC

When I try, from Bio.Alphabet import IUPAC from Bio import Seq my_prot = Seq("AGTACACTGGT", IUPAC.protein) Why do I encounter the following error: TypeError: 'module' object is not callable PS: this is an Example from the BioPython's Cookbook
Saif al Harthi
  • 2,948
  • 1
  • 21
  • 26
4
votes
2 answers

Fastest way to count instances of substrings in string Python3.6

I have been working on a program which requires the counting of sub-strings (up to 4000 sub-strings of 2-6 characters located in a list) inside a main string (~400,000 characters). I understand this is similar to the question asked at Counting…
DanStu
  • 174
  • 9
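
For the substring-counting question above, one idea is to exploit the fact that the patterns are short (2 to 6 characters): count every window of each needed length once, then look the patterns up. This is only a sketch under the assumption that overlapping occurrences should be counted, which differs from `str.count` (it counts non-overlapping matches). The function name `count_substrings` is hypothetical.

```python
from collections import Counter


def count_substrings(text, patterns):
    """Count overlapping occurrences of each pattern in `text`."""
    lengths = {len(p) for p in patterns}
    window_counts = Counter()
    for k in lengths:
        # Slide a window of length k over the text and tally every k-mer.
        window_counts.update(text[i:i + k] for i in range(len(text) - k + 1))
    # Counter returns 0 for k-mers that never occurred.
    return {p: window_counts[p] for p in patterns}


if __name__ == "__main__":
    print(count_substrings("ATATAT", ["AT", "ATA", "GG"]))
```

The text is scanned once per distinct pattern length rather than once per pattern, so 4000 patterns of 2 to 6 characters need at most five passes.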
4
votes
2 answers

ImportError: cannot import name _aligners [biopython]

I am doing bioinformatics work that has a Biopython dependency. Biopython always gives me the following error: I hope someone could help me with this issue. Thank you!
jcampecino
  • 41
  • 1
  • 2
4
votes
3 answers

Make a list in python from a FASTA text file

I have a text file like this small…
john
  • 263
  • 1
  • 9
4
votes
1 answer

How does Bio.PDB identify hetero-residues?

I'm wondering how Bio.PDB identifies a residue as a hetero-residue. I know that the residue.id method returns a tuple in which the first item is the hetero flag, the second one is the residue identifier (number) and the third one is the insertion…
M. Iyer
  • 115
  • 5
4
votes
1 answer

Plotting the score matrix from a Needleman-Wunsch pairwise sequence alignment in matplotlib

I'm trying to draw a matrix according to global alignment algorithm (or Needleman–Wunsch algorithm) in Python. I don't know if matplotlib is the best tool for this case. I tried to use Bokeh but the structure was so difficult to fit a matrix as I…
Kevin Hernández
  • 704
  • 10
  • 25
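
For the Needleman-Wunsch question above, the score matrix itself can be built in a few lines before any plotting is attempted. The sketch below assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1). These values and the function name `needleman_wunsch_matrix` are illustrative assumptions, not taken from the question.

```python
def needleman_wunsch_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Return the global-alignment score matrix as a list of rows.

    Rows correspond to prefixes of `a`, columns to prefixes of `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score


if __name__ == "__main__":
    for row in needleman_wunsch_matrix("GATTACA", "GCATGCU"):
        print(row)
```

The nested list can be handed to matplotlib's `imshow` to draw it as a grid, with the two sequences used as tick labels.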
4
votes
2 answers

"NotImplementedError: SeqRecord" when using sorted on a fasta file parsed using SeqIO

I'm trying to sort a fasta file by alphabetical order of the sequences in the file (not the ID of the sequences). The fasta file contains over 200 sequences and I'm trying to find duplicates (by duplicates I mean almost same protein sequence, but…
4
votes
1 answer

Estimate Alphabet in Biopython from fasta file

I am looking for a way to read a .fasta file in Biopython and have the package estimate if we are dealing with DNA, RNA or proteins. So far, I read data like this: with open('file.fasta', 'r') as f: for seq in sio.parse(f, 'fasta'): # do…
romeasy
  • 260
  • 1
  • 3
  • 12
4
votes
1 answer

Laplacian smoothing to Biopython

I am trying to add Laplacian smoothing support to Biopython's Naive Bayes code for my Bioinformatics project. I have read many documents about the Naive Bayes algorithm and Laplacian smoothing and I think I got the basic idea but I just can't…
Limin
  • 43
  • 1
  • 3
4
votes
1 answer

Extract specific fasta sequences from a big fasta file

I want to extract specific fasta sequences from a big fasta file using the following script, but the output is empty. The transcripts.txt file contains the list of transcript IDs that I want to export (both the IDs and the sequences) from…
Chiara E
  • 107
  • 2
  • 8
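
For the extraction question above, the script is not visible in the excerpt, so the cause of the empty output cannot be determined here. A frequent cause in general is that the wanted IDs do not match the headers exactly, for example because of trailing newlines or because the header carries a description after the ID. The sketch below assumes the ID is the first whitespace-delimited token of each header. The names `fasta_records` and `select_by_id` are hypothetical.

```python
def fasta_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_id(records, wanted_ids):
    """Yield records whose first header token is in `wanted_ids`."""
    # Strip whitespace (including newlines) and drop empty entries.
    wanted = {w.strip() for w in wanted_ids if w.strip()}
    for header, seq in records:
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            yield header, seq


if __name__ == "__main__":
    fasta = [">tx1 gene=a", "ACGT", ">tx2", "GGCC", "TT", ">tx3", "AAAA"]
    ids = ["tx2\n", "tx3\n"]
    for header, seq in select_by_id(fasta_records(fasta), ids):
        print(f">{header}\n{seq}")
```

Keeping the wanted IDs in a set means each header lookup is constant time, which matters when the ID list is long.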