Questions tagged [biopython]

Biopython is a set of freely available tools for biological computation written in Python. Please only use this tag for issues relating to the Biopython suite of tools.

Biopython is developed by the Biopython Project, an international association of developers of Python tools for computational molecular biology. It includes a range of bioinformatics functionality, such as:

  • Parsing bioinformatics files into data structures usable by Python

  • Interfaces to commonly used bioinformatics programs (BLAST, ClustalW and EMBOSS, among others)

  • Classes for working with DNA, RNA and protein sequences, including feature annotations

  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations

amongst many, many others.
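As a rough illustration of the parsing and sequence operations listed above, here is a minimal sketch. It assumes Biopython is installed; the FASTA text, the record names and the `summarize` helper are made up for this example and are not part of Biopython.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# Hypothetical two-record FASTA text, invented for this illustration.
FASTA_TEXT = """>seq1 example record
ATGGCCATTGTAATGGGCCGCTGA
>seq2 another example
ATGAAATAG
"""


def summarize(fasta_text):
    """Parse FASTA text and return (id, length, protein) tuples."""
    rows = []
    # SeqIO.parse accepts any text handle, so StringIO stands in for a file.
    for record in SeqIO.parse(StringIO(fasta_text), "fasta"):
        # Translate the DNA and stop at the first stop codon.
        protein = record.seq.translate(to_stop=True)
        rows.append((record.id, len(record.seq), str(protein)))
    return rows


# Common operations on a single sequence object.
dna = Seq("ATGGCC")
rna = dna.transcribe()              # DNA -> RNA (T becomes U)
revcomp = dna.reverse_complement()  # reverse complement of the DNA

for row in summarize(FASTA_TEXT):
    print(row)
print(rna, revcomp)
```

To read from a real file, a filename can be passed to `SeqIO.parse` in place of the `StringIO` handle.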

The biopython tag

Questions with this tag should relate to issues involving the Biopython package of tools.

Learning More

The web site http://www.biopython.org provides an online resource for modules, scripts, and web links for developers of Python-based software for life science research. It also has a useful wiki site.

The Biopython Cookbook provides many examples of Biopython being used as well as installation instructions and a FAQ section.

1345 questions
-1
votes
1 answer

Why does this work the way it does (rstrip)

I have a code that works properly but I don't understand it. Why does buf[1:] print out only Rosalind_4402 without the DNA afterwards. FASTA…
DandyApe
-1
votes
1 answer

How can I check if a file is a real FASTQ (python)?

I have to check if a file is FASTA, FASTQ or none of those. For the FASTA check I used the SeqIO module from Bio: def is_fasta(filename): with open(filename, "r") as handle: fasta = SeqIO.parse(handle, "fasta") return…
C insi
-1
votes
1 answer

How to sort a large FASTA file based on date?

I have a large FASTA file that looks like this >Spike|hCoV-19/Wuhan/WIV04/2019|2019-12-30|EPI_ISL_402124|Original|hCoV-19^^Hubei|Human|Wuhan Jinyintan Hospital|Wuhan Institute of…
-1
votes
1 answer

Biopython SeqIO: AttributeError: 'str' object has no attribute 'id'

I am trying to filter out sequences using SeqIO but I am getting this error. Traceback (most recent call last): File "paralog_warning_filter.py", line 61, in . . . SeqIO.write(desired_proteins,…
mdgn15
-1
votes
3 answers

how to extract first part of name(first name) in a list that contains full names and discard names with one part

I have a CSV file that contains one column of names. What I want is Python code that checks every name in the column and, if the name has more than one part, takes just the first part and appends it to a new CSV file, while it skips any…
Mumdooh
-1
votes
1 answer

find motif from in between fasta file from python

Can someone help me with this python code? When I run it, nothing happens. No errors or anything weird to me. It reads in and opens the file just fine. I have a set of protein sequence in Fasta format and I have to find motifs of my sequence like…
-1
votes
1 answer

How to find Mutations for a reverse oriented gene(like pncA) from TB sequencing fasta file using biopython library in Python3?

To find a mutation like for S104R(from 2288681 to 2289241 for pyrazinamide), we have to first remove '-'(for stripping insertion/deletions if/any present in fasta file), then take reverse complement of it and then look for the particular mutation…
-1
votes
1 answer

How to change for loop to work efficiently python

I am stuck with this script and it would be great if you could help me with your input. My problem is that I think the script is not that efficient: it takes a long time to finish running. I have a FASTA file with around 9000 sequence lines (example…
Apex
-1
votes
1 answer

How do I change part of a file name when it is a variable in python?

I currently have a python script which takes a file as a command-line argument, does what it needs to do, and then outputs that file with _all_ORF.fsa_aa appended. I'd like to actually edit the file name rather than appending, but I am getting…
Jpike
-1
votes
1 answer

how to parallel running of python scripts on bioinformatics

I wish to use python to read in a fasta sequence file and convert it into a panda dataframe. I use the following scripts: from Bio import SeqIO import pandas as pd def fasta2df(infile): records = SeqIO.parse(infile, 'fasta') seqList = [] …
Yeping Sun
-1
votes
2 answers

Square a matrix in python

Hello let say I have a df such as : G1 G2 VALUE SP1 SP2 1 SP1 SP3 2 SP1 SP4 3 SP2 SP3 4 SP2 SP4 5 SP3 SP4 6 how can I get a the data as square ? (i.e., have the same number of rows and columns) with something like data = [[0, 1, 2, 3], [1, …
Grendel
-1
votes
1 answer

Gene Protein Sequence Database

I am wondering if there is a way to download or retrieve all the protein sequences of Genes from NCBI. I have the lots of GeneIDs I would like to iterate and retrieve their protein sequence. Is there a package I use for this or link to the protein…
Ibk
-1
votes
1 answer

why is makeblastdb not working with syntax error

import sys sys.path.append('/home/minhlam/ncbi-blast-2.10.1+/bin/db') makeblastdb -in human.fa -db mouse.fa -out mousedb -outfmt 5 The error is: File "parseBlast.py", line 5 makeblastdb -in human.fa -db mouse.fa -out mousedb -outfmt 5 …
minjah
-1
votes
1 answer

How to get distance between two atoms using for loop?

I have one PDB structure. This structure has 13 residues. I have to find the distance between two atoms(only C,O,N,S) using for loop. First I have to find the distance between first and second residue. after that first and third residue.up to first…
neena
-1
votes
3 answers

is there a way to replace letters once in a string?

I'm running into a problem where it replaces all Gs with Cs but doesn't replace the Cs with Gs. What can I do to fix this problem? The output I'm getting right now is "GUGAGGGGAG"; the output I'm looking for is "CUCAGCGCAG". This is the code that I…
Ragster