
I'm trying to separate the RNA from the protein in a PDB file of a protein/RNA complex. I want all the RNA information, including the hetero atoms in between the bases, but without H2O etc. In short, I want the RNA part of the PDB file without discontinuous lines.

I managed to separate the RNA from the protein with Bio.PDB's Select, but it considers the hetero atoms to be amino acids when I use is_aa(residue). So the hetero atoms don't appear in my "only RNA" file.

```python
from Bio.PDB import PDBParser, PDBIO, Select, is_aa


class ProtSelect(Select):
    def accept_residue(self, residue):
        # Keep residues that is_aa() accepts (by default this includes non-standard names)
        return is_aa(residue)


class RNASelect(Select):
    def accept_residue(self, residue):
        # Keep everything that is not an amino acid and not water ("W" hetero flag)
        return not is_aa(residue) and residue.id[0] != "W"


pdb = PDBParser().get_structure("2bh2", "pdb2bh2.ent")
io = PDBIO()
io.set_structure(pdb)
io.save("seqprotest.pdb", ProtSelect())
io.save("seqRNAtest.pdb", RNASelect())
```
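Biopython stores each residue ID as a tuple (hetfield, resseq, icode). The hetero flag hetfield is " " for standard residues, "W" for water and "H_" followed by the residue name for other hetero residues; that is the field the residue.id[0] != "W" check in RNASelect relies on. The sketch below classifies ID tuples in that format without parsing a structure; residue_kind and the sample IDs are hypothetical, chosen only for illustration.

```python
def residue_kind(res_id):
    """Classify a Biopython-style residue ID tuple (hetfield, resseq, icode).

    Hypothetical helper: it only looks at the hetero flag, the same field
    that RNASelect tests with residue.id[0] != "W".
    """
    hetfield = res_id[0]
    if hetfield == " ":
        return "standard"
    if hetfield == "W":
        return "water"
    if hetfield.startswith("H_"):
        return "hetero"
    return "unknown"


# Made-up IDs in the (hetfield, resseq, icode) format
sample_ids = [(" ", 10, " "), ("W", 501, " "), ("H_SAH", 900, " ")]
kinds = [residue_kind(rid) for rid in sample_ids]
print(kinds)  # ['standard', 'water', 'hetero']
```

Note that the hetero flag alone cannot tell a modified nucleotide from a ligand or ion; both are "H_" residues.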
Raph
  • Maybe this question could be moved to [bioinformatics.SE](https://bioinformatics.stackexchange.com/). – marcin Jul 12 '19 at 14:51

1 Answer


Did you try setting the standard=True argument to is_aa?
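According to its docstring, the standard flag restricts is_aa to the 20 standard amino acids; with the default (standard=False) it also accepts many non-standard residue names, which may be why some HETATM residues were classified as amino acids in the question. The dependency-free sketch below illustrates the standard check on residue names only; STANDARD_AA and looks_like_standard_aa are hypothetical names, and this is not Biopython's own implementation.

```python
# The 20 standard amino-acid residue names (PDB three-letter codes)
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def looks_like_standard_aa(resname):
    """Hypothetical stand-in for is_aa(..., standard=True) on a residue name."""
    return resname.strip().upper() in STANDARD_AA


# Residue names as they may appear in a PDB file (RNA base, water, selenomethionine)
for name in ["LYS", "  A", "HOH", "MSE"]:
    print(repr(name), looks_like_standard_aa(name))
```

With this stricter test, anything outside the 20 names (nucleotides, modified residues, ligands) falls on the non-protein side, which is what RNASelect below relies on.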

A quick look at the output of the following code seems promising to me:

```python
from Bio import PDB
from Bio.PDB import PDBParser, PDBIO, Select, is_aa


class ProtSelect(Select):
    def accept_residue(self, residue):
        # Uncomment to inspect how each residue is classified:
        # print(f"{residue} -> {is_aa(residue)}")
        return is_aa(residue, standard=True)


class RNASelect(Select):
    def accept_residue(self, residue):
        # Everything that is not one of the 20 standard amino acids, minus water
        return (not is_aa(residue, standard=True)) and residue.id[0] != "W"


# Download 2bh2; retrieve_pdb_file returns the local path
# (by default bh/pdb2bh2.ent relative to the current directory)
repo = PDB.PDBList()
path = repo.retrieve_pdb_file("2bh2", file_format="pdb")

pdb = PDBParser().get_structure("2bh2", path)
io = PDBIO()
io.set_structure(pdb)
io.save("seqprotest.pdb", ProtSelect())
io.save("seqRNAtest.pdb", RNASelect())
```

Note that I added a call to retrieve_pdb_file in order to create a self-contained example from your question.
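With PDBList() defaults (current working directory, non-flat layout) and file_format="pdb", retrieve_pdb_file stores the file in a subdirectory named after the second and third characters of the PDB ID, which is where the bh/ prefix comes from; the method also returns that path. The helper below is a hypothetical sketch that reproduces this layout, assuming those defaults; in real code, using the return value of retrieve_pdb_file is simpler.

```python
import os


def expected_pdb_path(pdb_code, local_dir="."):
    """Hypothetical helper mirroring PDBList's default (non-flat) layout:
    <local_dir>/<2nd and 3rd characters of the ID>/pdb<id>.ent
    """
    code = pdb_code.lower()
    return os.path.join(local_dir, code[1:3], f"pdb{code}.ent")


print(expected_pdb_path("2BH2"))
```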

The result so far:

  • 112 HETATM records that are not HOH in pdb2bh2.ent
  • No HETATM records in seqprotest.pdb
  • 112 HETATM records in seqRNAtest.pdb
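The answer does not say how these HETATM records were counted. One plain-text way to check such counts, assuming standard fixed-column PDB records (record name in columns 1-6, residue name in columns 18-20), is sketched below; count_hetatm, make_record and the sample records are hypothetical.

```python
def count_hetatm(lines, exclude=("HOH",)):
    """Count HETATM records whose residue name (PDB columns 18-20) is not excluded."""
    count = 0
    for line in lines:
        if line.startswith("HETATM") and line[17:20].strip() not in exclude:
            count += 1
    return count


def make_record(record, serial, atom, resname):
    # Only the fields count_hetatm reads are placed at their PDB columns
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3}"


sample = [
    make_record("ATOM", 1, "P", "G"),       # RNA atom, not a HETATM
    make_record("HETATM", 2, "O", "HOH"),   # water, excluded
    make_record("HETATM", 3, "MG", "MG"),   # ion, counted
    make_record("HETATM", 4, "N1", "SAH"),  # ligand, counted
]
print(count_hetatm(sample))
```

On a real file, opening it with `with open("seqRNAtest.pdb") as fh:` and passing fh to count_hetatm counts its lines the same way.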
Lydia van Dyke