0

I'm trying to go through a .pdb file, calculate distance between alpha carbon atoms from different residues on chains A and B of a protein complex, then store the distance in a dictionary, together with the chain identifier and residue number.

For example, if the first alpha carbon ("CA") is found on residue 100 on chain A and the one it binds to is on residue 123 on chain B I would want my dictionary to look something like d={(A, 100):[B, 123, distance_between_atoms]}

from Bio.PDB.PDBParser import PDBParser
parser=PDBParser()
struct = parser.get_structure("1trk", "1trk.pdb")

def getAlphaCarbons(chain):
    vec = []
    for residue in chain:
        for atom in residue:
            if atom.get_name() == "CA":
                vec = vec + [atom.get_vector()]
    return vec

def dist(a,b):
    return (a-b).norm()


chainA = struct[0]['A']
chainB = struct[0]['B']

vecA = getAlphaCarbons(chainA)
vecB = getAlphaCarbons(chainB)

t={}
model=struct[0]

for model in struct:
    for chain in model:
        for residue in chain:
            for a in vecA:
               for b in vecB:
                if dist(a,b)<=8:
                    t={(chain,residue):[(a, b, dist(a, b))]}

     break
print t  

It's been running the programme for ages and I had to abort the run (have I made an infinite loop somewhere??)

I was also trying to do this:

t = {i:[((a, b, dist(a,b)) for a in vecA) for b in vecB if dist(a, b) <= 8] for i in chainA}
print t

But it's printing info about residues in the following format:

<Residue PHE het=  resseq=591 icode= >: []    

It's not printing anything related to the distance.

Thanks a lot, I hope everything is clear.

Ken White
  • 123,280
  • 14
  • 225
  • 444
Alexandra
  • 3
  • 2
  • Please read the tag descriptions before deciding to use them. *pdb* has a totally different meaning than what you think, and the description informs people exactly which tag to use instead if needed. Tags here have specific meanings and use, so please choose carefully. Thanks. – Ken White Nov 22 '16 at 00:54
  • Thanks, I'll pay more attention next time. – Alexandra Nov 22 '16 at 01:14

1 Answers1

0

Would strongly suggest using C libraries while calculating distances. I use mdtraj for this sort of thing and it works much quicker than all the for loops in BioPython.

To get all pairs of alpha-Carbons:

import mdtraj as md
def get_CA_pairs(self,pdbfile):
  traj = md.load_pdb(pdbfile)
  topology = traj.topology 
  CA_index = ([atom.index for atom in topology.atoms if (atom.name == 'CA')])
  pairs=list(itertools.combinations(CA_index,2))
return pairs

Then, for quick computation of distances:

  def get_distances(self,pdbfile,pairs):
  #returns list of resid1, resid2,distances between CA-CA
   traj = md.load_pdb(pdbfile)
   pairs=self.get_CA_pairs(pdbfile)
   dist=md.compute_distances(traj,pairs)
#make dictionary you desire. 
   dict=dict(zip(CA, pairs))
  return dict

This includes all alpha-Carbons. There should be a chain identifier too in mdtraj to select CA's from each chain.

sridharn
  • 111
  • 7