I am learning RDKit. At the moment I want to extract information about a ligand docked in a protein. The problem is that the ligand's bonds are always returned as SINGLE, regardless of their actual types, while the protein's bond types are returned correctly.
Does anybody know what I am missing?
In the code example below I am looking at the P38 kinase (PDB file 2GHL), where the ligand is named "LIB", but I observed the same issue with every PDB file I tried.
My strategy has three steps:
- Iterate through the atoms of the PDB file
- Store all the atoms with "LIB" as residue name in a list called ligand_atoms
- Iterate again over the atoms in the PDB file with a nested loop, restricted to the indices between the first and last ligand atom, and print the atom symbols, indices and bond types.

from rdkit import Chem

# Specify the path to your PDB file
pdb_file = "path\\2ghl.pdb"

# Load the PDB file using the Chem module
mol = Chem.MolFromPDBFile(pdb_file)

# Collect the ligand atoms (residue name "LIB")
ligand_atoms = []
for atom in mol.GetAtoms():
    residue_name = atom.GetPDBResidueInfo().GetResidueName()
    if residue_name == "LIB":
        ligand_atoms.append(atom)

# Find index of first and last atom of the ligand
first_atom_idx = ligand_atoms[0].GetIdx()
last_atom_idx = ligand_atoms[-1].GetIdx()

# Print the bond type for each bonded pair of ligand atoms
for i in range(first_atom_idx, last_atom_idx + 1):
    for j in range(i + 1, last_atom_idx + 1):
        bond = mol.GetBondBetweenAtoms(i, j)
        if bond is not None:
            print(mol.GetAtomWithIdx(i).GetSymbol(), " ", i, ", ",
                  mol.GetAtomWithIdx(j).GetSymbol(), " ", j, ": ",
                  bond.GetBondType())

Here is the output:
However, the molecule itself contains double bonds and aromatic bonds.
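
One check that does not depend on RDKit is to look at what bond information the PDB file itself lists for the ligand. Below is a minimal sketch, assuming standard fixed-column PDB records. The function name `ligand_connectivity` and the two-atom sample are made up for illustration and are not taken from 2GHL. It collects the atoms with a given residue name and the CONECT pairs between them. It only shows which connectivity the file records, not how RDKit uses it.

```python
def ligand_connectivity(pdb_text, resname="LIB"):
    """Return (atoms, bonds) for one residue name, read from raw PDB text.

    atoms maps atom serial number -> atom name.
    bonds is a sorted list of (serial_a, serial_b) pairs from CONECT records.
    """
    atoms = {}
    bonds = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM") and line[17:20].strip() == resname:
            # Columns 7-11: atom serial number, columns 13-16: atom name
            atoms[int(line[6:11])] = line[12:16].strip()
        elif record == "CONECT":
            # Columns 7-11: source atom, followed by up to four bonded
            # serial numbers in 5-character fields. The record has no
            # field for a bond order.
            source = int(line[6:11])
            for start in range(11, 31, 5):
                field = line[start:start + 5].strip()
                if field:
                    bonds.add(tuple(sorted((source, int(field)))))
    # Keep only pairs where both atoms belong to the selected residue
    ligand_bonds = sorted(b for b in bonds if b[0] in atoms and b[1] in atoms)
    return atoms, ligand_bonds


# Hypothetical sample: a two-atom "LIB" residue plus one water oxygen
sample = "\n".join([
    "HETATM    1  C1  LIB A   1       0.000   0.000   0.000  1.00  0.00           C",
    "HETATM    2  O1  LIB A   1       1.210   0.000   0.000  1.00  0.00           O",
    "HETATM    3  O   HOH A   2       5.000   5.000   5.000  1.00  0.00           O",
    "CONECT    1    2",
    "CONECT    2    1",
])

atoms, bonds = ligand_connectivity(sample)
print(atoms)  # {1: 'C1', 2: 'O1'}
print(bonds)  # [(1, 2)]
```

On a real file this could be called as `ligand_connectivity(open(pdb_file).read())`. The returned pairs use PDB serial numbers, which are not necessarily the same as RDKit atom indices.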