Using RDKit to calculate Tanimoto similarity between an SDF file and a SMILES structure?

Question

I'm using RDKit with Python 3.7 to calculate the similarity between every structure in an SDF database and a single molecule whose SMILES I have. So far I have only found a way to calculate the Tanimoto index between two SMILES, using this code:

from rdkit import Chem, DataStructs

ref = Chem.MolFromSmiles('Nc1nc2nc(N)nc(N)c2nc1-c1cccc(Cl)c1')
mol1 = Chem.MolFromSmiles('structure smiles')  # SMILES of the structure to compare
fp1 = Chem.RDKFingerprint(ref)
fp2 = Chem.RDKFingerprint(mol1)

# Tanimoto similarity of the two RDKit (topological) fingerprints
Tan = DataStructs.TanimotoSimilarity(fp1, fp2)

print(Tan)
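For reference, the value DataStructs.TanimotoSimilarity returns for two bit-vector fingerprints is the number of bits set in both, divided by the number of bits set in either. The pure-Python sketch below computes that ratio from sets of on-bit indices (for an RDKit bit vector, fp.GetOnBits() returns such indices). The helper name tanimoto is hypothetical, not an RDKit function, and the bit sets are made up for illustration.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices.

    T = c / (a + b - c), where a and b are the number of bits set in each
    fingerprint and c is the number of bits set in both.
    """
    a = set(on_bits_a)
    b = set(on_bits_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    # Convention chosen for this sketch: two empty fingerprints score 0.0
    if union == 0:
        return 0.0
    return common / union


# Hypothetical on-bit sets, standing in for list(fp.GetOnBits())
fp_ref = {3, 17, 42, 108}
fp_other = {3, 42, 99, 108, 512}
print(tanimoto(fp_ref, fp_other))  # 3 shared bits / 6 bits in either -> 0.5
```

Because the score depends only on shared and total on-bits, it is symmetric and lies between 0 and 1. A score of 1 means identical fingerprints, which does not necessarily mean identical molecules.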

Is there a way to replace mol1 with an SDF file, so that the reference is compared against every molecule in it?

score 2 · Answer 1 · answered Sep 15 '19 at 16:28

You can iterate over an SDF file with Chem.SDMolSupplier and compute the similarity for each molecule in turn.

from rdkit import Chem, DataStructs

ref = Chem.MolFromSmiles('Nc1nc2nc(N)nc(N)c2nc1-c1cccc(Cl)c1')
fp1 = Chem.RDKFingerprint(ref)

suppl = Chem.SDMolSupplier('yourSDF.sdf')
for mol in suppl:
    # SDMolSupplier yields None for records RDKit cannot parse; skip them
    if mol is None:
        continue
    fp2 = Chem.RDKFingerprint(mol)
    Tan = DataStructs.TanimotoSimilarity(fp1, fp2)
    print(Tan)
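Printing one score per molecule works for a small file; for a larger database you may only want the closest hits. Below is a minimal pure-Python sketch of that last step. It assumes the loop collects (name, score) pairs in a list, where the name would typically come from the record title stored in the _Name property, mol.GetProp('_Name'). The helper top_hits, the 0.7 cutoff, and the example names and scores are all hypothetical.

```python
def top_hits(scored, threshold=0.7, k=None):
    """Return (name, score) pairs with score >= threshold, best first.

    scored: iterable of (name, score) tuples, e.g. collected in the
    SDMolSupplier loop as (mol.GetProp('_Name'), Tan).
    k: optional maximum number of hits to return.
    """
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Highest similarity first; sort is stable, so ties keep file order
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits if k is None else hits[:k]


# Hypothetical scores standing in for the output of the loop above
scored = [("mol_a", 0.42), ("mol_b", 0.91), ("mol_c", 0.75), ("mol_d", 0.91)]
print(top_hits(scored))                   # mol_b, mol_d, mol_c
print(top_hits(scored, threshold=0.0, k=2))  # mol_b, mol_d
```

If only the best k hits matter, heapq.nlargest(k, hits, key=lambda p: p[1]) gives the same ordering without sorting the whole list.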
