I have a table that looks like this:
and I want to calculate the Tanimoto coefficient (a molecular similarity measure) with RDKit in Python to get the result below:
but I failed.
My data:
{'name': ['16β-hydro-ent-kauran-17-oic acid ',
          '16α-hydro-entkauran-17-oic acid ',
          'ent-kaur-16-en-19-oic acid',
          '16β,17-dihydroxy-ent-kauran-19-oic acid ',
          'annomontacin'],
 'canonical_smile': ['CC1(CCCC2(C1CCC34C2CCC(C3)C(C4)C(=O)O)C)C',
                     'CC1(CCCC2(C1CCC34C2CCC(C3)C(C4)C(=O)O)C)C',
                     'CC12CCCC(C1CCC34C2CCC(C3)C(=C)C4)(C)C(=O)O',
                     'CC12CCCC(C1CCC34C2CCC(C3)C(C4)(CO)O)(C)C(=O)O',
                     'CCCCCCCCCCCCC(C1CCC(O1)C(CCCCCCC(CCCCCC(CC2=CC(OC2=O)C)O)O)O)O']}
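To reproduce this, the dict above can be turned back into a DataFrame. This is only a sketch: it assumes the dict is the output of something like `df.to_dict('list')`, and the variable name `data` is mine. Note that the first two compounds share the same canonical SMILES, so `unique()` leaves four structures.

```python
import pandas as pd

# The data from the question, stored under the hypothetical name `data`.
data = {
    "name": [
        "16β-hydro-ent-kauran-17-oic acid ",
        "16α-hydro-entkauran-17-oic acid ",
        "ent-kaur-16-en-19-oic acid",
        "16β,17-dihydroxy-ent-kauran-19-oic acid ",
        "annomontacin",
    ],
    "canonical_smile": [
        "CC1(CCCC2(C1CCC34C2CCC(C3)C(C4)C(=O)O)C)C",
        "CC1(CCCC2(C1CCC34C2CCC(C3)C(C4)C(=O)O)C)C",
        "CC12CCCC(C1CCC34C2CCC(C3)C(=C)C4)(C)C(=O)O",
        "CC12CCCC(C1CCC34C2CCC(C3)C(C4)(CO)O)(C)C(=O)O",
        "CCCCCCCCCCCCC(C1CCC(O1)C(CCCCCCC(CCCCCC(CC2=CC(OC2=O)C)O)O)O)O",
    ],
}

df = pd.DataFrame(data)

# Rows 0 and 1 have identical SMILES strings, so only 4 unique structures remain.
unique_smiles = df["canonical_smile"].unique()
print(len(df), len(unique_smiles))  # prints: 5 4
```

With four unique SMILES, `itertools.combinations(..., 2)` yields six pairs.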
Here is my code:
import pandas as pd
import itertools
import matplotlib.pyplot as plt
from rdkit import Chem, DataStructs
from rdkit.Chem import (
    PandasTools,
    Draw,
    Descriptors,
    MACCSkeys,
    rdFingerprintGenerator)
# Build every pairwise combination of the unique SMILES as two columns.
df3 = pd.DataFrame(list(itertools.combinations(df['canonical_smile'].unique(), 2)),
                   columns=['canonical_smile1', 'canonical_smile2']).dropna()
# Add an ROMol column for each of the two SMILES columns.
PandasTools.AddMoleculeColumnToFrame(df3,'canonical_smile1','ROMol1',includeFingerprints=True)
PandasTools.AddMoleculeColumnToFrame(df3,'canonical_smile2','ROMol2',includeFingerprints=True)
# Calculate circular Morgan fingerprints for both ROMol columns.
df3["morgan1"] = rdFingerprintGenerator.GetFPs(df3["ROMol1"].tolist())
df3["morgan2"] = rdFingerprintGenerator.GetFPs(df3["ROMol2"].tolist())
# Add the Tanimoto similarity between the two Morgan fingerprints of each row.
df3["tanimoto_morgan"] = DataStructs.BulkTanimotoSimilarity(df3["morgan1"], df3["morgan2"])
And this is the error I get:
ArgumentError: Python argument types in
    rdkit.DataStructs.cDataStructs.BulkTanimotoSimilarity(Series, Series)
did not match C++ signature:
    BulkTanimotoSimilarity(class RDKit::SparseIntVect<unsigned __int64> v1, class boost::python::list v2, bool returnDistance=False)
    BulkTanimotoSimilarity(class RDKit::SparseIntVect<unsigned int> v1, class boost::python::list v2, bool returnDistance=False)
    BulkTanimotoSimilarity(class RDKit::SparseIntVect<__int64> v1, class boost::python::list v2, bool returnDistance=False)
    BulkTanimotoSimilarity(class RDKit::SparseIntVect<int> v1, class boost::python::list v2, bool returnDistance=False)
    BulkTanimotoSimilarity(class ExplicitBitVect const * __ptr64 bv1, class boost::python::api::object bvList, bool returnDistance=0)
    BulkTanimotoSimilarity(class SparseBitVect const * __ptr64 bv1, class boost::python::api::object bvList, bool returnDistance=0)
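Going by the signatures in the traceback, `BulkTanimotoSimilarity` seems to expect one fingerprint as its first argument and a list of fingerprints as its second, not two pandas Series. For a row-by-row comparison, `DataStructs.TanimotoSimilarity` on each pair looks like the better fit. Below is a minimal sketch of that idea, not a verified fix for the exact code above. The Morgan settings (`radius=2`, `fpSize=2048`) and the names `smiles`, `fps`, `pairs` and `one_vs_rest` are my own assumptions.

```python
import itertools

import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# The four unique SMILES from the data above.
smiles = [
    "CC1(CCCC2(C1CCC34C2CCC(C3)C(C4)C(=O)O)C)C",
    "CC12CCCC(C1CCC34C2CCC(C3)C(=C)C4)(C)C(=O)O",
    "CC12CCCC(C1CCC34C2CCC(C3)C(C4)(CO)O)(C)C(=O)O",
    "CCCCCCCCCCCCC(C1CCC(O1)C(CCCCCCC(CCCCCC(CC2=CC(OC2=O)C)O)O)O)O",
]

# Assumed fingerprint settings; the question does not specify any.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# One bit-vector fingerprint per unique SMILES, computed once and reused.
fps = {smi: generator.GetFingerprint(Chem.MolFromSmiles(smi)) for smi in smiles}

# All pairwise combinations, as in the question.
pairs = pd.DataFrame(
    list(itertools.combinations(smiles, 2)),
    columns=["canonical_smile1", "canonical_smile2"],
)

# Pairwise similarity: TanimotoSimilarity takes exactly two fingerprints.
pairs["tanimoto_morgan"] = [
    DataStructs.TanimotoSimilarity(fps[a], fps[b])
    for a, b in zip(pairs["canonical_smile1"], pairs["canonical_smile2"])
]

# BulkTanimotoSimilarity is for one fingerprint against a plain Python list.
one_vs_rest = DataStructs.BulkTanimotoSimilarity(
    fps[smiles[0]], [fps[s] for s in smiles[1:]]
)

print(pairs[["tanimoto_morgan"]])
```

Here `one_vs_rest` holds the similarities of the first structure to the other three, which should match the first three rows of `pairs["tanimoto_morgan"]`.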