
I have two questions about the Morgan fingerprint functions in RDKit. First, I couldn't figure out whether a Morgan fingerprint with radius 2 or radius 4 corresponds to ECFP4. Second, I couldn't figure out why the calculated similarity between two molecules is substantially smaller when using GetMorganFingerprintAsBitVect(nBits=2048) instead of GetMorganFingerprint. Help or explanations would be very much appreciated. Kind regards, Philipp

Philipp O.
  • Please provide more details about your second question. What commands did you run? What were the outputs? How did you calculate the similarities, etc. – betelgeuse Jul 13 '21 at 12:46

1 Answer

In answer to your first question: according to the RDKit documentation (https://www.rdkit.org/docs/GettingStartedInPython.html), a radius of 2 is roughly equivalent to ECFP4. The relevant passage reads:

The default atom invariants use connectivity information similar to those used for the well known ECFP family of fingerprints. Feature-based invariants, similar to those used for the FCFP fingerprints, can also be used. The feature definitions used are defined in the section Feature Definitions Used in the Morgan Fingerprints. At times this can lead to quite different similarity scores:

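from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Pyridine and furan
m1 = Chem.MolFromSmiles('c1ccccn1')
m2 = Chem.MolFromSmiles('c1ccco1')
# Connectivity-based (ECFP-like) count fingerprints, radius 2
fp1 = AllChem.GetMorganFingerprint(m1, 2)
fp2 = AllChem.GetMorganFingerprint(m2, 2)
# Feature-based (FCFP-like) count fingerprints, radius 2
ffp1 = AllChem.GetMorganFingerprint(m1, 2, useFeatures=True)
ffp2 = AllChem.GetMorganFingerprint(m2, 2, useFeatures=True)
print(DataStructs.DiceSimilarity(fp1, fp2))    # 0.36...
print(DataStructs.DiceSimilarity(ffp1, ffp2))  # 0.90...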
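On your second question, I can't say for certain without seeing your commands, but one likely cause is that the two functions return different kinds of fingerprint. As far as I understand it, GetMorganFingerprint returns a sparse count vector (how often each atom environment occurs), while GetMorganFingerprintAsBitVect only records whether an environment is present, folded into nBits positions. The sketch below is plain Python, not RDKit's implementation: the helper names, the environment IDs, and the modulo folding are my own assumptions, chosen only to show how counting a repeated shared environment can push the count-based Dice score well above the bit-based one.

```python
from collections import Counter


def dice_counts(a: Counter, b: Counter) -> float:
    """Dice on count vectors: 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for keys it does not contain
    shared = sum(min(n, b[key]) for key, n in a.items())
    return 2.0 * shared / total


def dice_bits(a: set, b: set) -> float:
    """Dice on bit sets: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2.0 * len(a & b) / total


def to_bits(counts: Counter, n_bits: int) -> set:
    """Drop the counts and fold IDs into n_bits positions (assumed: ID modulo n_bits)."""
    return {key % n_bits for key in counts}


# Hypothetical environment IDs: both molecules contain environment 101 five times
mol_a = Counter({101: 5, 202: 1})
mol_b = Counter({101: 5, 303: 1})

count_score = dice_counts(mol_a, mol_b)                            # 2 * 5 / 12
bit_score = dice_bits(to_bits(mol_a, 2048), to_bits(mol_b, 2048))  # 2 * 1 / 4
print(count_score, bit_score)
```

In this toy case the count-based score is about 0.83 and the bit-based score is 0.5, because the repeated environment is counted five times in one representation and once in the other. The opposite direction is possible too, and folding can additionally cause unrelated environments to collide on the same bit, so compare like with like when reporting similarities.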

When comparing the ECFP/FCFP fingerprints and the Morgan fingerprints generated by the RDKit, remember that the 4 in ECFP4 corresponds to the diameter of the atom environments considered, while the Morgan fingerprints take a radius parameter. So the examples above, with radius=2, are roughly equivalent to ECFP4 and FCFP4.
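To make the naming rule concrete: the number in the ECFP/FCFP name is the diameter, which is twice the Morgan radius. The helper below is a hypothetical illustration of that arithmetic only (ecfp_name is my own name, not an RDKit function), and the equivalence it expresses is the "roughly equivalent" one from the documentation.

```python
def ecfp_name(radius: int, use_features: bool = False) -> str:
    """Return the approximate ECFP/FCFP name for a Morgan radius (diameter = 2 * radius)."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    family = "FCFP" if use_features else "ECFP"
    return f"{family}{2 * radius}"


print(ecfp_name(2))                     # ECFP4
print(ecfp_name(2, use_features=True))  # FCFP4
print(ecfp_name(3))                     # ECFP6
```

So for ECFP4 you want radius 2, not radius 4; radius 4 would correspond roughly to ECFP8.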

technomage