I built a model that predicts molecules' solubility from their Morgan fingerprints, and I have now identified the specific fingerprint bits the model had a hard time with. I would like to see which structural feature of a molecule each bit corresponds to. Thanks to the user rapelpy I found DrawMorganBits, but it needs the mol (or SMILES) of the molecule, and I only have fingerprints that are not tied to a specific molecule.

Is it possible to recover the mol or SMILES from a fingerprint, or is there some other way to draw the structures from the fingerprints alone?

Thanks in advance.

KBJ

2 Answers

You can use DrawMorganBit() as described in the RDKit blog.
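For reference, a minimal sketch of how DrawMorganBit() is typically called. Note that it still needs a molecule plus the bitInfo dictionary recorded while fingerprinting it. The SMILES (aspirin), the radius of 2 and the 2048-bit length are assumptions chosen for illustration, not values from the question.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Draw

# Hypothetical example molecule (aspirin); substitute your own SMILES.
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# bit_info is filled in as a side effect of fingerprinting:
# {bit_id: ((center_atom_idx, radius), ...)}
bit_info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048, bitInfo=bit_info)

# Pick any bit that is set for this molecule and draw the atom
# environment that was responsible for it.
bit_id = sorted(bit_info)[0]
svg = Draw.DrawMorganBit(mol, bit_id, bit_info, useSVG=True)
```

With useSVG=True the result should be SVG markup you can save or display; without it you get an image object instead.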

rapelpy
If you only have a molecular fingerprint, it is difficult to trace a set bit back to the substructure that caused it, and depending on which fingerprint you are using it may even be impossible.

In the RDKit blog post mentioned above, the bitInfo dict captures the substructure responsible for each bit before the fingerprint is "folded"/"hashed". Hashing causes bit collisions, so it is not possible to map a bit back deterministically unless you kept this dictionary in the first place.
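To make the collision problem concrete, here is a toy sketch in plain Python. It assumes folding is a simple modulo of the environment identifier by the fingerprint length, which is my understanding of how the bit-vector Morgan variant behaves. The environment names and identifiers are made up for illustration and are not real RDKit hashes.

```python
# Made-up stand-ins for the large integer identifiers that an unfolded
# fingerprint assigns to atom environments (NOT real RDKit values).
unfolded = {
    "carbonyl carbon": 1000003,
    "aromatic CH": 2000011,
    "ether oxygen": 3000005,
}

N_BITS = 8  # deliberately tiny so that a collision is easy to see


def fold(env_ids, n_bits):
    """Map each environment to a bit position and group the names by bit."""
    folded = {}
    for name, env_id in env_ids.items():
        folded.setdefault(env_id % n_bits, []).append(name)
    return folded


folded = fold(unfolded, N_BITS)
# Bit 3 is shared by two unrelated environments, so knowing only that
# "bit 3 is on" cannot tell you which of them was present.
```

Real fingerprints use far more bits, which makes collisions rarer but does not remove them.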


If you have the willpower and keeping track of the bitInfo is really not possible, you could try generating (or randomly sampling) structures that set the bit you are interested in. This will let you guess which substructures may originally have been responsible.

A place to start might be the GuacaMol benchmark codebase, which includes tasks and baseline methods that can generate molecules from their fingerprints.
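A simpler, non-generative version of the sampling idea is to scan a set of molecules you do have (for example your training set) and tally which substructures set the bit in question. The sketch below is one way this could look. The helper name environments_for_bit is hypothetical, and the radius and bit length are assumptions that must match whatever was used to build your fingerprints.

```python
from collections import Counter

from rdkit import Chem
from rdkit.Chem import AllChem


def environments_for_bit(smiles_list, bit_id, radius=2, n_bits=2048):
    """Count the substructures (as SMILES) that set `bit_id` in a set of molecules."""
    counts = Counter()
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # skip SMILES that RDKit cannot parse
            continue
        bit_info = {}
        AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits, bitInfo=bit_info)
        for atom_idx, env_radius in bit_info.get(bit_id, ()):
            bonds = []
            if env_radius > 0:
                bonds = list(Chem.FindAtomEnvironmentOfRadiusN(mol, env_radius, atom_idx))
            if bonds:
                frag = Chem.MolToSmiles(Chem.PathToSubmol(mol, bonds))
            else:
                # Radius-0 environment: just the center atom (a simplification
                # that drops aromaticity and charge information).
                frag = mol.GetAtomWithIdx(atom_idx).GetSymbol()
            counts[frag] += 1
    return counts
```

Calling environments_for_bit(my_smiles, 123).most_common(5) (with my_smiles and 123 as placeholders) would list the most frequent candidate substructures for that bit. Because of collisions, more than one unrelated fragment may show up for the same bit.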

JoshuaBox