1

I am working with RDKit and am using an algorithm to randomly generate Morgan fingerprints, each 2048 bits long. Is there a way to trace such a fingerprint back to the molecule it represents, for example as a SMILES string or a name? Thanks!

Me4836
  • I am not really sure what you are asking here: are you creating random bit vectors, or generating fingerprints from random molecules? If the former, the answer is no; this is more or less impossible. A fingerprint contains information about which substructure elements are present, but not how they connect to each other. Also, a bit set in a fixed-length fingerprint may correspond to multiple substructures because of the 'folding' process, which makes it even harder to work out what the molecule is. – Oliver Scott Jan 11 '21 at 10:59
  • It may be worth sharing some code so we can be of more help. – Oliver Scott Jan 11 '21 at 11:12

3 Answers

2

A couple of points on this:

  1. Morgan fingerprints are not a unique representation of a molecule. Because of bit collisions, many different molecules can in principle produce the same fingerprint.

  2. However, Morgan fingerprints with 2048 bits are quite sparse, so the chance of a collision is reduced. A notable exception is polymers: repeating units set the same bits, so a dimer and a trimer can look identical in terms of their Morgan fingerprints.

  3. If you just want to discover one solution (not all solutions), there are many ways to reverse-engineer a fingerprint. See the discussion on the RDKit mailing list, and another similar discussion here (not about reverse-engineering Morgan fingerprints, but about a different ambiguous molecular representation).
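Points 1 and 2 can be illustrated with a small pure-Python sketch. It assumes that folding amounts to taking each substructure identifier modulo the fingerprint length; the identifiers below are made-up integers, not real Morgan hashes, and the "dimer" and "trimer" are simply the same identifier list repeated, which ignores end-group effects in real molecules.

```python
N_BITS = 2048

def fold(identifiers, n_bits=N_BITS):
    """Fold integer substructure identifiers into a set of on-bit positions."""
    return frozenset(i % n_bits for i in identifiers)

# Hypothetical identifiers for the environments in one repeating unit.
unit = [101, 2500, 90210]

dimer = unit * 2    # the same environments occurring twice
trimer = unit * 3   # ... and three times

# A bit vector records presence only, so the repeat count is lost.
print(fold(dimer) == fold(trimer))

# Two different identifiers that differ by a multiple of n_bits share a bit.
a = 12345
b = a + 7 * N_BITS
print(a != b, fold([a]) == fold([b]))
```

With only a handful of on-bits out of 2048, accidental collisions like the second case are uncommon, but nothing in the bit vector tells you when one has happened.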

Konrad Rudolph
JoshuaBox
1

No, these fingerprints cannot be converted back to molecules: information about the number and position of the substructures behind the 1-bits is missing. What you can do, for a molecule you already have, is convert the 1-bits (the bits set to 1 in the Morgan fingerprint) to substructures:

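# Draw all real 1-bits. Assumes m is the molecule, fp its Morgan bit vector,
# and bi the dict that was passed as bitInfo when fp was generated.
from rdkit.Chem import Draw
on_bits = list(fp.GetOnBits())
tpls = [(m, x, bi) for x in on_bits]
Draw.DrawMorganBits(tpls, molsPerRow=3, subImgSize=(400, 400), legends=[str(x) for x in on_bits])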

As output you get drawings of the substructures behind all 1-bits.
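The drawing snippet above needs a molecule, its fingerprint, and a populated bit-info dictionary. A minimal sketch of that setup follows, assuming RDKit's `AllChem.GetMorganFingerprintAsBitVect` with its `bitInfo` argument (recent RDKit releases may flag this function as deprecated in favor of the fingerprint generator API). The helper names `morgan_bit_info` and `describe_bits` are hypothetical, introduced only for illustration.

```python
def morgan_bit_info(smiles, radius=2, n_bits=2048):
    """Return (mol, fp, bit_info) for one SMILES string; needs RDKit installed."""
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    bit_info = {}  # filled in place: {bit: ((atom_index, radius), ...)}
    fp = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius, nBits=n_bits, bitInfo=bit_info
    )
    return mol, fp, bit_info


def describe_bits(bit_info):
    """Summarise each on-bit as (bit, number of environments, sorted radii)."""
    return [
        (bit, len(envs), sorted({r for _, r in envs}))
        for bit, envs in sorted(bit_info.items())
    ]
```

Calling `m, fp, bi = morgan_bit_info("c1ccccc1O")` would give the three names the drawing code expects. Note that `bi` is built from the molecule, not from the fingerprint: the environment counts it reports are not stored in the bit vector itself.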

John Mommers
0

To my knowledge there's no way to recover a chemical structure from a fingerprint. Fingerprints map all chemical structures to a fixed bit length, which implies bit collisions.

Furthermore, fingerprints only track the presence or absence of different substructures. Fingerprints don't tell you how many times a substructure is present, or how substructures are connected. So the fingerprint doesn't give you the information to reconstruct the initial molecule from the substructures.

You can use RDKit to see which substructures correspond to the different bits in the fingerprint (see here).

My suggestion would be to create a class that holds both the SMILES string and the corresponding fingerprint, so that the two pieces of information stay together.
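One way to follow that suggestion is sketched below. The class and function names (`MolRecord`, `record_from_smiles`, `build_index`) are hypothetical, and storing the fingerprint as a frozen set of on-bit indices is just one convenient choice because it can be used as a dictionary key. The RDKit import is kept inside the function that needs it, so the rest of the sketch runs without RDKit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MolRecord:
    """Keeps a SMILES string together with the on-bits of its fingerprint."""
    smiles: str
    on_bits: frozenset


def record_from_smiles(smiles, radius=2, n_bits=2048):
    """Build a MolRecord with RDKit (assumed to be installed)."""
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return MolRecord(smiles, frozenset(fp.GetOnBits()))


def build_index(records):
    """Map each on-bit set to every SMILES that produced it."""
    index = {}
    for rec in records:
        index.setdefault(rec.on_bits, []).append(rec.smiles)
    return index


# Toy records with made-up on-bits, only to show the lookup.
records = [
    MolRecord("CCO", frozenset({1, 2, 3})),
    MolRecord("CCN", frozenset({1, 2, 4})),
    MolRecord("OCC", frozenset({1, 2, 3})),  # same molecule, different SMILES
]
index = build_index(records)
print(index[frozenset({1, 2, 3})])
```

The index returns a list because several entries can share one fingerprint. A randomly generated fingerprint can only be "traced back" this way if it happens to match a molecule already in the table.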

Karl