I am training a random forest with scikit-learn on Morgan fingerprints and would like to know which structural motifs are most important. For that I would like to draw all fragments that produce an on-bit in the x most important features.
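To make "the x most important features" concrete, here is a minimal sketch of how I would pick them. The helper name `top_feature_indices` is made up for illustration, and I am assuming the importances come from the fitted forest's `feature_importances_` attribute, so that each index corresponds to one fingerprint bit.

```python
import numpy as np


def top_feature_indices(importances, x):
    """Return the indices of the x largest importances, most important first.

    `importances` is assumed to be something like model.feature_importances_
    from a fitted scikit-learn random forest (one value per fingerprint bit).
    """
    importances = np.asarray(importances)
    # Stable ascending sort, then reverse to get descending importance.
    order = np.argsort(importances, kind="stable")[::-1]
    return [int(i) for i in order[:x]]
```

With a fitted model this would be called as `top_feature_indices(model.feature_importances_, x)`, and the returned indices are the bit IDs I want to visualize.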
I have found the Draw.DrawMorganBits function in the new release, along with these usage examples:
https://iwatobipen.wordpress.com/2018/11/07/visualize-important-features-of-machine-leaning-rdkit/
http://rdkit.blogspot.com/2018/10/using-new-fingerprint-bit-rendering-code.html
However, I don't know how to produce a unique set of fragments. Previously I went through my test set, collected the bitInfo and atom environments, and created SMILES with Chem.MolFragmentToSmiles. I then created mols from the set of these SMILES and plotted them. However, this is a weak representation of the environment, and some fragments cannot be plotted.
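Roughly, the old approach looks like the sketch below. This is a reconstruction under assumptions, not my exact code: the function name `unique_bit_environments`, the radius of 2 and the 2048-bit length are placeholders. The difference from what I did before is that it keeps one (mol, atom index, radius) example per unique fragment SMILES, so the fragment SMILES is used only for de-duplication and the original molecule remains available for drawing.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import AllChem


def unique_bit_environments(mols, bits_of_interest, radius=2, n_bits=2048):
    """Map each bit to {fragment SMILES: (mol, center atom index, radius)}.

    Only the first example found for each fragment SMILES is kept.
    """
    wanted = set(bits_of_interest)
    examples = defaultdict(dict)
    for mol in mols:
        bit_info = {}
        AllChem.GetMorganFingerprintAsBitVect(
            mol, radius, nBits=n_bits, bitInfo=bit_info
        )
        for bit in wanted & bit_info.keys():
            for atom_idx, env_radius in bit_info[bit]:
                # Bond indices of the circular environment around atom_idx.
                bonds = []
                if env_radius > 0:
                    bonds = list(
                        Chem.FindAtomEnvironmentOfRadiusN(mol, env_radius, atom_idx)
                    )
                atoms = {atom_idx}
                for bond_idx in bonds:
                    bond = mol.GetBondWithIdx(bond_idx)
                    atoms.add(bond.GetBeginAtomIdx())
                    atoms.add(bond.GetEndAtomIdx())
                if bonds:
                    smiles = Chem.MolFragmentToSmiles(
                        mol,
                        atomsToUse=sorted(atoms),
                        bondsToUse=bonds,
                        rootedAtAtom=atom_idx,
                    )
                else:
                    # Radius-0 environment: just the center atom.
                    smiles = Chem.MolFragmentToSmiles(mol, atomsToUse=[atom_idx])
                # Keep the first (mol, atom, radius) seen for this fragment.
                examples[bit].setdefault(smiles, (mol, atom_idx, env_radius))
    return examples
```

Note that two different fragments can map to the same bit because of hash collisions in the folded fingerprint, which is why the result is grouped per bit rather than assuming one fragment per bit. The stored tuples are meant as input for whatever drawing function accepts a molecule plus a center atom and radius; I have not checked which release provides which of these functions.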
I can provide my old code, which follows the old documentation: https://rdkit.readthedocs.io/en/release_2017_03_1/GettingStartedInPython.html#explaining-bits-from-morgan-fingerprints