2

I have the following code:

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG


m = Chem.MolFromSmiles('c1cc(C(=O)O)c(OC(=O)C)cc1')
substructure = Chem.MolFromSmarts('C(=O)O')
print(m.GetSubstructMatches(substructure))
m

Which produces the following plot.

enter image description here

However the code above doesn't produce the high resolution image. I'd like to have the SVG. I tried this:

drawer = rdMolDraw2D.MolDraw2DSVG(400,200)
drawer.DrawMolecule(m,highlightAtoms=m.GetSubstructMatch(Chem.MolFromSmarts('C(=O)O')))
drawer.FinishDrawing()
svg = drawer.GetDrawingText().replace('svg:','')
SVG(svg)

But I get:

enter image description here

What's the right way to do it?

The code can be tested in my Google Colab.

littleworth
  • 4,781
  • 6
  • 42
  • 76

1 Answers1

3

GetSubstructMatch returns only the first match. Use GetSubstructMatches. There are multiple scenarios here depending on the rdkit version you've installed. In the latest rdkit version (2021.09.2), the following code should work.

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
from copy import deepcopy


def increase_resolution(mol, substructure, size=(400, 200)):
    mol = deepcopy(mol)
    substructure = deepcopy(substructure)
    drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
    
    # highlightAtoms expects only one tuple, not tuple of tuples. So it needs to be merged into a single tuple
    matches = sum(mol.GetSubstructMatches(substructure), ())
    drawer.DrawMolecule(mol, highlightAtoms=matches)
    
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText()
    
    return svg.replace('svg:','')


mol = Chem.MolFromSmiles('c1cc(C(=O)O)c(OC(=O)C)cc1')
substructure = Chem.MolFromSmarts('C(=O)O')
SVG(increase_resolution(mol, substructure))

If you're getting Value Error: Bad Conformer id error then either update the rdkit package to the latest version or try this:

from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
from copy import deepcopy


def increase_resolution(mol, substructure, size=(400, 200), kekulize=True):
    mol = deepcopy(mol)
    substructure = deepcopy(substructure)
    rdDepictor.Compute2DCoords(mol)
    if kekulize:
        Chem.Kekulize(mol) # Localize the benzene ring bonds
        
    drawer = rdMolDraw2D.MolDraw2DSVG(size[0], size[1])
    
    # highlightAtoms expects only one tuple, not tuple of tuples. So it needs to be merged into a single tuple
    matches = sum(mol.GetSubstructMatches(substructure), ())
    drawer.DrawMolecule(mol, highlightAtoms=matches)
    
    drawer.FinishDrawing()
    svg = drawer.GetDrawingText()
    return svg.replace('svg:','')


mol = Chem.MolFromSmiles('c1cc(C(=O)O)c(OC(=O)C)cc1')
substructure = Chem.MolFromSmarts('C(=O)O')
SVG(increase_resolution(mol, substructure, kekulize=True))

If for some cases like structures with chirality introduced in them as part of the SMILES string, it may fail to work. For such cases, set kekulize=False.

betelgeuse
  • 1,136
  • 3
  • 13
  • 25
  • Thanks a million. A slight off question. RDKit provides the [Fragment](http://rdkit.org/docs/source/rdkit.Chem.Fragments.html) identification from the molecule. With function that looks like `rdkit.Chem.Fragments.fr_Al_COO()`. Is there a way to highlight the pattern using your approach? – littleworth Oct 27 '21 at 12:01
  • 1
    I can't think of a straightforward way but I can suggest an indirect one. The above code needs a substructure for pattern matching. So you just need to find SMARTS string for each of these fragments and then iterate over them by calling the above code. Now, to get the SMARTS for these fragments, you can write a simple python function to parse the following file to extract the values https://github.com/rdkit/rdkit/blob/master/Data/FragmentDescriptors.csv – betelgeuse Oct 27 '21 at 12:14
  • I tried your code again with Benzene SMARTS (`c1ccccc1` ) on my molecule `[H]N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(O)=O`. It doesn't work. Please look at my [**GoogleColab**](https://colab.research.google.com/drive/1lpPg_MEEOkZhpCYbsSJCcInFCDwGCXyG#scrollTo=jdbngquUV8vr). it's editable. – littleworth Oct 28 '21 at 13:27
  • 1
    See my updated answer. – betelgeuse Oct 28 '21 at 17:57