1

I am interested in obtaining fragments, or sub structures, that contain 4 non-hydrogen atoms within a larger molecule.

The closest example to accomplishing this is referenced from https://iwatobipen.wordpress.com/2020/08/12/get-and-draw-molecular-fragment-with-user-defined-path-rdkit-memo/

I used their work but I don't believe "ALL" 4 atom substructures are found within their methodology. They use the concept of "radius" around centered atoms. This doesn't distinguish all possible substructures I am looking for.

Does anyone have suggestions for improving this code to obtaining my goal or a differing direction to go for future work?

I followed the method presented by the link above. Various substructures are presented but not 'ALL' 4 atom fragments are specified when running this method.

Specific Example: Major Structure I am using as Eample

Substructure example I don't see using this methodology but I believe I should see at radii 1 or radii 2.1]1

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
AllChem.SetPreferCoordGen(True)
def getSubmolRadN(mol, radius):
    atoms=mol.GetAtoms()
    submols=[]
    for atom in atoms:
        env=Chem.FindAtomEnvironmentOfRadiusN(mol, radius, atom.GetIdx())
        amap={}
        submol=Chem.PathToSubmol(mol, env, atomMap=amap)
        subsmi=Chem.MolToSmiles(submol, rootedAtAtom=amap[atom.GetIdx()], canonical=False)
        submols.append(Chem.MolFromSmiles(subsmi, sanitize=False))
    return submols
mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
submols = getSubmolRadN(mol,1)
Draw.MolsToGridImage(submols, highlightAtomLists=[[0] for _ in range(len(submols))], molsPerRow=5)
submols = getSubmolRadN(mol,2)
Draw.MolsToGridImage(submols, highlightAtomLists=[[0] for _ in range(len(submols))], molsPerRow=5)
submols = getSubmolRadN(mol,3)
Draw.MolsToGridImage(submols, highlightAtomLists=[[0] for _ in range(len(submols))], molsPerRow=5)

Outputs for Radii 1 and 2, radii 3 I obtain an errorOutputs of radii 1 and 2

  • "that contain 4 non-hydrogen atoms within a larger molecule" this is not specific enough I guess. Do you want to have 4 non-hydrogen atoms within a specific radius within the molecule? Do you want to have 4 non-hydrogen atoms connected by one or multiple carbons? Right now I can give you code that checks whether or not your example structure has 4 non-hydrogen atoms. The output for your example structure would be True. Or do you want to find every Carbon atom that is connected to 4 non-hydrogen Atoms? – Tarquinius Apr 19 '23 at 14:33
  • Thank you for your questions. I am looking for all substructures that contain 4 atoms that are not hydrogen. A differing way to say this is explaining in terms of bonds. I would like all fragments that contain 3 bonds where hydrogen atoms are not apart of the 4 atoms or 3 bonded substructure. Regarding the radius question, I am not looking explicitly at "radius" more so all the substructures that contain 4 atoms and 3 bonds. In graph theory, generally hydrogens are ignored and the other atoms are counted as nodes which I am trying to exercise here. – meadeytabeedy Apr 19 '23 at 16:30
  • Additionally, I believe I found the function that does this for me. RDKit rdkit.Chem.rdmolops.FindAllSubgraphsOfLengthMToN – meadeytabeedy Apr 19 '23 at 16:43

2 Answers2

2

To find 4 connected non-hydrogen atoms you can use SMARTS.

Since a connection of 4 can be linear or have one branch and must not end with hydrogen (important if you have explicit hydrogen), these two SMARTS should work.

from rdkit import Chem
from rdkit.Chem import Draw, rdDepictor
rdDepictor.SetPreferCoordGen(True)
from rdkit.Chem.Draw import IPythonConsole
IPythonConsole.molSize = (200, 150)

smarts_linear = Chem.MolFromSmarts('[!#1]~*~*~[!#1]')
smarts_branch = Chem.MolFromSmarts('[!#1]~*(~[!#1])~[!#1]')

smarts_linear

enter image description here

smarts_branch

enter image description here

For better testing I will use a molecule with explicit hydrogen.

mol = Chem.AddHs(Chem.MolFromSmiles('C=C(S)C(N)(O)C'))
mol

enter image description here

sub_linear = mol.GetSubstructMatches(smarts_linear)
sub_branch = mol.GetSubstructMatches(smarts_branch)
all_subs = sub_linear + sub_branch

found_subs = []

for n in range(len(all_subs)):
    found_subs.append(mol)

Draw.MolsToGridImage(found_subs, molsPerRow=4, highlightAtomLists=all_subs, subImgSize=(200,150))

enter image description here

rapelpy
  • 1,684
  • 1
  • 11
  • 14
0

Previous user (rapelpy) answered the question as well. A team member of mine was able to generate a code as to obtain the paths or subgraphs. To clarify, paths don't have branching where subgraphs do.

from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
AllChem.SetPreferCoordGen(True)
from rdkit.Chem import rdmolops
import csv

mol = Chem.MolFromSmiles('C=C(S)C(N)(O)C')
mol


# Find all subgraphs of length 3 in the molecule
#Subgraphs have branching
subgraphs = Chem.FindAllSubgraphsOfLengthN(mol, 3)

#Paths is no branching
#subgraphs = Chem.FindAllPathsOfLengthN(mol, 3)
print(len(subgraphs))

# Print out the connected SMILES for each subgraph
for subgraph in subgraphs:
    # Get the subgraph as a new molecule object
    sub_mol = Chem.PathToSubmol(mol, subgraph)
    # Generate the connected SMILES string for the subgraph
    subgraph_smiles = Chem.MolToSmiles(sub_mol, kekuleSmiles=True)
    print(subgraph_smiles)

Output
11
C=CCC
C=CCO
C=CCN
C=C(C)S
CCCS
OCCS
NCCS
CC(C)O
CC(C)N
CC(N)O
CC(N)O