On finding chirality using RDKit

Question

In the paper "Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals", the authors use chirality as an atom feature when analyzing the QM9 dataset. I am trying to recreate this atom feature, which is described as follows:

Chirality: (categorical) R, S, or not a Chiral center (one-hot encoded).

The code I used is:

from chainer_chemistry import datasets
from chainer_chemistry.dataset.preprocessors.ggnn_preprocessor import GGNNPreprocessor
from rdkit import Chem
import numpy as np


dataset, dataset_smiles = datasets.get_qm9(GGNNPreprocessor(), return_smiles=True)

for i in range(len(dataset_smiles)):
    mol = Chem.MolFromSmiles(dataset_smiles[i])
    Chem.AssignAtomChiralTagsFromStructure(mol)
    chiral_cc = Chem.FindMolChiralCenters(mol)

    if not len(chiral_cc) == 0:
        print(chiral_cc)

The output shows no chiral centers for any molecule in this dataset. When I pass includeUnassigned=True, the code returns a list of tuples, but instead of "R"/"S" I get "?". Is there a mistake in my implementation? If this behavior is expected, any thoughts on how chirality was assigned in the above paper?

@Blade Did you find an answer? Also, this question might have been better suited to the chemistry stack exchange — Polydynamical, Dec 18 '21 at 17:34
@Polydynamical you need 3D structure of the atoms in order to be able to compute them. I assume that the dataset only had SMILES representations. You should be able to introduce chirality with (if I remember correctly) '@' sign, but that was not the case in this dataset AFAIR. — Blade, Dec 18 '21 at 18:36

score 3 · Answer 1 · answered Feb 21 '23 at 20:20

You see '?' instead of R/S because FindMolChiralCenters reports '?' for unassigned stereocenters.

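That is, a stereocenter is considered unassigned unless the SMILES specifies its configuration with an @ or @@ tag, as in [C@H] or [C@@H]. Note that @ and @@ do not map directly onto S and R: they give the anticlockwise or clockwise order of the neighbors as written in the SMILES, and the R/S label then follows from the CIP priorities of those neighbors.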
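To make the difference between SMILES tags and R/S labels concrete, here is a minimal sketch using a small test molecule (CHFClBr, not taken from QM9). The helper name `cip_labels` is made up for this illustration; the RDKit calls are the same ones used in the question.

```python
from rdkit import Chem


def cip_labels(smiles):
    """Return (atom index, label) pairs; '?' marks an unassigned center."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.FindMolChiralCenters(mol, includeUnassigned=True)


tag_at = cip_labels("[C@H](Cl)(F)Br")         # [(0, 'R')]
tag_atat = cip_labels("[C@@H](Cl)(F)Br")      # [(0, 'S')]
# Same @ tag as the first line, but two neighbors are swapped in the string,
# which describes the mirror image of the first molecule.
tag_at_swapped = cip_labels("[C@H](F)(Cl)Br")  # [(0, 'S')]
# No tag at all: the center is detected but left unassigned.
untagged = cip_labels("C(Cl)(F)Br")            # [(0, '?')]

for result in (tag_at, tag_atat, tag_at_swapped, untagged):
    print(result)
```

The first and third lines use the same @ tag yet get opposite labels, so the tag alone does not tell you whether a center is R or S.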

I ran your code, and none of the SMILES in the dataset have any assigned stereocenters.

For example, here are some of the SMILES, each followed by its centers as reported with includeUnassigned=True:

CC1C(C=O)N2CC12C
[(1, '?'), (2, '?'), (5, '?'), (7, '?')]
CC1N(C=O)C2CC12C
[(1, '?'), (5, '?'), (7, '?')]
CC1N(C=O)C2CC12O
[(1, '?'), (5, '?'), (7, '?')]
CN1C(C=O)C2CC21C
[(2, '?'), (5, '?'), (7, '?')]
O=CC1C(O)C2(O)CC12
[(2, '?'), (3, '?'), (5, '?'), (8, '?')]
CC1C(=O)C=CC1C=O
[(1, '?'), (6, '?')]
O=CC1C=CC(=O)C1O
[(2, '?'), (7, '?')]
C#CC1C=CC(=O)C1C
[(2, '?'), (7, '?')]
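A check like the listing above can be sketched as follows. The helper `summarize` is a hypothetical name, and only a few of the SMILES from the listing are included; the exact atom indices reported may depend on the RDKit version, so none are hard-coded here.

```python
from rdkit import Chem

# A few of the QM9 SMILES quoted in the listing above.
smiles_list = [
    "CC1C(C=O)N2CC12C",
    "CC1N(C=O)C2CC12C",
    "O=CC1C(O)C2(O)CC12",
    "CC1C(=O)C=CC1C=O",
]


def summarize(smiles):
    """Return (has_stereo_tag, centers) for one SMILES, or None if it fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    # '@' only appears in SMILES as part of a chirality tag.
    return "@" in smiles, centers


results = {smi: summarize(smi) for smi in smiles_list}
for smi, (has_tag, centers) in results.items():
    print(smi, "tagged" if has_tag else "untagged", centers)
```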

As you can see, none of these SMILES contain an @ or @@ tag, which is why every center is reported as '?'.
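If you have 3D coordinates for each molecule (QM9 is distributed with optimized geometries), one possible route is to let RDKit perceive the configuration from the conformer and then one-hot encode it. The sketch below is an assumption about how such a feature could be built, not a description of what the paper's authors did. Since loading the QM9 geometries is outside the scope of this answer, it generates a stand-in geometry with `EmbedMolecule` for a small test molecule; the helper `chirality_one_hot` and the category order are made up for the example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

CATEGORIES = ("R", "S", "none")  # assumed ordering of the one-hot columns


def chirality_one_hot(mol):
    """One row per atom, [is_R, is_S, not_a_center], perceived from the conformer."""
    mol = Chem.Mol(mol)  # work on a copy so the caller's molecule is unchanged
    Chem.AssignStereochemistryFrom3D(mol)
    labels = dict(Chem.FindMolChiralCenters(mol))
    rows = []
    for atom in mol.GetAtoms():
        label = labels.get(atom.GetIdx(), "none")
        if label not in ("R", "S"):
            label = "none"  # anything other than R/S is treated as "not a center"
        rows.append([int(label == category) for category in CATEGORIES])
    return rows


# Stand-in for a molecule loaded with 3D coordinates: an untagged SMILES
# embedded in 3D, which produces one of the two enantiomers.
mol = Chem.AddHs(Chem.MolFromSmiles("CC(F)Cl"))
status = AllChem.EmbedMolecule(mol, randomSeed=42)  # 0 on success, -1 on failure
features = chirality_one_hot(mol)
print(features[1])  # row for the stereocenter carbon (atom index 1)
```

This also suggests why AssignAtomChiralTagsFromStructure had no visible effect in the question: a molecule parsed from SMILES carries no conformer, so there is no geometry to read the configuration from.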

The RDKit documentation for FindMolChiralCenters is also helpful.
