In the paper "Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals", the authors use chirality as an atom feature when analyzing the QM9 dataset. I am trying to recreate this atom feature, which is described as follows:

Chirality: (categorical) R, S, or not a chiral center (one-hot encoded).

The code I used is:

from chainer_chemistry import datasets
from chainer_chemistry.dataset.preprocessors.ggnn_preprocessor import GGNNPreprocessor
from rdkit import Chem
import numpy as np


dataset, dataset_smiles = datasets.get_qm9(GGNNPreprocessor(), return_smiles=True)

for i in range(len(dataset_smiles)):
    mol = Chem.MolFromSmiles(dataset_smiles[i])
    Chem.AssignAtomChiralTagsFromStructure(mol)
    chiral_cc = Chem.FindMolChiralCenters(mol)

    if not len(chiral_cc) == 0:
        print(chiral_cc)

The output shows no chiral centers for this dataset. When I pass includeUnassigned=True, the code returns a list of tuples, but with "?" instead of "R" or "S". Is there a mistake in my implementation? If this is expected, any thoughts on how chirality was assigned in the paper?

Blade
  • Can you post the DOI for the paper? – DarrenRhodes Jun 15 '20 at 12:33
  • @user1945827 Just linked doi to the title. – Blade Jun 15 '20 at 12:59
  • @Blade Did you find an answer? Also, this question might have been better suited to the chemistry stack exchange – Polydynamical Dec 18 '21 at 17:34
  • @Polydynamical you need 3D structure of the atoms in order to be able to compute them. I assume that the dataset only had SMILES representations. You should be able to introduce chirality with (if I remember correctly) '@' sign, but that was not the case in this dataset AFAIR. – Blade Dec 18 '21 at 18:36

1 Answer

You see '?' instead of R/S because FindMolChiralCenters reports '?' for unassigned stereocenters.

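That is, if the SMILES does not specify the configuration with a chirality tag such as [C@] or [C@@], the stereocenter is considered unassigned. Note that @ and @@ do not map one-to-one onto S and R: they give the counterclockwise or clockwise order of the neighbors as written in the SMILES, and the R/S label is then derived from the CIP priorities of those neighbors.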
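
To make the assigned/unassigned distinction concrete, here is a minimal sketch comparing a SMILES with a chirality tag against the same skeleton without one. The toy molecule (bromochlorofluoromethane) and the helper name `centers` are my own choices for illustration and are not taken from QM9.

```python
from rdkit import Chem


def centers(smiles, include_unassigned=False):
    """Return FindMolChiralCenters output for a SMILES string (hypothetical helper)."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.FindMolChiralCenters(mol, includeUnassigned=include_unassigned)


# Same molecule, opposite chirality tags on the stereocenter.
tagged_a = centers("[C@H](Cl)(F)Br")
tagged_b = centers("[C@@H](Cl)(F)Br")

# No chirality tag: the stereocenter exists but is unassigned.
untagged_default = centers("FC(Cl)Br")
untagged_listed = centers("FC(Cl)Br", include_unassigned=True)

print("tagged with @:   ", tagged_a)
print("tagged with @@:  ", tagged_b)
print("untagged:        ", untagged_default)
print("untagged, listed:", untagged_listed)
```

The two tagged SMILES should come back with opposite R/S labels, while the untagged one is only listed (as '?') when includeUnassigned=True, which matches what you observed on QM9.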

I ran your code, and none of the SMILES in the dataset has an assigned stereocenter.

For example, here are some of the SMILES and their reported configurations:

CC1C(C=O)N2CC12C
[(1, '?'), (2, '?'), (5, '?'), (7, '?')]
CC1N(C=O)C2CC12C
[(1, '?'), (5, '?'), (7, '?')]
CC1N(C=O)C2CC12O
[(1, '?'), (5, '?'), (7, '?')]
CN1C(C=O)C2CC21C
[(2, '?'), (5, '?'), (7, '?')]
O=CC1C(O)C2(O)CC12
[(2, '?'), (3, '?'), (5, '?'), (8, '?')]
CC1C(=O)C=CC1C=O
[(1, '?'), (6, '?')]
O=CC1C=CC(=O)C1O
[(2, '?'), (7, '?')]
C#CC1C=CC(=O)C1C
[(2, '?'), (7, '?')]

As you can see, none of these SMILES contains [C@] or [C@@], which is why every configuration is reported as '?'.
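
If you still want the three-way feature from the question (R, S, or not a chiral center), one possible way to build it from a list of (atom index, label) tuples is sketched below. The function name `chirality_one_hot` and the column order are hypothetical, and mapping '?' to "not a chiral center" is my assumption, not something the paper states.

```python
# Column order for the one-hot vector (an assumption for this sketch).
CLASSES = ("R", "S", "none")


def chirality_one_hot(num_atoms, chiral_centers):
    """Build a num_atoms x 3 one-hot matrix from (atom_index, label) tuples.

    Atoms that are not listed, or whose label is '?' (unassigned),
    fall into the 'none' column. That choice is an assumption.
    """
    labels = ["none"] * num_atoms
    for atom_idx, label in chiral_centers:
        if label in ("R", "S"):
            labels[atom_idx] = label
    return [[int(label == cls) for cls in CLASSES] for label in labels]


# CC1C(C=O)N2CC12C has 9 heavy atoms; these tuples are the output shown for it.
features = chirality_one_hot(9, [(1, "?"), (2, "?"), (5, "?"), (7, "?")])
for row in features:
    print(row)
```

With only unassigned centers, every atom ends up in the "none" column, so on these SMILES the feature carries no information unless the stereochemistry is recovered from another source.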

The RDKit documentation for FindMolChiralCenters is also helpful.

Vandan Revanur