11

Is there any way to convert IUPAC or common molecular names to SMILES? I want to do this without manually converting every single name through an online tool. Any input would be much appreciated!

For background, I am currently working with Python and RDKit, so I wasn't sure whether RDKit could do this and I was just unaware. My current data is in CSV format.

Thank you!

A. Y

6 Answers

18

RDKit can't convert names to SMILES. The Chemical Identifier Resolver (CIR) can convert names and other identifiers (such as CAS numbers), and it has an API, so you can do the conversion in a script.

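from urllib.request import urlopen
from urllib.parse import quote

def CIRconvert(ids):
    """Return the SMILES that CIR reports for a name or other identifier."""
    try:
        # quote() escapes spaces and other URL-unsafe characters in the identifier
        url = 'http://cactus.nci.nih.gov/chemical/structure/' + quote(ids) + '/smiles'
        ans = urlopen(url).read().decode('utf8')
        return ans
    except Exception:
        # covers HTTP errors (e.g. an unknown name) and connection failures
        return 'Did not work'

identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']

for ids in identifiers:
    print(ids, CIRconvert(ids))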

Output

3-Methylheptane CCCCC(C)CC
Aspirin CC(=O)Oc1ccccc1C(O)=O
Diethylsulfate CCO[S](=O)(=O)OCC
Diethyl sulfate CCO[S](=O)(=O)OCC
50-78-2 CC(=O)Oc1ccccc1C(O)=O
Adamant Did not work
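
Since the question mentions that the data is in a CSV file, here is a minimal sketch of applying a resolver such as `CIRconvert` to a whole column. The column name `Name`, the output column `SMILES`, the file names in the comments, and the helper names `add_smiles_column` and `convert_csv` are all assumptions made for illustration; adjust them to your file. The resolver is passed in as an argument, so any function that maps a name to a string will do.

```python
import csv

def add_smiles_column(rows, resolve, name_field="Name", smiles_field="SMILES"):
    """Return copies of the row dicts with a SMILES column filled in by `resolve`."""
    cache = {}  # avoid calling the web service twice for the same name
    out = []
    for row in rows:
        name = row[name_field].strip()
        if name not in cache:
            cache[name] = resolve(name)
        out.append({**row, smiles_field: cache[name]})
    return out

def convert_csv(in_file, out_file, resolve, name_field="Name"):
    """Read CSV rows from in_file and write them to out_file with a SMILES column."""
    reader = csv.DictReader(in_file)
    rows = add_smiles_column(list(reader), resolve, name_field)
    fieldnames = list(reader.fieldnames or []) + ["SMILES"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical usage with the CIRconvert function above:
# with open("names.csv", newline="") as src, \
#         open("names_with_smiles.csv", "w", newline="") as dst:
#     convert_csv(src, dst, CIRconvert)
```

Caching by name keeps the number of requests down when the same compound appears on several rows.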
rapelpy
  • For some reason this website has not been operating properly since around late 2020 – Cody Aldaz Feb 04 '21 at 06:23
  • @CodyAldaz The website seems to have some problems, but most of the time, when I click on `Submit`, it works. The API works, however. – rapelpy Feb 04 '21 at 12:19
  • This mostly worked for me, but I had to convert spaces to URL format (%20), like this: `current_id = str(ids.lower()).replace(' ', '%20')` followed by `url = 'http://cactus.nci.nih.gov/chemical/structure/' + current_id + '/smiles'` – Paul G May 29 '21 at 03:10
  • @PaulG Thank you for pointing out the spaces. I have edited the code. – rapelpy May 29 '21 at 05:42
3

OPSIN (https://opsin.ch.cam.ac.uk/) is another solution for name2structure conversion.

It can be used by installing the CLI, or via https://github.com/gorgitko/molminer

(OPSIN is also used by the RDKit KNIME nodes.)
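
If you would rather not install anything, the OPSIN site also exposes a small web service. The sketch below assumes the URL layout `https://opsin.ch.cam.ac.uk/opsin/<name>.smi`; that layout is my recollection of the service and not something stated in this answer, so check it against the OPSIN site before relying on it. The helper names `opsin_url` and `opsin_smiles` are made up for this example.

```python
from urllib.error import URLError
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the OPSIN web service (verify against the OPSIN site).
OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin/"

def opsin_url(name, fmt="smi"):
    """Build the request URL for a chemical name; fmt is the file-style suffix."""
    # safe="" also escapes "/" so the whole name stays in one path segment
    return OPSIN_BASE + quote(name, safe="") + "." + fmt

def opsin_smiles(name, timeout=10):
    """Return the SMILES text reported for `name`, or None if the request fails."""
    try:
        with urlopen(opsin_url(name), timeout=timeout) as resp:
            return resp.read().decode("utf8").strip()
    except URLError:
        # HTTPError is a subclass of URLError, so unparsable names land here too
        return None
```

Keep in mind that OPSIN parses systematic nomenclature, so a trade name is less likely to resolve here than with a database lookup.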

JoshuaBox
3

PubChemPy has some great features that can be used for this purpose. It supports IUPAC systematic names, trade names and all known synonyms for a given compound, as documented in the PubChem database: https://pubchempy.readthedocs.io/en/latest/

>>> import pubchempy as pcp
>>> results = pcp.get_compounds('Glucose', 'name')
>>> print(results)
[Compound(79025), Compound(5793), Compound(64689), Compound(206)]

The first argument is the identifier, and the second argument is the identifier type, which must be one of name, smiles, sdf, inchi, inchikey or formula. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. Let’s take a look at them in more detail:

>>> for compound in results:
...     print(compound.isomeric_smiles)

C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O
C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O
C(C1C(C(C(C(O1)O)O)O)O)O

It looks like they all carry different stereochemistry information!
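
Because one name can map to several records, it can help to collect the distinct SMILES strings per name before deciding which one to keep. This is only a sketch: `distinct_smiles` and `lookup_distinct_smiles` are hypothetical helper names, and the code assumes each result exposes `isomeric_smiles` as in the session above. Only the second function touches PubChem.

```python
def distinct_smiles(compounds, attr="isomeric_smiles"):
    """Return the unique, non-empty SMILES strings from compound-like objects,
    keeping the order in which they first appear."""
    seen = set()
    unique = []
    for compound in compounds:
        smi = getattr(compound, attr, None)
        if smi and smi not in seen:
            seen.add(smi)
            unique.append(smi)
    return unique

def lookup_distinct_smiles(name):
    """Query PubChem by name and return the distinct SMILES of all matches."""
    # Imported here so the helper above can be used without PubChemPy installed.
    import pubchempy as pcp
    return distinct_smiles(pcp.get_compounds(name, "name"))
```

An empty list means PubChem returned nothing for that name, and a list with more than one entry flags a name that needs a manual choice.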

San
0

The accepted answer uses the Chemical Identifier Resolver, but for some reason the website seems buggy for me and the API seems to be broken.

So another way to convert SMILES to an IUPAC name is with the PubChem Python API (PubChemPy), which can work if your SMILES is in their database

e.g.

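#!/usr/bin/env python

import sys
import pubchempy as pcp

# SMILES string passed as the first command-line argument
smiles = sys.argv[1]
print(smiles)
matches = pcp.get_compounds(smiles, 'smiles')
# guard against an empty result list before indexing
if matches:
    print(matches[0].iupac_name)
else:
    print('No PubChem match')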
Cody Aldaz
0

You can use the PubChem API (PUG REST) for this

(https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest-tutorial)

Basically, the URL you call identifies the compound by "name", followed by the name itself, and then requests the "property" "CanonicalSMILES" as plain text (TXT).

import pandas as pd
import requests
from urllib.parse import quote

identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']
rows = []
for x in identifiers:
    try:
        # quote() escapes spaces and other URL-unsafe characters in the name
        url = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/' + quote(x) + '/property/CanonicalSMILES/TXT'
        # remove new line character with rstrip
        smiles = requests.get(url).text.rstrip()
        if 'NotFound' in smiles:
            print(x, " not found")
        else:
            rows.append({'Name': x, 'Smiles': smiles})
    except requests.RequestException:
        print("boo ", x)
# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
smiles_df = pd.DataFrame(rows, columns=['Name', 'Smiles'])
print(smiles_df)


Paul G