Converting molecule name to SMILES?

Question

I was just wondering: is there any way to convert IUPAC or common names of molecules to SMILES? I want to do this without having to convert every single one manually with online tools. Any input would be much appreciated!

For background, I am currently working with Python and RDKit, so I wasn't sure whether RDKit could do this and I was just unaware of it. My current data is in CSV format.

Thank you!

([Text Munging](https://docs.python.org/3/library/re.html#text-munging)?) — greybeard, Feb 28 '19 at 16:36

rapelpy · Accepted Answer · 2021-05-29T05:37:44.330

18

RDKit can't convert names to SMILES. The Chemical Identifier Resolver (CIR) can convert names and other identifiers (such as CAS numbers), and it has an API, so you can do the conversion from a script.

from urllib.request import urlopen
from urllib.parse import quote

def CIRconvert(ids):
    try:
        # quote() percent-encodes spaces and other special characters in the identifier
        url = 'http://cactus.nci.nih.gov/chemical/structure/' + quote(ids) + '/smiles'
        ans = urlopen(url).read().decode('utf8')
        return ans
    except Exception:
        # HTTP errors (e.g. an unresolvable name) and network errors end up here
        return 'Did not work'

identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']

for ids in identifiers:
    print(ids, CIRconvert(ids))

Output

3-Methylheptane CCCCC(C)CC
Aspirin CC(=O)Oc1ccccc1C(O)=O
Diethylsulfate CCO[S](=O)(=O)OCC
Diethyl sulfate CCO[S](=O)(=O)OCC
50-78-2 CC(=O)Oc1ccccc1C(O)=O
Adamant Did not work
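CIR's URLs follow the pattern `/chemical/structure/<identifier>/<representation>`, so the same call can return representations other than SMILES. Below is a minimal sketch generalising `CIRconvert` along those lines. `cir_url` and `cir_resolve` are hypothetical helper names, and the https endpoint and representation names such as `stdinchikey` or `iupac_name` are assumptions to check against the CIR documentation. It returns `None` rather than a sentinel string on failure, and sets a timeout so a stalled server cannot hang a long batch.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed https endpoint of the Chemical Identifier Resolver
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure/"


def cir_url(identifier: str, representation: str = "smiles") -> str:
    """Build a CIR request URL; quote() percent-encodes spaces and other special characters."""
    return CIR_BASE + quote(identifier) + "/" + representation


def cir_resolve(identifier: str, representation: str = "smiles",
                timeout: float = 30.0) -> Optional[str]:
    """Return CIR's plain-text answer, or None if the request fails.

    Multi-valued representations (e.g. "names") come back with one value per line.
    """
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:
        # HTTPError (e.g. 404 for an unresolvable name), URLError and socket
        # timeouts are all OSError subclasses
        return None
```

For example, `cir_resolve('50-78-2', 'iupac_name')` asks CIR for the IUPAC name behind a CAS number, if the service supports that representation.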

edited May 29 '21 at 05:37

answered Feb 28 '19 at 18:29


For some reason this website has not been working properly since around late 2020 – Cody Aldaz Feb 04 '21 at 06:23
@CodyAldaz The website seems to have some problems, but most of the time it works when I click `Submit`. The API works in any case. – rapelpy Feb 04 '21 at 12:19

This mostly worked for me, but I had to convert spaces to URL encoding (%20): current_id = str(ids.lower()).replace(' ', '%20'); url = 'http://cactus.nci.nih.gov/chemical/structure/' + current_id + '/smiles' – Paul G May 29 '21 at 03:10

@PaulG Thank you for pointing out the spaces. I have edited the code. – rapelpy May 29 '21 at 05:42

score 3 · Answer 2 · answered Mar 16 '19 at 11:07


OPSIN (https://opsin.ch.cam.ac.uk/) is another solution for name-to-structure conversion.

It can be used by installing the command-line tool (CLI), or via https://github.com/gorgitko/molminer

(OPSIN is used by the RDKit KNIME nodes also)
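As a rough illustration, OPSIN also runs as a public web service. The sketch below assumes the URL pattern `https://opsin.ch.cam.ac.uk/opsin/<name>.smi` returns SMILES as plain text (verify this on the OPSIN site); `opsin_url` and `opsin_to_smiles` are hypothetical names. Because OPSIN parses systematic names rather than looking them up in a database, it is best suited to IUPAC-style names; trade names and CAS numbers generally need a database-backed service such as CIR or PubChem.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL pattern of the public OPSIN web service: /opsin/<name>.smi -> SMILES text
OPSIN_BASE = "https://opsin.ch.cam.ac.uk/opsin/"


def opsin_url(name: str) -> str:
    """Build the request URL, percent-encoding spaces, parentheses, commas, etc."""
    return OPSIN_BASE + quote(name) + ".smi"


def opsin_to_smiles(name: str, timeout: float = 30.0) -> Optional[str]:
    """Return SMILES for a systematic name, or None if parsing or the request failed."""
    try:
        with urlopen(opsin_url(name), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:
        # an HTTP error status (e.g. for an unparsable name) raises HTTPError,
        # which is an OSError subclass, as are network errors and timeouts
        return None
```

For large batches, running the OPSIN jar locally avoids one network round trip per name; see the OPSIN documentation for its command-line options.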

JoshuaBox

score 3 · Answer 3 · answered May 06 '22 at 18:03

PubChemPy can be used for this. Its name search accepts IUPAC systematic names, trade names and any other synonym recorded for a compound in the PubChem database (documentation: https://pubchempy.readthedocs.io/en/latest/).

>>> import pubchempy as pcp
>>> results = pcp.get_compounds('Glucose', 'name')
>>> print(results)
[Compound(79025), Compound(5793), Compound(64689), Compound(206)]

The first argument is the identifier, and the second argument is the identifier type, which must be one of name, smiles, sdf, inchi, inchikey or formula. It looks like there are 4 compounds in the PubChem Database that have the name Glucose associated with them. Let’s take a look at them in more detail:

>>> for compound in results:
...     print(compound.isomeric_smiles)
...

C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O
C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O
C([C@@H]1[C@H]([C@@H]([C@H]([C@@H](O1)O)O)O)O)O
C(C1C(C(C(C(O1)O)O)O)O)O

They differ in the stereochemistry they specify (the last one specifies none at all)!
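Since the question mentions a CSV file, here is a hedged sketch of running PubChemPy over one column and writing the results back out. `pubchem_first_smiles`, `names_to_smiles` and `convert_csv` are hypothetical names; the lookup reuses the `get_compounds(name, 'name')` call and `isomeric_smiles` attribute shown above, and pubchempy is imported inside the function so the CSV handling can be exercised with a stand-in lookup.

```python
import csv
from typing import Callable, Dict, Iterable, Optional

Lookup = Callable[[str], Optional[str]]


def pubchem_first_smiles(name: str) -> Optional[str]:
    """Isomeric SMILES of the first PubChem hit for a name, or None if there is no hit."""
    import pubchempy as pcp  # third-party (pip install pubchempy)

    hits = pcp.get_compounds(name, "name")
    return hits[0].isomeric_smiles if hits else None


def names_to_smiles(names: Iterable[str],
                    lookup: Lookup = pubchem_first_smiles) -> Dict[str, Optional[str]]:
    """Map each distinct, non-blank name to SMILES, or to None if nothing was found."""
    results: Dict[str, Optional[str]] = {}
    for raw in names:
        name = raw.strip()
        if not name or name in results:
            continue  # skip blanks and avoid re-querying duplicates
        try:
            results[name] = lookup(name)
        except Exception:
            # network/service errors: keep the name with None so it can be retried
            results[name] = None
    return results


def convert_csv(in_path: str, out_path: str, name_column: str,
                lookup: Lookup = pubchem_first_smiles) -> None:
    """Copy a CSV file, adding a 'smiles' column looked up from name_column."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or []) + ["smiles"]
        rows = list(reader)
    smiles = names_to_smiles((row[name_column] for row in rows), lookup)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            # empty cell when the name could not be resolved
            row["smiles"] = smiles.get(row[name_column].strip()) or ""
            writer.writerow(row)
```

Taking the first hit is a simplification: as the glucose example shows, one name can match several records with different stereochemistry, so check the results where that matters.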

score 0 · Answer 4 · answered Feb 04 '21 at 23:27


The accepted answer uses the Chemical Identifier Resolver, but the website seems buggy for me and the API does not seem to work properly.

Another option is the PubChem Python API (PubChemPy), which works as long as your structure is in the PubChem database. For example, to convert a SMILES string to an IUPAC name:

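#!/usr/bin/env python
# Look up a SMILES string (first command-line argument) in PubChem and print its IUPAC name
import sys

import pubchempy as pcp

smiles = sys.argv[1]
print(smiles)
# get_compounds returns a list of Compound objects matching the SMILES
compounds = pcp.get_compounds(smiles, 'smiles')
print(compounds[0].iupac_name)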

Cody Aldaz

The question was about converting names to SMILES (not the other way around). That can be done with this API as well: smiles = pcp.get_compounds(ids, 'name')[0].canonical_smiles – Guy s Sep 01 '21 at 12:31
What if we don't have any ID and just have the name of the compound? – Mohamad Kouhi Moghadam Jan 18 '22 at 04:48

score 0 · Answer 5 · answered Jan 18 '22 at 04:58


You can use PubChem's batch query:

Mohamad Kouhi Moghadam

Paul G · Answer 6 · 2022-05-25T02:30:10.110

You can use the PubChem REST API (PUG REST) for this (https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest-tutorial).

The request URL names the input type (`name`), then the compound name itself, then the property you want (`CanonicalSMILES`) and finally the output format (`TXT`, plain text).

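from urllib.parse import quote

import pandas as pd
import requests

identifiers = ['3-Methylheptane', 'Aspirin', 'Diethylsulfate', 'Diethyl sulfate', '50-78-2', 'Adamant']
rows = []
for x in identifiers:
    try:
        # percent-encode the name so spaces and special characters are safe in the URL path
        url = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/' + quote(x) + '/property/CanonicalSMILES/TXT'
        response = requests.get(url, timeout=30)
        # remove the trailing newline character with rstrip
        smiles = response.text.rstrip()
        if not response.ok or 'NotFound' in smiles:
            print(x, "not found")
        else:
            # DataFrame.append was removed in pandas 2.0, so collect rows in a list instead
            rows.append({'Name': x, 'Smiles': smiles})
    except requests.RequestException:
        print("request failed for", x)
smiles_df = pd.DataFrame(rows, columns=['Name', 'Smiles'])
print(smiles_df)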
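The loop above issues one request per name. A slightly more defensive sketch using only the standard library: it percent-encodes each name, keeps only the first line of the `TXT` output (a name matching several CIDs returns one SMILES per line), treats HTTP errors such as 404 for unknown names as "not found", and pauses between calls to stay under PubChem's usage limit of 5 requests per second. `property_url`, `first_line` and `pubchem_smiles_batch` are hypothetical names.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, prop: str = "CanonicalSMILES") -> str:
    """Build compound/name/<name>/property/<prop>/TXT with the name percent-encoded."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{prop}/TXT"


def first_line(text: str) -> Optional[str]:
    """TXT output has one line per matching CID; keep the first, or None if empty."""
    lines = text.strip().splitlines()
    return lines[0].strip() if lines else None


def pubchem_smiles_batch(names: Iterable[str],
                         delay: float = 0.25) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (name, smiles) pairs; smiles is None if PubChem has no match or the call fails."""
    for name in names:
        smiles: Optional[str] = None
        try:
            with urlopen(property_url(name), timeout=30) as resp:
                smiles = first_line(resp.read().decode("utf-8"))
        except OSError:
            # unknown names come back as HTTP 404, which urlopen raises as
            # HTTPError (an OSError subclass); network errors land here too
            pass
        yield name, smiles
        # 0.25 s between requests keeps the rate below 5 requests per second
        time.sleep(delay)
```

With pandas, `pd.DataFrame(list(pubchem_smiles_batch(identifiers)), columns=['Name', 'Smiles'])` gives a similar table, with `None` where PubChem found nothing.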
