Retrieve all molecules from smiles file

Question

I am working with .smiles files. The structure of a .smiles file is described here: http://en.wikipedia.org/wiki/Chemical_file_format#SMILES

I want to get all the atoms from the SMILES file, including hydrogens. For example, a single 'C' atom implies that 4 'H' atoms are connected to it.

While searching, I found some Python modules that can parse the SMILES format, but they do not return the implied hydrogen atoms. (For example, they return only the 'C' and not the 4 'H' atoms connected to it.)

How can I find all the atoms, including the connected 'H' atoms, using Python?
Example of a SMILES string that needs to be converted into all of its atoms, including the connected 'H' atoms:

[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]

Thank you in advance.

It's actually better to ask this question on Blue Obelisk: http://blueobelisk.shapado.com/ – chupvl, Feb 12 '13 at 15:56

pradyunsg · Answer 1 · 2013-02-15T06:26:31.087

6

See Open Babel.

Useful Links on Open Babel Site

See also:
This blog (by Casper Steinmann) on chemistry with Python (using Open Babel, though not for everything)

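Update: see this code (untested):

from openbabel import pybel  # Open Babel 3.x; on 2.x use "import pybel"

mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()  # adds explicit hydrogens in place and returns None
print(len(mymol.atoms), mymol.formula)  # the atom list now includes the hydrogens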

edited Feb 15 '13 at 06:26

answered Feb 12 '13 at 09:38


is this the output you want? `OC[C@H]1C=C[C@@H](n2cnc3c(nc(nc23)N)NC2CC2)C1 ` – pradyunsg Feb 15 '13 at 06:09

score 3 · Answer 2 · answered Feb 12 '13 at 07:07


"I want to get all the atoms from the smiles file. It means that If there is single 'C' atom it means that there will be 4 'H' atoms will be connected to them."

This assumption is not correct: a carbon can carry anywhere from 0 to 4 hydrogens, depending on how many other bonds it has.
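To make the point about hydrogen counts concrete, here is a minimal sketch of the rule SMILES uses for unbracketed, non-aromatic atoms: the implied hydrogens fill the atom up to its lowest normal valence that is at least the sum of its bond orders. The function name `implicit_hydrogens` and the table `NORMAL_VALENCES` are hypothetical names chosen for this illustration; aromatic and bracketed atoms follow different rules and are not covered.

```python
# Normal valences SMILES assumes for unbracketed "organic subset" atoms.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Return the hydrogens implied on an unbracketed, non-aromatic atom.

    bond_order_sum is the sum of the orders of the bonds to
    non-hydrogen neighbours (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # the bonds already exceed every normal valence


print(implicit_hydrogens("C", 0))  # lone carbon, as in methane "C"
print(implicit_hydrogens("C", 2))  # carbon with one double bond or two single bonds
print(implicit_hydrogens("N", 1))  # nitrogen with one single bond
```

A toolkit does this bookkeeping (and the aromatic cases) for you, which is why using one is usually the safer route.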

Try Open Babel, CDK, or a similar cheminformatics library.

But why do you need all the atoms from the file?


chupvl


I want this because I want to compute the atomic weight from it. I also want to find the donors and acceptors. – sam Feb 12 '13 at 08:03
@sam He (sam) seems to want the atomic mass of the most stable atomic configuration (not the right word here). If that's the case, 1 'C' will have to bond with 4 'H' to be stable. Is that so, sam? – pradyunsg Feb 12 '13 at 09:50
Yes, but those 4 H are not present in the SMILES file format, so I have to work out the missing H and then calculate the mass. – sam Feb 12 '13 at 10:18
Open Babel has Python bindings (pybel), and so does CDK: http://pycdk.sourceforge.net/ – chupvl Feb 12 '13 at 15:55

score 3 · Answer 3 · answered Jun 12 '13 at 07:24


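For the molecular weight of a compound given as SMILES, the Python bindings of Open Babel should work:

from openbabel import pybel  # Open Babel 3.x; on 2.x use "import pybel"

# Read the first molecule in the file and print its molecular weight.
mol = next(pybel.readfile("smi", "stuff.smi"))
print(mol.molwt)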
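As a cross-check that needs no toolkit, the sketch below counts atoms and sums average atomic masses for a SMILES string in which every hydrogen is already written explicitly as `[H]`, like the one in the question. It is only an illustration under that assumption: `count_atoms`, `molecular_weight`, and the small `ATOMIC_MASS` table are hypothetical names introduced here, the regular expression ignores hydrogen counts, charges, and isotopes inside brackets, and it would undercount hydrogens for ordinary SMILES with implicit hydrogens.

```python
import re
from collections import Counter

# Average atomic masses in g/mol, limited to the elements in the example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# A bracket atom such as [H] or [C@@], or a bare organic-subset atom.
ATOM = re.compile(r"\[([A-Za-z][a-z]?)[^\]]*\]|(Cl|Br|[BCNOPSFIbcnops])")


def count_atoms(smiles):
    """Count atoms per element, assuming all hydrogens are explicit [H] atoms."""
    counts = Counter()
    for bracket_symbol, bare_symbol in ATOM.findall(smiles):
        symbol = bracket_symbol or bare_symbol
        counts[symbol.capitalize()] += 1  # aromatic 'c' and 'n' count as C and N
    return counts


def molecular_weight(counts):
    """Sum the average atomic masses over the element counts."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


smiles = (
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]"
)
counts = count_atoms(smiles)
print(dict(counts))
print(round(molecular_weight(counts), 3))
```

For the question's molecule this gives 14 C, 18 H, 6 N, and 1 O, which should agree closely with the value a toolkit reports.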


Klaus-Dieter Warzecha


score 2 · Answer 4 · answered Feb 12 '13 at 09:05


Try frowns, a chemoinformatics toolkit geared toward rapid development of chemistry related algorithms. It is written in almost 100% Python with a small portion written in C++.


BioGeek


frowns seems to be only for Python 2.2 (old), and it also needs vflib (not a problem). Also, the project seems to have been dead since 2004 (old, big problem). – pradyunsg Feb 12 '13 at 09:54

Jayaram · Answer 5 · 2021-03-14T14:06:24.730

0

RDKit is a well-established cheminformatics library with a Python API.

To read a molecule from a SMILES string:

from rdkit import Chem

m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')

Once the SMILES string has been read into an RDKit molecule, you can do pretty much everything with it. Documentation: http://www.rdkit.org

edited Mar 14 '21 at 14:06

answered Feb 11 '14 at 15:19

