
I am working on .smiles files. The structure of a .smiles file is described here: http://en.wikipedia.org/wiki/Chemical_file_format#SMILES

I want to get all the atoms from the SMILES file. That is, if there is a single 'C' atom, there will be 4 'H' atoms connected to it.

While searching, I found some Python modules that can parse the SMILES format, but they do not give the attached hydrogen atoms (for example, they only give 'C' and not the 4 'H' atoms connected to that 'C' atom).

How can I find all the atoms, including the connected 'H' atoms, using Python?
Here is an example SMILES string that needs to be converted into all atoms, including the connected 'H' atoms:

[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]

Thank you in advance.

BioGeek
sam

5 Answers


See Open Babel.

Useful Links on Open Babel Site

See also this blog by Casper Steinmann on chemistry with Python (it uses Open Babel for some, though not all, of its examples).

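Update: see this code (untested):

from openbabel import pybel  # Open Babel 3.x; with Open Babel 2.x use `import pybel`

mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()               # adds the missing hydrogens in place (returns None)
print(mymol.write("smi"))  # or inspect mymol.atoms / mymol.formula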
pradyunsg

You wrote: "If there is a single 'C' atom, there will be 4 'H' atoms connected to it." This assumption is not correct: a carbon can also carry 1, 2 or 3 hydrogens (or none), depending on how many bonds it already forms to other atoms.
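To illustrate, here is a minimal pure-Python sketch of the implicit-hydrogen rule for atoms written without brackets (the OpenSMILES "organic subset"): each such atom gets just enough H to bring its bond-order sum up to the lowest normal valence that fits. `implicit_hydrogens` and `formula_counts` are hypothetical helpers written for this example, not part of any library. The parser deliberately ignores bracket atoms, aromatic (lowercase) atoms, charges and stereo, so it cannot read the string in the question; for real files use one of the toolkits below.

```python
# Normal valences for the unbracketed SMILES "organic subset" (OpenSMILES).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def implicit_hydrogens(element: str, bond_order_sum: int) -> int:
    """Implicit H count for an unbracketed, non-aromatic organic-subset atom."""
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # already above every normal valence: no implicit H


def formula_counts(smiles: str) -> dict:
    """Count atoms, including implicit H, in a simple SMILES string.

    Supports only unbracketed aliphatic atoms, branches, single-digit ring
    closures and the bond symbols - = #. Anything else raises ValueError.
    """
    atoms, valence = [], []  # element symbol and bond-order sum per atom
    stack = []               # attachment points saved at '(' for branches
    rings = {}               # open ring-closure digit -> (atom index, bond order)
    prev, bond, i = None, 1, 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in ("Cl", "Br"):
            symbol, i = smiles[i:i + 2], i + 2
        elif ch in NORMAL_VALENCES:
            symbol, i = ch, i + 1
        else:
            symbol = None
        if symbol is not None:
            atoms.append(symbol)
            valence.append(0)
            idx = len(atoms) - 1
            if prev is not None:  # bond the new atom to the previous one
                valence[prev] += bond
                valence[idx] += bond
            prev, bond = idx, 1
            continue
        if ch in BOND_ORDERS:
            bond = BOND_ORDERS[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in rings:  # second occurrence closes the ring bond
                other, order = rings.pop(ch)
                order = max(order, bond)
                valence[other] += order
                valence[prev] += order
            else:
                rings[ch] = (prev, bond)
            bond = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1
    counts = {}
    for symbol, v in zip(atoms, valence):
        counts[symbol] = counts.get(symbol, 0) + 1
        h = implicit_hydrogens(symbol, v)
        if h:
            counts["H"] = counts.get("H", 0) + h
    return counts


print(formula_counts("CC(=O)O"))  # {'C': 2, 'H': 4, 'O': 2}
```

In acetic acid (`CC(=O)O`) the methyl carbon gets 3 H and the carboxyl carbon gets none, because its single, double and single bonds already add up to 4.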

Try Open Babel, CDK or a similar cheminformatics library.

But why do you need all the atoms from the file?

chupvl
  • I want it because I want to find the molecular weight. I also want to find the donors and acceptors. – sam Feb 12 '13 at 08:03
  • @sam He (sam) seems to want the atomic mass of the most stable atomic configuration (not the right word here). If that's the case, 1 'C' will have to bond with 4 'H' to be stable. Is that so, sam? – pradyunsg Feb 12 '13 at 09:50
  • Yes, but those 4 H atoms are not present in the SMILES file, so I have to work out those H atoms first and then calculate the mass. – sam Feb 12 '13 at 10:18
  • Open Babel has Python bindings (pybel), and so does CDK: http://pycdk.sourceforge.net/ – chupvl Feb 12 '13 at 15:55

For the molecular weight of a compound given as SMILES, the Python bindings of Open Babel should work:

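from openbabel import pybel  # Open Babel 3.x; with Open Babel 2.x use `import pybel`

mol = next(pybel.readfile("smi", "stuff.smi"))  # first molecule in the file
print(mol.molwt)  # molecular weight, implicit hydrogens included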
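`molwt` does the summing for you; the arithmetic behind it is just atom counts times atomic weights, once the hydrogens have been counted. A minimal sketch, assuming the rounded standard atomic weights below (toolkits may use slightly different tables, so the last decimal can differ); `molecular_weight` is a hypothetical helper. Counting every atom in the question's SMILES (all of its hydrogens are written explicitly as `[H]`) gives C14H18N6O.

```python
# Rounded standard atomic weights (g/mol) for the elements in the example.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(counts: dict) -> float:
    """Sum of atomic weights for atom counts such as {"C": 14, "H": 18}."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())


# Atom counts for the SMILES in the question: C14H18N6O.
question_molecule = {"C": 14, "H": 18, "N": 6, "O": 1}
print(round(molecular_weight(question_molecule), 2))  # 286.34
```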
Klaus-Dieter Warzecha

Try frowns, a chemoinformatics toolkit geared toward rapid development of chemistry related algorithms. It is written in almost 100% Python with a small portion written in C++.

BioGeek
  • frowns seems to support only Python 2.2 (old), and it also needs vflib (not a problem). Also, the project seems to have been dead since 2004 (a big problem). – pradyunsg Feb 12 '13 at 09:54

RDKit is a well-established cheminformatics library for Python.

To read a molecule from SMILES:

from rdkit import Chem

m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')

Once you have read the SMILES into an RDKit molecule, you can do pretty much anything with it. Documentation: http://www.rdkit.org
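For the original goal (every atom including hydrogens, the molecular weight, and donor/acceptor counts), a sketch along these lines should work, assuming a recent RDKit install. `Chem.AddHs` turns implicit hydrogens into real atoms; the formula and weight functions count implicit hydrogens either way. The Lipinski donor/acceptor counts use RDKit's own definitions, which may differ from other tools.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski, rdMolDescriptors

smiles = ("[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
          "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")

mol = Chem.MolFromSmiles(smiles)  # by default the [H] atoms become implicit H
mol_h = Chem.AddHs(mol)           # make every hydrogen an explicit atom again

symbols = [atom.GetSymbol() for atom in mol_h.GetAtoms()]
print(len(symbols), symbols.count("H"))      # all atoms, hydrogens only
print(rdMolDescriptors.CalcMolFormula(mol))  # molecular formula
print(round(Descriptors.MolWt(mol), 2))      # average molecular weight
print(Lipinski.NumHDonors(mol), Lipinski.NumHAcceptors(mol))
```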

Jayaram