I am trying to convert InChI strings to SDF format using the RDKit Python library. I am running the following Python code.

#convert inchi to sdf

def MolFromInchi(id,inchi):
    mol = Chem.MolFromInchi(inchi)
    mol_block = Chem.MolToMolBlock(mol)
    print (id, mol_block)
    print ('$$$$')
    
with open(r'C:/Users/inchi_canonize') as f:
    for line in f:
        lst=line.split(' ')
        elements = [x for x in lst if x]   #remove empty elements and get id (elements[0]) and inchis (elements[1])
        elements[1] = ('\''+elements[1].strip()+'\'')
        id = elements[0]
        inchi = elements[1].rstrip("\n")
        print (inchi)
        MolFromInchi(id,inchi)


The input file (inchi_canonize) has the following fields: an ID followed by an InChI (or (null) when no InChI is available).

D08520   InChI=1S/C10H18O2/c1-7-4-5-8(6-9(7)11)10(2,3)12/h4,8-9,11-12H,5-6H2,1-3H3
D07548   InChI=1S/C17H25NO4.ClH/c1-20-13-11-15(21-2)17(16(12-13)22-3)14(19)7-6-10-18-8-4-5-9-18;/h11-12H,4-10H2,1-3H3;1H
D10000   (null)

Below is the error:

ArgumentError: Python argument types in
    rdkit.Chem.rdmolfiles.MolToMolBlock(NoneType)
did not match C++ signature:
    MolToMolBlock(class RDKit::ROMol mol, bool includeStereo=True, int confId=-1, bool kekulize=True, bool forceV3000=False)

Any help is highly appreciated.

rshar

1 Answer

The problem is elements[1] = ('\''+elements[1].strip()+'\'').

The InChI is already a string, and this line wraps it in two extra literal single-quote characters (' ... ').

Your InChI is now "'InChI=1S/C10H18O2/c1-7-4-5-8(6-9(7)11)10(2,3)12/h4,8-9,11-12H,5-6H2,1-3H3'"

and not InChI=1S/C10H18O2/c1-7-4-5-8(6-9(7)11)10(2,3)12/h4,8-9,11-12H,5-6H2,1-3H3.
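
To see the effect in isolation, here is a minimal sketch that uses only plain string operations (no RDKit calls). The variable names raw and wrapped are illustrative and do not come from the original code.

```python
# The InChI as read from the file (already a str).
raw = "InChI=1S/C10H18O2/c1-7-4-5-8(6-9(7)11)10(2,3)12/h4,8-9,11-12H,5-6H2,1-3H3"

# What the line from the question does: it adds two literal quote characters.
wrapped = "'" + raw.strip() + "'"

print(raw.startswith("InChI="))      # True
print(wrapped.startswith("InChI="))  # False, the string now starts with a quote
print(len(wrapped) - len(raw))       # 2
```

Because the wrapped string no longer starts with InChI=, the parser cannot read it as an InChI.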

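Additionally, you should add a check, because otherwise you also try to convert (null) to a mol block. Chem.MolFromInchi returns None when it cannot parse its input, and passing that None to Chem.MolToMolBlock is what raises the ArgumentError shown above.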
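
One possible way to filter the input before it reaches RDKit is sketched below. It assumes every usable line has the form "ID InChI" separated by whitespace; parse_line is a hypothetical helper written for this illustration, not an RDKit function.

```python
def parse_line(line):
    """Return (id, inchi) for a usable line, or None if there is nothing to convert.

    parse_line is a hypothetical helper, not part of RDKit.
    """
    fields = line.split()  # splits on any whitespace and drops the trailing newline
    if len(fields) < 2:
        return None  # blank or incomplete line
    mol_id, inchi = fields[0], fields[1]
    if not inchi.startswith("InChI="):
        return None  # e.g. the "(null)" placeholder
    return mol_id, inchi


lines = [
    "D08520   InChI=1S/C10H18O2/c1-7-4-5-8(6-9(7)11)10(2,3)12/h4,8-9,11-12H,5-6H2,1-3H3   \n",
    "D10000   (null)   \n",
    "\n",
]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

This only filters out obvious non-InChI entries. A string that starts with InChI= can still fail to parse, so the `mol is not None` check after Chem.MolFromInchi remains necessary.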

By the way, you can use Chem.SDWriter to write an SDF file.

from rdkit import Chem

mols = []
ids = []
inchis = []

with open(r'D:\Z\inchi_canonize.txt') as f:
    for line in f:
        # split on any whitespace; this drops empty fields and the trailing newline
        elements = line.split()
        if len(elements) < 2:
            continue  # skip blank or incomplete lines
        inchi = elements[1]
        mol = Chem.MolFromInchi(inchi)  # None if the InChI cannot be parsed, e.g. "(null)"
        if mol is not None:
            mols.append(mol)
            ids.append(elements[0])
            inchis.append(inchi)

w = Chem.SDWriter('foo.sdf')

for n in range(len(mols)):
    mols[n].SetProp('_Name', inchis[n])  # set the title line
    mols[n].SetProp('ID', ids[n])        # set an associated data field
    w.write(mols[n])

w.close()
rapelpy
  • Thanks a ton!! Could you explain what the lines below are doing? for n in range(len(mols)): mols[n].SetProp('_Name', idx[n]) w.write(mols[n]) – rshar May 01 '22 at 20:38
  • This loop writes every molecule into foo.sdf and makes ID and InChI the title line for every entry. – rapelpy May 02 '22 at 05:33
  • Is there a way to add a property before the end of a particular entry, e.g. ` > D08520 `? – rshar May 02 '22 at 09:44
  • @rshar I edited the code, so the ID is now an associated data and only the InChI is in the title line. – rapelpy May 02 '22 at 11:51
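
For reference, a property set with SetProp('ID', ...) ends up in the SDF record as a data item: a header line that starts with > and carries the field name in angle brackets, then the value, then a blank line, all placed before the $$$$ record terminator. The plain-Python sketch below only illustrates that layout. format_data_item is a made-up helper, and the exact spacing RDKit writes in the header line may differ.

```python
def format_data_item(name, value):
    """Build one SDF data item: header line, value line, blank line.

    format_data_item is a hypothetical helper for illustration only.
    """
    return f"> <{name}>\n{value}\n\n"


# Where the data item sits relative to the end of a record:
# after the mol block's "M  END" line and before the "$$$$" terminator.
record_tail = "M  END\n" + format_data_item("ID", "D08520") + "$$$$\n"
print(record_tail)
```

In practice there is no need to build these lines by hand; Chem.SDWriter writes them for every property set on the molecule.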