3

I need to work out information about the structure of branched alkanes using the IUPAC (International Union of Pure and Applied Chemistry) name in Excel. I have broken the problem into steps, and I have included examples for each step. Help with any of the steps would be greatly appreciated. I need to automate the method because I potentially have thousands of these names to process.

Step 1

I need work out the number of carbon atoms in the main chain. By convention, the name of the main chain appears last in the overall name. For example, the main chain in 2,2,3-trimethyloctane is octane. Could regular expressions be used to look for the name of main chain and then be used to look up the carbon atoms in the main chain? Further examples are provided in the table below.

Branched alkane name Main chain Carbon atoms in the main chain
2-methylpropane Propane 3
2-methylbutane Butane 4
3-ethylpentane Pentane 5
3-methylhexane Hexane 6
4-propulheptane Heptane 7
2-methyloctane Octane 8
3-ethylnonane Nonane 9
5-methyldecane Decane 10
6-butylundecane Undecane 11
3-ethyldodecane Dodecane 12

The custom UDF in Excel could be named MainChainRegex and =MainChainRegex(2,2,3-trimethyloctane) would return a value of 8.

Step 2

I also need work out the number of side-chain carbon atoms attached to each of the main chain carbon atoms. For example, 3-ethylpentane has two sidechain carbon atoms attached to the third carbon atom in the main chain, see image below. The “ethyl” term refers to a sidechain containing two carbon atoms. The “3-” in front of “ethyl” tells us that the sidechain is attached to carbon atom 3 in the main chain.

3-ethylpentane

The custom UDF in Excel could be named SideChainRegex and =SideChainRegex(3-ethylpentane) would return an array of 0, 0, 2, 0, 0 in five cells. The number of carbon atoms in the main chain could be used to size the array, and the first and last values in the array will always be zero.

Step 3

The problem becomes more complicated when there are multiple sidechains. For example, 2,2-dimethylbutane has two sidechains containing a combined total of two carbon atoms, see image below. The “methyl” term refers to a sidechain containing one carbon atom. The “di” refers to the fact that there are two of these sidechains and means that we will need to look for two numbers proceeding the “dimethyl” term. The “2,2-” in front of “dimethyl” tells us that both sidechains are attached to carbon atom 2 in the main chain. If the custom UDF in Excel is named SideChainRegex then =SideChainRegex(2,2-dimethylbutane) would need to return an array of 0, 2, 0, 0 in four cells – again the array is sized by the number of carbon atoms in the main chain.

2,2-dimethylbutane

Step 4

A further complication is when there is more than one type of side chain. For example, 5-ethyl-2,2-dimethylheptane has two different types of sidechains, see image below. If the custom UDF in Excel is named SideChainRegex then =SideChainRegex(5-ethyl-2,2-dimethylheptane) would need to return an array of 0, 2, 0, 0, 2, 0, 0 in seven cells to total up the number of side-chain carbon atoms attached to each of the main chain carbon atoms.

5-ethyl-2,2-dimethylheptane

The tables below provide further information that might be helpful.

Term How many numbers to look for before term
Di 2
Tri 3
Tetra 4
Penta 5
Hexa 6
Hepta 7
Octa 8
Term Number of carbon atoms in sidechain
Methyl 1
Ethyl 2
Propyl 3
Butyl 4
Pentyl 5
Hexyl 6
Heptyl 7
Octyl 8
Daniel
  • 57
  • 9
  • 1
    FYI PubChem has a REST API you can use to look up a name and get the molecular formula. Eg: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/butane/property/MolecularFormula/JSON – Tim Williams May 05 '23 at 16:37
  • Thre is a reg expression COM object. So you can use VBA to use regular expressions. Look for VBScript Regular Expressions in the reference browser. The chemical thing is up to you ;) – Thomas Ludewig May 05 '23 at 16:46

1 Answers1

3

EDIT: did not read your question fully and jumped to "total number of carbon atoms"... You might instead try fetching the SMILES and counting the C inside/outside of parentheses.

Here's an example of using the PubChem API to get MF from IUPAC name:

Sub Tester()
    Debug.Print IUPACToSmiles("2,2-dimethylbutane")
End Sub

Function IUPACToSmiles(IUPAC As String) As String
    Const SVC_URL As String = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/" & _
                    "compound/name/<iupac>/property/CanonicalSMILES/TXT"
    Dim json As Object, objhttp As Object
    
    Set objhttp = CreateObject("MSXML2.XMLHTTP")
    objhttp.Open "Get", Replace(SVC_URL, "<iupac>", URLEncode(IUPAC)), False
    objhttp.Send
    IUPACToSmiles = objhttp.responsetext
    Debug.Print IUPAC, IUPACToSmiles
End Function

Public Function URLEncode(StringVal As String, Optional SpaceAsPlus As Boolean = False) As String

  Dim StringLen As Long: StringLen = Len(StringVal)

  If StringLen > 0 Then
    ReDim result(StringLen) As String
    Dim i As Long, CharCode As Integer
    Dim Char As String, Space As String

    If SpaceAsPlus Then Space = "+" Else Space = "%20"

    For i = 1 To StringLen
      Char = Mid$(StringVal, i, 1)
      CharCode = Asc(Char)
      Select Case CharCode
        Case 97 To 122, 65 To 90, 48 To 57, 45, 46, 95, 126
          result(i) = Char
        Case 32
          result(i) = Space
        Case 0 To 15
          result(i) = "%0" & Hex(CharCode)
        Case Else
          result(i) = "%" & Hex(CharCode)
      End Select
    Next i
    URLEncode = Join(result, "")
  End If
End Function

API reference: https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest

To run this (if you instead chose the JSON-formatted output option) you would need to import the JsonConverter.bas file from the project here: https://github.com/VBA-tools/VBA-JSON and add a VBA project reference to the Microsoft scripting runtime

Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • Thanks for this. I tried to run it but run-time error 424 came up for line: Set json = JsonConverter.ParseJson(objhttp.responsetext) – Daniel May 05 '23 at 18:06
  • 1
    Sorry forgot to note you need to import the **JsonConverter.bas** file from the project here: https://github.com/VBA-tools/VBA-JSON and add a VBA project reference to the **Microsoft scripting runtime** library – Tim Williams May 05 '23 at 18:32