I would like to build a dictionary that splits a molecular formula into its elements and their counts. I tried using the `re` module:

import re

Formula = "C16H21NO2Na3"

pat = re.compile(r'(?P<name>[A-Z][a-z]+)\[0-9]+(?P<name2>[0-9]+)')
molecule = pat.findall(Formula)
print(molecule)

I expected this result:

{'C': 16, 'H': 21, 'N': '', 'O': 2, 'Na': 3}
Van3

1 Answer

You were pretty close:

pat = re.compile('(?P<name>[A-Z][a-z]?)(?P<value>[0-9]*)')

`name` is an uppercase letter followed by zero or one lowercase letter, and `value` is zero or more digits.
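
To see the difference between the two patterns, here is a minimal sketch that runs both against the formula from the question. The names `original` and `fixed` are only illustrative. In the question's pattern, `[a-z]+` requires at least one lowercase letter, and `\[0-9]+` matches the literal text `[0-9` followed by one or more `]` characters rather than a run of digits.

```python
import re

formula = "C16H21NO2Na3"

# Pattern from the question: needs a lowercase letter after the capital,
# then the literal characters "[0-9" and one or more "]".
original = re.compile(r'(?P<name>[A-Z][a-z]+)\[0-9]+(?P<name2>[0-9]+)')

# Corrected pattern: optional lowercase letter, optional run of digits.
fixed = re.compile(r'(?P<name>[A-Z][a-z]?)(?P<value>[0-9]*)')

print(original.findall(formula))
# []
print(fixed.findall(formula))
# [('C', '16'), ('H', '21'), ('N', ''), ('O', '2'), ('Na', '3')]
```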

Then, to make it a dict, just call `dict` on the result (`findall` returns a list of `(name, value)` tuples):

matches = pat.findall(Formula)
data = dict(matches)
# {'C': '16', 'H': '21', 'N': '', 'O': '2', 'Na': '3'}

You could be more sophisticated and convert the counts to integers, treating a missing count as 1:

data = {k: int(v) if v else 1 for k,v in matches}
# {'C': 16, 'H': 21, 'N': 1, 'O': 2, 'Na': 3}

# the following will also work, which is slightly shorter (thanks @copperfield)
data = {k: int(v or 1) for k,v in matches}
# {'C': 16, 'H': 21, 'N': 1, 'O': 2, 'Na': 3}
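
One caveat with building the dict directly: if a symbol appears more than once in the formula (for example `CH3COOH`), later matches overwrite earlier ones. A possible sketch that sums repeated symbols instead is shown here. The function name `count_atoms` and the constant `ELEMENT` are hypothetical, and this still assumes a flat formula with no parentheses or charges.

```python
import re
from collections import Counter

# Same idea as the pattern above: element symbol, then an optional count.
ELEMENT = re.compile(r'([A-Z][a-z]?)([0-9]*)')

def count_atoms(formula):
    """Return a dict mapping each element symbol to its total count."""
    counts = Counter()
    for symbol, number in ELEMENT.findall(formula):
        # A missing count means a single atom.
        counts[symbol] += int(number or 1)
    return dict(counts)

print(count_atoms("C16H21NO2Na3"))
# {'C': 16, 'H': 21, 'N': 1, 'O': 2, 'Na': 3}
print(count_atoms("CH3COOH"))
# {'C': 2, 'H': 4, 'O': 2}
```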
Joran Beasley
  • You can also use `or` instead of the ternary operator, like `int(v or 1)` – Copperfield Nov 24 '21 at 01:20
  • Many thanks @Joran Beasley & Copperfield, it works very well! So in the initial code it was `\[0-9]+` that made the mess. – Van3 Nov 24 '21 at 12:56