
User inputs a formula, for example: C12H2COOH

We have to calculate its molecular weight, given that C = 12.01, H = 1.008 and O = 16. We were told to be careful with elements followed by two-digit counts and with elements that have no number after them. The program should keep asking for a chemical formula and exit when you press Enter.

I've tried using dictionaries, for loops and while loops. I've managed to calculate compounds with single digits after the elements, like C2H2, but if I put double digits or no number next to an element, it fails. I was also looking at how to split strings without deleting the delimiters as a possible route. How would you guys approach this problem? Any help would be appreciated, thank you!

Here is what I have so far. It's very messy.

xxx = ["H", "C", "O"]
elements = set(xxx)
while(True):
    chemical_formula = input("Enter chemical formula, or enter to quit: ")
    if chemical_formula == "":
        break
    else:
        characters = list(chemical_formula)
        n = 0
        print(characters)
        for i in characters:
            if characters[n] == "C":
                c = 12.0107
                if elements.intersection(set(characters[n+1])):
                    print(c)
                else:
                    number = int(characters[n+1])
                    print(number*c)

            elif characters[n] == "H":
                h = 1.00794
                if elements.intersection(set(characters[n+1])):
                    print(h)
                else:
                    number = int(characters[n+1])
                    print(number*h)

            elif characters[n] == "O":
                o = 15.9994
                if elements.intersection(set(characters[n+1])):
                    print(o)
                else:
                    number = int(characters[n+1])
                    print(number*o) 
            else:
                numero = int(i)
                print(i*0)

            n = n+1
Lev Levitsky
Francis Bautista

7 Answers


First:

pip install molmass

Then:

from molmass import Formula

Formula('H2O').isotope.mass
>> 18.01056468403   #  monoisotopic mass

Formula('H2O').mass  
>> 18.015287        # molecular mass
Soerendip

First thing I'd do is replace each occurrence of a letter in the input string by the same letter preceded by a '+', so

C12H2COOH => +C12+H2+C+O+O+H

next, I'd replace each occurrence of a letter followed by a digit by the same letter followed by a '*' and then the digit

+C12+H2+C+O+O+H => +C*12+H*2+C+O+O+H

and then I'd replace each occurrence of a letter by the molecular weight of the element it represents

+C*12+H*2+C+O+O+H => +12.0107*12+1.00794*2+12.0107+15.9994+15.9994+1.00794

Finally I'd evaluate that expression. I can think of 2 or 3 ways to perform these modifications and since it's your homework I'll leave you to choose how to implement this approach if it appeals to you. But do note, string manipulation by regular expressions followed by the evil of eval is not the only implementation option.
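
A minimal sketch of the three rewriting steps, assuming single-letter elements only and the weights from the question's code. Rather than calling eval on the final string, it splits the sum-of-products expression on '+' and '*' and evaluates it by hand, which is one of the non-eval options hinted at above. The function names are illustrative.

```python
import re
from math import prod

# Weights taken from the question's code (assumption: single-letter symbols only)
WEIGHTS = {"H": 1.00794, "C": 12.0107, "O": 15.9994}


def to_expression(formula: str) -> str:
    # Step 1: put a '+' in front of every element letter: C12H2 -> +C12+H2
    expr = re.sub(r"([A-Z])", r"+\1", formula)
    # Step 2: put a '*' between a letter and the digits after it: +C12 -> +C*12
    expr = re.sub(r"([A-Z])(\d)", r"\1*\2", expr)
    # Step 3: replace each letter with its atomic weight
    return re.sub(r"[A-Z]", lambda m: repr(WEIGHTS[m.group()]), expr)


def evaluate(expr: str) -> float:
    # Sum of '*'-separated products, computed without eval()
    return sum(prod(float(f) for f in term.split("*")) for term in expr.split("+") if term)


print(to_expression("C12H2COOH"))
print(evaluate(to_expression("C12H2COOH")))
```

The intermediate string is exactly the +12.0107*12+1.00794*2+... expression shown above, which makes each step easy to check by printing it.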

Then I'd start working on how to cope with elements whose abbreviations are longer than one letter.
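
One way to cope with two-letter symbols in the same rewriting scheme, assuming every symbol is an uppercase letter optionally followed by one lowercase letter: use the pattern [A-Z][a-z]? wherever the single-letter version used [A-Z]. The "Na" and "Cl" entries are added here only to exercise the change.

```python
import re
from math import prod

# Hypothetical weight table; "Na" and "Cl" show two-letter symbols
WEIGHTS = {"H": 1.00794, "C": 12.0107, "O": 15.9994, "Na": 22.98976928, "Cl": 35.453}

SYMBOL = r"[A-Z][a-z]?"  # uppercase letter, optionally followed by one lowercase letter


def to_expression(formula: str) -> str:
    expr = re.sub(rf"({SYMBOL})", r"+\1", formula)      # NaCl2 -> +Na+Cl2
    expr = re.sub(rf"({SYMBOL})(\d)", r"\1*\2", expr)   # +Cl2 -> +Cl*2
    return re.sub(SYMBOL, lambda m: repr(WEIGHTS[m.group()]), expr)


def molecular_weight(formula: str) -> float:
    # Evaluate the '+'/'*' expression by hand instead of with eval()
    terms = to_expression(formula).split("+")
    return sum(prod(float(f) for f in t.split("*")) for t in terms if t)


print(to_expression("NaCl2"))
print(molecular_weight("NaCl"))
```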

High Performance Mark
  • Oh right, you're not actually talking about eval... I guess I judged too quick because you used the word "evaluate" in this context ^^ But I still don't know if a state machine parsing the input wouldn't be easier (and potentially less ambiguous for e.g. elements represented by multiple letters). – l4mpi May 22 '13 at 19:11
  • So write an answer suggesting OP uses a state machine and giving some hints about how to do that. I'd upvote that. – High Performance Mark May 22 '13 at 19:17
  • best answer IMO. Learning thought process that goes into this kinda junk is exactly what newbs need. – TehTris May 22 '13 at 20:55

EDIT: Updated GitHub Gist

I was doing my Grade 12 chemistry course over the summer and thought of doing this as well, using a different method. Here is my version (the '' entry in the dictionary is a placeholder with a mass of 0, so the empty element name the parser starts with contributes nothing). I checked C12H2COOH and it gives the right answer (191.16 g/mol). Hopefully this helps someone:

__version__ = '1.2.1'
"""
=================================
Molar Mass Calculator
Author: Elijah Lopez
Version: 1.2.1
Last Updated: April 4th 2020
Created: July 8th 2017
Python Version: 3.6+
=================================
"""
MM_of_Elements = {'H': 1.00794, 'He': 4.002602, 'Li': 6.941, 'Be': 9.012182, 'B': 10.811, 'C': 12.0107, 'N': 14.0067,
                  'O': 15.9994, 'F': 18.9984032, 'Ne': 20.1797, 'Na': 22.98976928, 'Mg': 24.305, 'Al': 26.9815386,
                  'Si': 28.0855, 'P': 30.973762, 'S': 32.065, 'Cl': 35.453, 'Ar': 39.948, 'K': 39.0983, 'Ca': 40.078,
                  'Sc': 44.955912, 'Ti': 47.867, 'V': 50.9415, 'Cr': 51.9961, 'Mn': 54.938045,
                  'Fe': 55.845, 'Co': 58.933195, 'Ni': 58.6934, 'Cu': 63.546, 'Zn': 65.409, 'Ga': 69.723, 'Ge': 72.64,
                  'As': 74.9216, 'Se': 78.96, 'Br': 79.904, 'Kr': 83.798, 'Rb': 85.4678, 'Sr': 87.62, 'Y': 88.90585,
                  'Zr': 91.224, 'Nb': 92.90638, 'Mo': 95.94, 'Tc': 98.9063, 'Ru': 101.07, 'Rh': 102.9055, 'Pd': 106.42,
                  'Ag': 107.8682, 'Cd': 112.411, 'In': 114.818, 'Sn': 118.71, 'Sb': 121.760, 'Te': 127.6,
                  'I': 126.90447, 'Xe': 131.293, 'Cs': 132.9054519, 'Ba': 137.327, 'La': 138.90547, 'Ce': 140.116,
                  'Pr': 140.90465, 'Nd': 144.242, 'Pm': 146.9151, 'Sm': 150.36, 'Eu': 151.964, 'Gd': 157.25,
                  'Tb': 158.92535, 'Dy': 162.5, 'Ho': 164.93032, 'Er': 167.259, 'Tm': 168.93421, 'Yb': 173.04,
                  'Lu': 174.967, 'Hf': 178.49, 'Ta': 180.9479, 'W': 183.84, 'Re': 186.207, 'Os': 190.23, 'Ir': 192.217,
                  'Pt': 195.084, 'Au': 196.966569, 'Hg': 200.59, 'Tl': 204.3833, 'Pb': 207.2, 'Bi': 208.9804,
                  'Po': 208.9824, 'At': 209.9871, 'Rn': 222.0176, 'Fr': 223.0197, 'Ra': 226.0254, 'Ac': 227.0278,
                  'Th': 232.03806, 'Pa': 231.03588, 'U': 238.02891, 'Np': 237.0482, 'Pu': 244.0642, 'Am': 243.0614,
                  'Cm': 247.0703, 'Bk': 247.0703, 'Cf': 251.0796, 'Es': 252.0829, 'Fm': 257.0951, 'Md': 258.0951,
                  'No': 259.1009, 'Lr': 262, 'Rf': 267, 'Db': 268, 'Sg': 271, 'Bh': 270, 'Hs': 269, 'Mt': 278,
                  'Ds': 281, 'Rg': 281, 'Cn': 285, 'Nh': 284, 'Fl': 289, 'Mc': 289, 'Lv': 292, 'Ts': 294, 'Og': 294,
                  '': 0}


def molar_mass(compound: str, decimal_places=None) -> float:
    is_polyatomic = end = multiply = False
    polyatomic_mass, m_m, multiplier, group_count = 0, 0, 1, 0
    element = ''

    for e in compound:
        if end:
            # Just after ')': collect the group's (possibly multi-digit) count
            if e.isdigit():
                group_count = group_count * 10 + int(e)
                continue
            # Group finished: add its mass, reset the group state, then parse e normally
            m_m += (group_count or 1) * polyatomic_mass
            end, polyatomic_mass, group_count = False, 0, 0
        if is_polyatomic:
            if e.isdigit():
                multiplier = int(str(multiplier) + e) if multiply else int(e)
                multiply = True
            elif e.islower():
                element += e
            elif e.isupper():
                polyatomic_mass += multiplier * MM_of_Elements[element]
                element, multiplier, multiply = e, 1, False
            elif e == ')':
                polyatomic_mass += multiplier * MM_of_Elements[element]
                element, multiplier, multiply = '', 1, False
                is_polyatomic, end = False, True
        elif e == '(':
            m_m += multiplier * MM_of_Elements[element]
            element, multiplier = '', 1
            is_polyatomic, multiply = True, False
        elif e.isdigit():
            multiplier = int(str(multiplier) + e) if multiply else int(e)
            multiply = True
        elif e.islower():
            element += e
        elif e.isupper():
            m_m += multiplier * MM_of_Elements[element]
            element, multiplier, multiply = e, 1, False
    if end:  # the formula ended with a group, e.g. Ca(OH)2
        m_m += (group_count or 1) * polyatomic_mass
    m_m += multiplier * MM_of_Elements[element]
    if decimal_places is not None:
        return round(m_m, decimal_places)
    return m_m
Elijah

Your code is a mess, e.g. you unnecessarily transform the input string into a list, then iterate over it but still use a numerical index to access the characters. Also, looking at each character individually on the fly won't be of much use, because this obviously breaks on numbers with more than one digit. Also, you output the weight of each encountered element individually - shouldn't you output the sum?

The following code uses a small state machine to parse the input string and output the combined weights. It assumes that every formula starts with an element, that all encountered elements are contained in the weights dictionary and that no element name is longer than a single character:

#use a dictionary to map elements to their weights
weights = {"H": 1.00794, "C": 12.0107, "O": 15.9994}

def getInt(clist):
    """helper for parsing a list of chars as an int (returns 1 for empty list)"""
    if not clist: return 1
    return int(''.join(clist))

def getWeight(formula):
    """ get the combined weight of the formula in the input string """
    formula = list(formula)
    #initialize the weight to zero, and a list as a buffer for numbers
    weight = 0
    num_buffer = []
    #get the first element weight
    el_weight = weights[formula.pop(0)]
    while formula:
        char = formula.pop(0)
        if char in weights:
            #next character is an element, add current element weight to total
            weight += el_weight * getInt(num_buffer)
            #get the new element's weight
            el_weight = weights[char]
            #clear the number buffer
            num_buffer = []
        else:
            #next character is not an element -> it is a number, append to buffer
            num_buffer.append(char)
    #add the last element's weight and return the value
    return weight + el_weight * getInt(num_buffer)

while 1:
    #main loop
    chemical_formula = input("Enter chemical formula, or enter to quit: ")
    if not chemical_formula:
        break
    print("Combined weight is %s" % getWeight(chemical_formula))

This can be easily extended to deal with multi-character elements by changing the conditions in the while loop in getWeight to append a character to the int buffer if it is a digit, and else append it to a string containing the current element name; then fetching the weight and resetting the name to '' if the name is contained in the weights dictionary.
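
A sketch of that extension with one adjustment: looking the name up as soon as it matches a dictionary key would stop at "C" when reading "Cl", so here an uppercase letter starts a new symbol and lowercase letters extend it. The "Na" and "Cl" weights are assumptions added to exercise two-letter symbols.

```python
# Weights for the sketch; "Na" and "Cl" are added assumptions
weights = {"H": 1.00794, "C": 12.0107, "O": 15.9994, "Na": 22.98976928, "Cl": 35.453}


def get_weight(formula: str) -> float:
    """Combined weight of a formula whose symbols may be one or two letters long."""
    total = 0.0
    name = ""     # symbol currently being read, e.g. "C" then "Cl"
    digits = ""   # count digits seen after that symbol
    for char in formula:
        if char.isdigit():
            digits += char                                # part of the current count
        elif char.isupper():
            if name:                                      # new symbol: add the previous element
                total += weights[name] * int(digits or 1)
            name, digits = char, ""
        else:
            name += char                                  # lowercase letter extends the symbol
    if name:                                              # add the last element
        total += weights[name] * int(digits or 1)
    return total


print(get_weight("C12H2COOH"))
print(get_weight("NaCl"))
```

Missing counts default to 1 through int(digits or 1), so "COOH" and "C1O1O1H1" give the same result.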

l4mpi

Here is a molecular weight Python script that uses regular expressions to parse the formula.

It includes some debug output.

import re

#some element data

elements ={}
elements["H"] = 1
elements["C"] = 12
elements["O"] = 16
elements["Cl"] = 35.45


#DDT (1,1,1-trichloro-2,2-di(4-chlorophenyl)ethane)
formula = "(ClC6H4)2CH(CCl3)"

sFormula = formula

print("Original Formula: ", sFormula)

#Search data inside ()

myRegEx = re.compile(r"(\()(\w*)(\))(\d*)",re.I)

myMatches = myRegEx.findall(sFormula)

while myMatches:
    myMatches = myRegEx.findall(sFormula)
    for match in myMatches:
        print (match[1], match[3])
        count = match[3]
        text =""
        if (count == ""):
            count = 1
        else:
            count = int(match[3])
        while (count >= 1):
            text = text + match[1]
            count -= 1
            print(text)
        sFormula = sFormula.replace('(' + match[1] + ')' + match[3], text)
        print("Replaced formula: ",sFormula)

myRegEx = re.compile("(C[laroudsemf]?|Os?|N[eaibdpos]?|S[icernbmg]?|P[drmtboau]?|H[eofgas]?|A[lrsgutcm]|B[eraik]?|Dy|E[urs]|F[erm]?|G[aed]|I[nr]?|Kr?|L[iaur]|M[gnodt]|R[buhenaf]|T[icebmalh]|U|V|W|Xe|Yb?|Z[nr])(\d*)")

myMatches = myRegEx.findall(sFormula)

molecularFormula =""
MW = 0
text =""

for match in myMatches:
    #Search symbol
    symbol = match[0]
    #Search numbers
    number = match[1]
    print(symbol,number)
    if (number == ""):
        number = 1
    else:
        number = int(match[1])
    MW = MW + float(elements[symbol])*number
    while (number >=1):
        molecularFormula = molecularFormula + symbol
        number -= 1 
print(molecularFormula)
print("formula: " + formula + " MW = " + str(MW))
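
A more compact sketch of the same two-regex idea, assuming the element data above. It swaps the long symbol alternation for the generic pattern [A-Z][a-z]? (so any capitalized symbol is parsed, and one missing from the table raises KeyError) and multiplies counts directly instead of expanding the formula into repeated symbols. The names GROUP, SYMBOL, expand_groups and molecular_weight are illustrative.

```python
import re

# Same element data as the script above
ELEMENTS = {"H": 1, "C": 12, "O": 16, "Cl": 35.45}

GROUP = re.compile(r"\(([^()]*)\)(\d*)")   # innermost (...) plus an optional count
SYMBOL = re.compile(r"([A-Z][a-z]?)(\d*)")  # element symbol plus an optional count


def expand_groups(formula: str) -> str:
    # Replace innermost groups by their contents repeated `count` times until none remain
    while GROUP.search(formula):
        formula = GROUP.sub(lambda m: m.group(1) * int(m.group(2) or 1), formula)
    return formula


def molecular_weight(formula: str) -> float:
    # Each (symbol, count) pair contributes weight * count; an empty count means 1
    return sum(ELEMENTS[sym] * int(n or 1) for sym, n in SYMBOL.findall(expand_groups(formula)))


print(expand_groups("(ClC6H4)2CH(CCl3)"))
print(molecular_weight("(ClC6H4)2CH(CCl3)"))
```

Looping until no innermost group is left also handles nested brackets, because each pass removes one level.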

I had a similar requirement and wrote pure Python code for it. It supports brackets in any combination, as well as element counts of up to two digits.

git clone https://github.com/stardustcafe/molecularstats

Example Code

from molstats.molstats import Molecule
f1=Molecule('CH3CH4')
f1.getMolecularWeight()
31.07698
f1.getNumElements()
9

I had a homework question similar to this. Instead of reading the formula with input(), however, my teacher asked us to build a function: "Your function call should read: mw('C6H8OOH2O'). The output should read: Molecular weight of C6H8OOH2O is 130.14". After struggling with this same problem for 2 days, I finally have a simple solution that works well for Python newbies (such as myself). The nice thing is, if you need to expand your library of included elements to tackle a different formula, all you have to do is find their atomic weights, note them, and add a new "if" statement to the "for i in range" loop for each additional element.

So here you go! I hope this helps somebody as much as I wish it could've helped me! Enjoy!

#C: 12.011
#H: 1.008
#O: 15.999

def mw(formula):
    expanded = ''
    digits = ''
    #Takes formula input and converts it into solely letters (ex - C4H2=CCCCHH);
    #digits are buffered so that multi-digit counts such as C12 also work
    for character in formula:
        if character.isdigit():
            digits += character
        else:
            if digits:
                expanded += expanded[-1] * (int(digits) - 1)
                digits = ''
            expanded += character
    if digits:
        expanded += expanded[-1] * (int(digits) - 1)
    #Converts new string of letters into a list
    lexp = list(expanded)
    #Identifies C, H, and O and assigns them their atomic masses as float values
    for i in range(len(lexp)):
        if lexp[i] == 'C':
            lexp[i] = 12.011
        if lexp[i] == 'H':
            lexp[i] = 1.008
        if lexp[i] == 'O':
            lexp[i] = 15.999
    #Adds up all values of the list we just turned into numbers
    sumall = sum(lexp)
    print('Molecular weight of', formula, 'is', round(sumall, 2))
    return sumall

mw('C6H8OOH2O')

Output: Molecular weight of C6H8OOH2O is 130.14