First, you want to strip out all numbers from the list of elements. You can do this using a regular expression. The regular expression I use here matches characters in the ranges A-Z and a-z, i.e. the run of letters at the start of each item:
import re
elements = []
for item in lst:
    elem = re.match(r"([A-Za-z]+)", item).group(0)
    elements.append(elem)
Or you can write the loop as a list comprehension:
elements = [re.match(r"([A-Za-z]+)", item).group(0) for item in lst]
which gives:
elements = ['A', 'A', 'B', 'B', 'B', 'B']
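One thing worth knowing about this step: re.match only matches at the start of the string and returns None when nothing matches, so calling .group(0) on an item with no leading letters raises an AttributeError. Here is a minimal sketch of a more defensive version. The sample lst and the helper name leading_letters are assumptions for illustration, not part of the original question.

```python
import re

# Hypothetical sample input; the original question's list is not shown here.
lst = ["A1", "A2", "B1", "B2", "B3", "B4"]

# Compile once, since the same pattern is applied to every item.
LETTERS = re.compile(r"[A-Za-z]+")

def leading_letters(item):
    """Return the run of letters at the start of item, or None if there is none."""
    match = LETTERS.match(item)
    return match.group(0) if match else None

elements = [leading_letters(item) for item in lst]
print(elements)               # ['A', 'A', 'B', 'B', 'B', 'B']
print(leading_letters("12"))  # None
```

Whether to skip, raise on, or keep None entries depends on what your real data looks like.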
Next, you want to count how many of each element are in the list. You can do this using a collections.Counter:
import collections
element_counts = collections.Counter(elements)
which gives:
element_counts = Counter({'A': 2, 'B': 4})
Note: you can combine this step with the previous step. This way, you avoid creating the elements list and you only need one iteration over all items in your original list instead of two (one to create elements, one to count them all):
element_counts = collections.Counter(re.match(r"([A-Za-z]+)", item).group(0) for item in lst)
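A small sketch of how the Counter behaves, since the final string depends on it: on Python 3.7+, a Counter is a dict subclass and keeps keys in the order they were first seen, and looking up a missing key gives 0 instead of raising. The element list below is a made-up example.

```python
import collections

# Hypothetical element list, with 'B' seen before 'A'.
elements = ["B", "A", "B", "B", "A", "B"]
element_counts = collections.Counter(elements)

print(element_counts["A"])   # 2
print(element_counts["B"])   # 4
print(list(element_counts))  # ['B', 'A']  (first-seen order, not alphabetical)
print(element_counts["C"])   # 0  (missing keys count as zero)
```

So if you need the elements in a particular order in the output, sort the items yourself before joining them.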
Now, you need to figure out the greatest common divisor (GCD) of all the values in the counter. What a happy surprise, it's a part of the standard library as math.gcd! Also, since GCD is associative, we can find the GCD of more than two numbers using functools.reduce. (In Python 3.9+, math.gcd accepts any number of arguments, so it already takes care of this.)
import functools
import math
gcd = functools.reduce(math.gcd, element_counts.values())
# Or, for Python 3.9+:
gcd = math.gcd(*element_counts.values())
For our element_counts, we get gcd = 2.
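To see why reduce works here, this is a rough sketch that folds math.gcd pairwise over a list. The helper name gcd_of_all and the sample numbers are made up for illustration.

```python
import functools
import math

def gcd_of_all(values):
    """Fold math.gcd pairwise: gcd(gcd(gcd(a, b), c), ...)."""
    return functools.reduce(math.gcd, values)

print(gcd_of_all([2, 4]))      # 2
print(gcd_of_all([6, 9, 12]))  # 3

# Associativity: the grouping does not change the result.
left = math.gcd(math.gcd(6, 9), 12)
right = math.gcd(6, math.gcd(9, 12))
print(left == right)           # True
```

Note that reduce raises a TypeError on an empty iterable unless you pass an initial value; 0 is a safe choice because math.gcd(0, x) is x.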
Finally, divide the count of each element by the GCD, and join it into a single string:
compound_string = "".join([f"{elem}{count//gcd}" for elem, count in element_counts.items()])
which gives compound_string = 'A1B2'. Oops! Elements with a single atom don't need a number. Let's handle that by writing a function that will handle the formatting instead of a list comprehension with an f-string:
def elem_to_str(elem, count):
    if count == 1:
        return elem
    return f"{elem}{count}"
compound_string = "".join(elem_to_str(elem, count//gcd) for elem, count in element_counts.items())
Finally, we have our desired output: compound_string = 'AB2'
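Putting the steps together, here is one way the whole thing might look as a single function. The name simplify_formula and the sample inputs are assumptions for illustration; it also assumes every item starts with at least one letter.

```python
import collections
import functools
import math
import re

def simplify_formula(items):
    """Count the leading-letter prefix of each item and reduce the counts by their GCD."""
    # Count elements in one pass; assumes every item starts with a letter.
    counts = collections.Counter(
        re.match(r"[A-Za-z]+", item).group(0) for item in items
    )
    # Initial value 0 keeps reduce from failing on an empty input.
    divisor = functools.reduce(math.gcd, counts.values(), 0)
    # Omit the number when the reduced count is 1.
    return "".join(
        elem if count // divisor == 1 else f"{elem}{count // divisor}"
        for elem, count in counts.items()
    )

print(simplify_formula(["A1", "A2", "B1", "B2", "B3", "B4"]))  # AB2
print(simplify_formula(["C1", "H1", "H2", "H3", "H4"]))        # CH4
```

The elements appear in the output in the order they first occur in the input.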