I'm trying to parse strings in Python to find scientific values and their units, so that I can convert them to other units.

I'm using the library unit-parse (based on pint), but it has trouble with this example: 12.5g/100ml.

I found a workaround: replacing g/100mL in the string with another word (stuff in the code below) and defining that word as a new unit, equivalent to (g/l) * 10.

My code:

import logging

import pint

u = pint.UnitRegistry()
U = Unit = u.Unit
Q = Quantity = u.Quantity

from unit_parse import parser, logger, config


def display(text):
    text = text.replace(" ", "")  # Suppress spaces.
    result = parser(text)

    print(f"RESULT = {result}")
    print(f"VALUE = {result.m}")
    print(f"UNIT = {result.u}")

    print(f"to g/l = {result.to('g/L')}")
    print(f"to g/ml = {result.to('g/ml')}")
    print(f"to stuff = {result.to('stuff')}")


def main():
    u.define('stuff = (g/l) * 10')

    logger.setLevel(logging.INFO)

    more_last_minute_sub = [["g/100mL", "stuff"]]  # [bad text/regex, new text]
    config.last_minute_sub += more_last_minute_sub  # Here we are adding to the existing list of units

    text = "12.5g / 100mL"
    display(text)


if __name__ == "__main__":
    main()

Is there a better way to do this? Or should I stick with this workaround? Is there a better library to use?

cuzureau

2 Answers

Isn't plain old UnitRegistry.parse_pattern() enough?

>>> import pint
>>> ureg = pint.UnitRegistry()
>>> pattern = '{gram}g/{milliliter}ml'
>>> input_str = '12.5g/100ml'
>>> mass, vol = ureg.parse_pattern(input_str, pattern)
>>> print((mass / vol).to('g / l'))
125.00000000000001 gram / liter
Nikolaj Š.
  • Great solution. What should I do if I'm searching for multiple patterns? Should I loop over a list of patterns and apply `ureg.parse_pattern()` to each one to see if there is a match? – cuzureau Jan 03 '23 at 09:24
  • @cuzureau, I have another idea, using _unit-parse_, but I won't be at my machine for a couple of days. I'll post another answer, if nobody has the same idea before that. – Nikolaj Š. Jan 03 '23 at 09:31
  • Can you at least give me some details about your solution ^^? I don't necessarily have to use unit-parse. I want to use the most efficient and straightforward way to do it. – cuzureau Jan 03 '23 at 09:41
  • @cuzureau, try using `more_last_minute_sub` to add parentheses like '12.5 g/ (100ml)'. It should be easy to do with regexes, and that string seems to be parsed correctly by _unit-parse_ – Nikolaj Š. Jan 03 '23 at 09:48
  • I could not make it work, BUT you gave me this idea: I used `more_last_minute_sub` not to add parentheses but to change `g/100mL` to `g/dL`, which is equivalent! I forgot my science for a moment ^^. This way the parser understands it perfectly. – cuzureau Jan 03 '23 at 09:56
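
The `g/100mL` to `g/dL` rewrite from the last comment can be sketched with the standard `re` module alone, before the string is handed to any parser. The helper name `per_100ml_to_dl` and the exact regex are illustrative assumptions, not part of unit-parse or pint; the sketch relies only on 100 mL being equal to 1 dL.

```python
import re

# Hypothetical helper, not part of unit-parse: rewrite a "/100mL" denominator
# as "/dL", since 100 mL = 1 dL. Optional spaces and either letter case are accepted.
_PER_100ML = re.compile(r"/\s*100\s*ml\b", flags=re.IGNORECASE)


def per_100ml_to_dl(text: str) -> str:
    """Return text with every '/100mL' denominator replaced by '/dL'."""
    return _PER_100ML.sub("/dL", text)


print(per_100ml_to_dl("12.5g/100ml"))
print(per_100ml_to_dl("12.5g / 100mL"))
```

The rewritten string (for example `12.5g/dL`) can then be passed to the parser of your choice; whether a given parser accepts `dL` should be checked against that library.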
You can use config.pre_proc_sub to rewrite "12.5g / 100mL" before it is processed into something like "12.5g / (100mL)", which is parsed correctly rather than as "(12.5g / 100) * mL".

>>> import unit_parse
>>> more_pre_processing = [(r'/([0-9.]+\s*\S+)', r'/(\1)')]
>>> unit_parse.config.pre_proc_sub += more_pre_processing
>>> input_str = '12.5g/100ml'
>>> print(unit_parse.parser(input_str).to('g / l'))
125.00000000000001 g / l
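
To see what that substitution does on its own, here is a minimal sketch that applies the same pattern with plain `re.sub`, assuming unit-parse applies each pair as a regex substitution. The function name `parenthesize_denominator` and the `PATTERN_WITH_SPACES` variant are illustrative assumptions, not part of the library. Note that the original pattern only fires when the number directly follows the slash.

```python
import re

# Pattern and replacement copied from the answer above.
PATTERN = r'/([0-9.]+\s*\S+)'
REPLACEMENT = r'/(\1)'

# Assumed variant (not from the answer): also tolerate spaces after the slash.
PATTERN_WITH_SPACES = r'/\s*([0-9.]+\s*\S+)'


def parenthesize_denominator(text: str, pattern: str = PATTERN) -> str:
    """Wrap a numeric denominator such as '100ml' in parentheses."""
    return re.sub(pattern, REPLACEMENT, text)


print(parenthesize_denominator('12.5g/100ml'))
print(parenthesize_denominator('12.5g / 100mL'))
print(parenthesize_denominator('12.5g / 100mL', PATTERN_WITH_SPACES))
```

If the input may contain spaces around the slash, either strip them first (as the question's `display()` does) or use a pattern that allows them.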
Nikolaj Š.