11

I'd like to split a string like 3cm/µs² + 4e-4 sqmiles/km/h**2 into its SI unit (in this case, m/s**2) and its magnitude (in multiples of that unit).

Since sympy provides both a parsing module and many physical units and SI prefixes, I guess using sympy would be a good idea. But what is a nice way to achieve this? I'd write an algorithm like the following, but I'd like to avoid reinventing a squared wheel:

  • Treat the transition between a number and a letter (except for the 4e-4 like syntax) and whitespace (unless it's next to an explicit operator) as multiplication, then tokenize
  • Replace each non-numeric token by its SI representation (also checking for SI-prefixes)
  • Simplify the new expression down to Magnitude * some SI units (giving a meaningful error message on inconsistent units, e.g. Cannot add m**2 to s)

Can this be easily achieved via existing means? Or how would this be best implemented?

Tobias Kienzler
  • Are you allowing free text input? If not, one way to short circuit this issue is to create the parsing information as they enter data (eg if this was for an app with a 'cm' button, add an appropriate object then). Otherwise, your approach sounds ok. I'd do units first, then magnitude replacements, then just exec the math. – sapi Apr 09 '13 at 07:30
  • @sapi Yes, it's free text input, e.g. via a configuration text file. Otherwise a text field plus e.g. a pull-down unit menu would simplify this a lot indeed. – Tobias Kienzler Apr 09 '13 at 07:32

2 Answers

4

Units

A solution would be to gather all units from the SymPy units module and use them to substitute the symbols created by sympify:

>>> from sympy import Expr, Symbol, sympify
... import sympy.physics.units as u
... subs = {}
... for k, v in u.__dict__.items():
...     # `u.Unit` is the unit class of the 2013-era SymPy this was written for;
...     # newer releases may not provide it
...     if isinstance(v, Expr) and v.has(u.Unit):
...         subs[Symbol(k)] = v # Map the `Symbol` for a unit to the unit

>>> # sympify returns `Symbol`s, `subs` maps them to `Unit`s
>>> print(sympify('yard*millimeter/ly').subs(subs))
127*m/1313990343414000000000

If a symbol is not in the units module, it is simply left as an unknown symbol (for instance barn):

>>> print(sympify('barn/meter**2').subs(subs))
barn/m**2

But you can always add stuff to the subs dictionary.

>>> subs[Symbol('almost_meter')] = 0.9*u.meter
... sympify('almost_meter').subs(subs)
0.9*m

SI prefixes don't work exactly the way you want. You will need to add a multiplication sign (or hope that it is a common unit like km, which is explicitly implemented). Moreover, as they are not Unit instances but rather Integer instances, you will have to add them to subs:

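>>> from sympy import Expr, Integer, Symbol, sympify
... import sympy.physics.units as u
... subs = {}
... for k, v in u.__dict__.items():
...     # Also collect the SI prefixes, which are plain `Integer`s
...     if (isinstance(v, Expr) and v.has(u.Unit)) or isinstance(v, Integer):
...         subs[Symbol(k)] = v

>>> print(sympify('mega*m').subs(subs))
1000000*m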
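If you would rather resolve prefixes yourself before handing tokens to SymPy, a minimal sketch could look like the following. The names `SI_PREFIX_EXPONENTS` and `split_prefix` are made up for this illustration and are not part of SymPy; the set of known unit names is something you would have to supply yourself.

```python
# Hypothetical helper, not part of SymPy: split a token such as "km" into a
# power of ten and a base unit name.
SI_PREFIX_EXPONENTS = {
    "Y": 24, "Z": 21, "E": 18, "P": 15, "T": 12, "G": 9, "M": 6, "k": 3,
    "h": 2, "da": 1, "d": -1, "c": -2, "m": -3, "u": -6, "n": -9,
    "p": -12, "f": -15, "a": -18, "z": -21, "y": -24,
}


def split_prefix(token, known_units):
    """Return (power_of_ten, unit_name) for a possibly prefixed unit token."""
    # An exact match wins, so "m" is the meter and not a bare "milli".
    if token in known_units:
        return 0, token
    # Try longer prefixes first so that "da" is tested before "d".
    for prefix in sorted(SI_PREFIX_EXPONENTS, key=len, reverse=True):
        rest = token[len(prefix):]
        if token.startswith(prefix) and rest in known_units:
            return SI_PREFIX_EXPONENTS[prefix], rest
    raise ValueError("Unknown unit: %r" % token)


known = {"m", "s", "g", "h", "N"}   # assumed set of unit names
print(split_prefix("km", known))    # (3, 'm')
print(split_prefix("us", known))    # (-6, 's')
```

Note that this is ambiguous by design: if `in` (inch) were a known unit, `min` would be read as milli-inch unless `min` itself is also listed, which is why the exact match is checked first.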

For Unicode input you might need some preprocessing. I do not think SymPy makes any promises about Unicode support.
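Such preprocessing could be as small as the sketch below, which rewrites the micro sign and superscript exponents from the question's example as plain ASCII. The function name `ascii_units` is made up here, and mapping µ to `u` is an assumption about how the microsecond is spelled further down the pipeline.

```python
import re

# Superscript digits and minus sign, mapped to their ASCII counterparts.
_SUPERSCRIPT_TO_ASCII = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹⁻", "0123456789-")
_SUPERSCRIPT_RUN = re.compile("[⁰¹²³⁴⁵⁶⁷⁸⁹⁻]+")


def ascii_units(text):
    """Rewrite common non-ASCII unit notation as plain ASCII (hypothetical helper)."""
    # Both the micro sign (U+00B5) and the Greek letter mu (U+03BC) occur in practice.
    text = text.replace("\u00b5", "u").replace("\u03bc", "u")
    # Turn a run of superscripts into an explicit power, e.g. "s²" -> "s**2".
    return _SUPERSCRIPT_RUN.sub(
        lambda match: "**" + match.group().translate(_SUPERSCRIPT_TO_ASCII),
        text,
    )


print(ascii_units("3cm/µs² + 4e-4 sqmiles/km/h**2"))
# 3cm/us**2 + 4e-4 sqmiles/km/h**2
```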

If you implement new units, please consider making a pull request with them on GitHub. The file to edit should be sympy/physics/units.py.

Whitespace and implicit multiplication

The development version of SymPy contains code that assumes implicit multiplication where the whitespace suggests it:

>>> from sympy.parsing.sympy_parser import (parse_expr,
... standard_transformations, implicit_multiplication_application)

>>> parse_expr("10sin**2 x**2 + 3xyz + tan theta",
...            transformations=(standard_transformations + 
...                             (implicit_multiplication_application,)))
3*x*y*z + 10*sin(x**2)**2 + tan(theta) 
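If that transformation is not available to you, the two rules from the question (a number followed by a letter, and whitespace between two operands) can be approximated with the standard library alone. This is only a sketch under the assumption that the input consists of numbers, names and the operators `+ - * / **` plus parentheses; `insert_multiplication` is a made-up name.

```python
import re

_TOKEN = re.compile(r"""
    (?P<number>(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)   # 3, 4.5, 4e-4
  | (?P<name>[^\W\d]\w*)                               # cm, sqmiles
  | (?P<op>\*\*|[-+*/()])                              # explicit operators
  | (?P<space>\s+)
""", re.VERBOSE)

_OPERATORS = {"**", "*", "/", "+", "-", "(", ")"}


def insert_multiplication(text):
    """Make implicit multiplication explicit (hypothetical helper)."""
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("Unexpected character %r at position %d" % (text[pos], pos))
        pos = match.end()
        if match.lastgroup != "space":
            tokens.append(match.group())
    out = []
    for token in tokens:
        # Two operands in a row (e.g. "3" "cm" or ")" "x") imply a product.
        left_is_operand = bool(out) and (out[-1] == ")" or out[-1] not in _OPERATORS)
        right_is_operand = token == "(" or token not in _OPERATORS
        if left_is_operand and right_is_operand:
            out.append("*")
        out.append(token)
    return "".join(out)


print(insert_multiplication("3cm/us**2 + 4e-4 sqmiles/km/h**2"))
# 3*cm/us**2+4e-4*sqmiles/km/h**2
```

The inserted `*` has ordinary precedence, so `m/s kg` becomes `m/s*kg`, i.e. `(m/s)*kg`. Whether that is the reading you want for whitespace is a design decision.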

Security

sympify uses eval, which is exploitable. Do not feed it untrusted input, for example in a web-facing app!
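One way to avoid eval altogether is to walk the syntax tree yourself and allow only arithmetic on names you have whitelisted. The following is a minimal sketch, assuming Python 3.8 or later; `safe_eval` is a made-up name, and the values in `names` could just as well be unit objects that support the arithmetic operators.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression, names):
    """Evaluate plain arithmetic over whitelisted names (hypothetical helper)."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        # Only plain numbers are accepted as literals (no strings, no booleans).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        # Calls, attribute access, subscripts, unknown names etc. end up here.
        raise ValueError("Disallowed expression element: %s" % type(node).__name__)

    return visit(ast.parse(expression, mode="eval"))


print(safe_eval("2*a + b**2", {"a": 3, "b": 4}))  # 22
```

This does not limit the cost of the arithmetic itself: an input like `9**9**9` would still keep the interpreter busy, so you may want to cap exponents as well.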

Krastanov
  • I guess we ought to document that for `sympify`. – asmeurer Apr 10 '13 at 02:28
  • At what point does `sympify` use `eval` (also @asmeurer) - indirectly via `parse_expr`? One could also argue about [`return a._sympy_()`](https://github.com/sympy/sympy/blob/master/sympy/core/sympify.py#L239), if someone manages to sneak in a class with such a method. – Tobias Kienzler Apr 10 '13 at 06:58
  • `parse_expr` works by adding tokens using an extension of the `tokenize` module, and then evaling the resulting string. – asmeurer Apr 10 '13 at 14:36
  • Actually this does not work (at least not in 2018). Trying to do something like `sympify('m/s').subs(subs)` will fail, because `m` is first substituted by `meter` and then it fails, trying to substitute `meter` by `meter`. An updated answer which works would deserve a +1. I would also recommend to improve this answer by importing all quantities used (e.g. sympify, Symbol, etc.) and replacing `print something` (python 2) with `print(something)` (python 3) so it becomes a valid MWE. – sigvaldm Jul 21 '18 at 14:38
3

I've found astropy to have a good units module. After some preparation you can do:

import astropy.units as u
from functools import reduce
u.Unit('MeV/fm').si  # 160.218 N
eval('1*MeV/fm+3*N', u.__dict__).si  # 163.21765649999998 N

from astropy.units import imperial
u.__dict__.update(imperial.__dict__)
u.sqmiles = u.mile**2
# Note: Ys is the yottasecond; the µs from the question would be spelled us in astropy
eval('3*cm/Ys**2 + 4e-4*sqmiles/km/h**2', u.__dict__).si  # 7.993790464000001e-08 m / s2

The following function adds the SciPy CODATA constants as quantities to astropy units:

def units_and_constants():
    """
    >>> u = units_and_constants()
    >>> u.hartree_joule_relationship
    <Quantity 4.35974434e-18 J>

    >>> eval('1*MeV/fm+3*N',u.__dict__).si
    <Quantity 163.21765649999998 N>

    """
    import astropy.units as u
    from astropy.units import imperial
    u.__dict__.update(imperial.__dict__)
    from scipy.constants import physical_constants, value, unit
    import string
    def qntty(x): 
        un = unit(x)
        va = value(x)
        if un:
            return va*eval(un.strip().replace(' ','*').replace('^','**'),u.__dict__)
        else:
            return va
    u.sr = u.radian**2
    u.E_h = qntty('hartree-joule relationship')
    u.c = qntty('speed of light in vacuum')
    u.C_90 = (1+4.6e-8)*u.C 
    codata = {}
    for n, t in physical_constants.items():
        v = qntty(n)
        for x in string.punctuation+' ':
            n = n.replace(x,'_')
        codata[n] = v
    u.__dict__.update(codata)
    return u

yt also tackles a problem similar to yours. Have a look at the Test file to see how it is used.

Roland Puntaier