I'm writing a parser that converts LaTeX math into a Python eval()-compatible string.

I get to a point where I have a string that looks like this:

\sqrt{4m/s} - \frac{3v+10.5v}{20a-8a} +1/2

Notice the still mostly LaTeX syntax, as well as some arbitrary "unit" letters thrown in. I then use the following negated set to remove every character that is not in the set:

    mathstr = re.sub('[^0-9*()/+\-.Q]','',mathstr)

How can I include the substring "sqrt" so that it is kept in a similar fashion, preferably in the same regular expression?

Right now my workaround is to replace '\sqrt' with 'Q', run the line of code above, and then turn 'Q' back into 'sqrt' afterwards. My full routine for going from the above syntax to eval() syntax is as follows:

    mathstr = mathstr.replace(" ","")
    if pwrRe.search(mathstr):
        mathstr = re.sub(pwrRe,'**',mathstr)
    if MultiplyRe.search(mathstr):
        mathstr = re.sub(MultiplyRe,'*',mathstr)
    if DivideRe.search(mathstr) or sqrtRe.search(mathstr):
        mathstr = re.sub('\\\\frac{','(',mathstr)
        mathstr = re.sub('\\\\sqrt{','\\\\sqrt(',mathstr)   
        mathstr = re.sub('}{',')/(',mathstr)
        mathstr = re.sub('}',')',mathstr)
    mathstr = re.sub('[/*+\-^][a-zA-Z]','',mathstr)
    mathstr = re.sub('\\\\sqrt','Q',mathstr)
    mathstr = re.sub('[^0-9*()/+\-.Q]','',mathstr)
    mathstr = re.sub(r'Q','sqrt',mathstr)

Which results in the eval()-compatible string:

sqrt(4)-(3+10.5)/(20-8)+1/2
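For reference, a minimal sketch of how a string in this form could be handed to eval(). The `evaluate` helper and the idea of passing a restricted namespace are assumptions added here for illustration, not part of the routine above; emptying `__builtins__` narrows what the expression can reach but should not be treated as a real sandbox.

```python
import math

def evaluate(expr):
    # Hypothetical helper: expose only sqrt to the expression.
    # An empty __builtins__ limits name lookup; it is not a security boundary.
    return eval(expr, {"__builtins__": {}}, {"sqrt": math.sqrt})

result = evaluate("sqrt(4)-(3+10.5)/(20-8)+1/2")
print(result)
```

Without the `sqrt` entry in the namespace, eval() would raise a NameError on the cleaned string.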

But this is sloppy, and it would be useful in many areas if I could 'whitelist' characters and substrings in one line, blowing away all other characters that come up.

EDIT:

As I continue expanding my script this list will get longer, but for now I want to match the following and discard everything else:

0123456789()/*+-^sqrt <-- only sqrt when it's a substring

Here are a few examples:

Before: sqrt(5s+2s)+(3s**2/9s)
After: sqrt(5+2)+(3**2/9)

Before: sqrt(4*(5+2)/(2))\$
After:  sqrt(4*(5+2)/(2))

Before: sqrt(4v/a)-(3v+10.5v)/(20a-8a)+1/2ohms
After:  sqrt(4)-(3+10.5)/(20-8)+1/2

There is some nuance to this beyond simply matching only those characters. In the third example you can see I have v/a; even though '/' is an allowed character, I remove it there as well, because it sits between two units.

RAGHHURAAMM
codeNoob
  • Can you give some additional example lines, and what you want to match for each? It's not super clear what you're trying to do. – jedwards Oct 10 '18 at 22:51
  • @jedwards I added a section at the bottom with examples. I think my key issue here though is mixing character set matches with substring matches in the same line. – codeNoob Oct 11 '18 at 01:22

1 Answer

Instead of "deleting" characters that aren't specified, what about "keeping" the characters that are? This is easy enough, since you already have the character set and only need to drop the negation:

[0-9*()/+\-.Q]

Then you can add any alternative literals you want, e.g.:

[0-9*()/+\-.Q]|sqrt

In Python, this might look like the following, using join and re.findall():

    import re

    tests = [
        ('sqrt(5s+2s)+(3s**2/9s)', 'sqrt(5+2)+(3**2/9)'),
        (r'sqrt(4*(5+2)/(2))\$', 'sqrt(4*(5+2)/(2))'),
        ('sqrt(4v/a)-(3v+10.5v)/(20a-8a)+1/2ohms', 'sqrt(4)-(3+10.5)/(20-8)+1/2'),
    ]

    for before, expected in tests:
        # Collect every whitelisted character or "sqrt" substring, in order
        matches = re.findall(r"[0-9*()/+\-.Q]|sqrt", before)
        after = ''.join(matches)

        is_ok = (after == expected)
        print(after, is_ok, '' if is_ok else expected)

Output:

sqrt(5+2)+(3**2/9)               True
sqrt(4*(5+2)/(2))                True
sqrt(4/)-(3+10.5)/(20-8)+1/2     False    sqrt(4)-(3+10.5)/(20-8)+1/2

(The last one doesn't match what you're expecting because of the first forward slash, which sits between two unit letters, but that's really outside the scope of the question.)
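If a single re.sub() call is preferred, as the question asks, the same idea can probably be expressed by capturing the substring to keep and deleting everything else. This is only a sketch: the names `STRIP` and `whitelist` are made up here, the `Q` placeholder is dropped because it is no longer needed, and `^` is added to the set to follow the list in the question's edit.

```python
import re

# Group 1 captures a substring to keep; the negated set matches any single
# character that is not whitelisted. "sqrt" is tried first at each position.
STRIP = re.compile(r"(sqrt)|[^0-9*()/+\-.^]")

def whitelist(expr):
    # Put back group 1 when it matched, otherwise replace with nothing.
    return STRIP.sub(lambda m: m.group(1) or "", expr)

print(whitelist("sqrt(5s+2s)+(3s**2/9s)"))
print(whitelist("sqrt(4v/a)-(3v+10.5v)/(20a-8a)+1/2ohms"))
```

Whitelisted characters are never matched by either alternative, so re.sub() leaves them in place. More substrings could be kept by extending the group, e.g. `(sqrt|sin|cos)`. Like the findall() version, this still leaves the stray '/' from v/a.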

jedwards
  • I should've looked into the rest of the re.* functions. I also need to read up on what the r does. Thanks a bunch. Additionally, the third case passes if I first run mathstr = re.sub('[/*+\-^][a-zA-Z]','',mathstr). I *THINK* that would cover any cases where I have units like v/a or m/s. – codeNoob Oct 11 '18 at 02:39
  • @codeNoob `r` specifies a ["raw" string literal](https://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-flags-do-and-what-are-raw-string-literals). – jedwards Oct 11 '18 at 02:45
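Putting the comment's suggestion together with the answer, a two-step sketch might look like the code below. The function name `to_eval_string` and both compiled pattern names are hypothetical. The `(?!sqrt)` lookahead is an addition that is not in the original pattern: without it, an operator directly before sqrt (as in `1+sqrt(4)`) would have its `+s` removed by the first step.

```python
import re

# Step 1: an operator followed by a unit letter, e.g. the "/a" in "4v/a".
# Assumption: the lookahead protects a following "sqrt" from being eaten.
UNIT_AFTER_OP = re.compile(r"[/*+\-^](?!sqrt)[a-zA-Z]")

# Step 2: the whitelist from the answer, without the Q placeholder.
KEEP = re.compile(r"sqrt|[0-9*()/+\-.]")

def to_eval_string(expr):
    expr = UNIT_AFTER_OP.sub("", expr)       # drop operators that join two units
    return "".join(KEEP.findall(expr))       # keep only whitelisted pieces

print(to_eval_string("sqrt(4v/a)-(3v+10.5v)/(20a-8a)+1/2ohms"))
print(to_eval_string("1+sqrt(4m/s)"))
```

Both patterns are written as raw strings so that `\-` reaches the regex engine unchanged instead of being treated as a string escape.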