10

All I need is to check, using Python, whether a string is a valid math expression.

For simplicity, let's say I just need the + - * / operators (+ and - as unary too) with numbers and nested parentheses. I also allow simple variable names for completeness.

So I can test this way:

test("-3 * (2 + 1)") #valid
test("-3 * ")        #NOT valid

test("v1 + v2")      #valid
test("v2 - 2v")      #NOT valid ("2v" not a valid variable name)

I tried pyparsing, but with the example "simple algebraic expression parser, that performs +,-,*,/ and ^ arithmetic operations" invalid input gets accepted, and however I try to fix it, wrong syntax is still parsed without raising exceptions.

Just try:

>>>test('9', 9)
9 = 9.0 ['9'] => ['9']
>>>test('9 qwerty', 9)
9 qwerty = 9.0 ['9'] => ['9']

Both tests pass... o_O

Any advice?

neurino

5 Answers

3

This is because the pyparsing code allows functions. (And by the way, it does a lot more than what you need, i.e. create a stack and evaluate that.)

For starters, you could remove pi and ident (and possibly something else I'm missing right now) from the code to disallow characters.

The actual reason is different: pyparsing parsers don't try to consume the whole input by default. You have to append + StringEnd() (and import it, of course) to the end of expr to make parsing fail when the whole input can't be parsed. In that case, pyparsing.ParseException is raised. (Source: http://pyparsing-public.wikispaces.com/FAQs)
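A minimal sketch of that behaviour, assuming pyparsing is installed. The one-token grammar `number` is only a stand-in for illustration, not the `expr` from the example script:

```python
from pyparsing import ParseException, StringEnd, Word, nums

# Stand-in grammar for illustration: a single integer.
number = Word(nums)

# By default, parsing stops where the grammar can no longer match and
# returns what was matched so far, without raising.
partial = number.parseString("9 qwerty")
print(partial.asList())  # ['9']

# Requiring the end of the string turns the leftover text into an error.
strict = number + StringEnd()
try:
    strict.parseString("9 qwerty")
    rejected = False
except ParseException:
    rejected = True
print(rejected)  # True
```

The same idea should carry over to the larger grammar: whatever the top-level expression is, it has to be followed by the end of the input.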

If you care to learn a bit of parsing, what you need can probably be built in fewer than thirty lines with any decent parsing library (I like LEPL).

  • Not true, since `pi`... is `pi` and not `qwerty`, and ident is only accepted when followed by parentheses... Of course, if I could get pyparsing to work as a valid syntax checker I'd like that. I'll give LEPL a chance too. – neurino Feb 03 '11 at 15:17
  • @neurino: Then either the source code is misleading and the grammar is actually different, or pyparsing is broken (edit: one explanation I can think of, which would be in the category "pyparsing is broken": it doesn't consume the whole string but rather exits and returns what it parsed so far if the remaining input fails to parse). –  Feb 03 '11 at 15:20
  • Well, this is quite obvious, but if you look at the part of the code that builds the parser (def BNF()), it is quite simple, and even after removing things like the _exponentiation_ part to make it simpler it still fails, so I guess pyparsing is not good at checking syntax. – neurino Feb 03 '11 at 15:29
  • @neurino: My guess was right. Added cause and fix to the answer. –  Feb 03 '11 at 15:29
  • 1
    or add parseAll=True... Thanks for pointing this out, I'll see if I can really get it to check my syntax and give you the best answer – neurino Feb 03 '11 at 15:45
1

Why not just evaluate it and catch the syntax error?

from math import *

def validSyntax(expression):
  # Names the expression is allowed to use.
  functions = {'acos': acos,
               'asin': asin,
               'atan': atan,
               'atan2': atan2,
               'ceil': ceil,
               'cos': cos,
               'cosh': cosh,
               'degrees': degrees,
               'exp': exp,
               'fabs': fabs,
               'floor': floor,
               'fmod': fmod,
               'frexp': frexp,
               'hypot': hypot,
               'ldexp': ldexp,
               'log': log,
               'log10': log10,
               'modf': modf,
               'pow': pow,
               'radians': radians,
               'sin': sin,
               'sinh': sinh,
               'sqrt': sqrt,
               'tan': tan,
               'tanh': tanh}

  variables = {'e': e, 'pi': pi}

  # Build a single namespace with builtins disabled, so that names such as
  # open or __import__ raise NameError instead of resolving.
  namespace = {'__builtins__': {}}
  namespace.update(functions)
  namespace.update(variables)

  try:
    # Note: the expression is still executed, not just parsed.
    eval(expression, namespace)
  except (SyntaxError, NameError, ZeroDivisionError):
    return False
  else:
    return True

Here are some samples:

> print validSyntax('a+b-1') # a, b are undefined, so a NameError arises.
> False

> print validSyntax('1 + 2')
> True

> print validSyntax('1 - 2')
> True

> print validSyntax('1 / 2')
> True

> print validSyntax('1 * 2')
> True

> print validSyntax('1 +/ 2')
> False

> print validSyntax('1 + (2')
> False

> print validSyntax('import os')
> False

> print validSyntax('print "asd"')
> False

> print validSyntax('import os; os.delete("~\test.txt")')
> False # And the file was not removed

It's restricted to only mathematical operations, so it should work a bit better than a crude eval.
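For comparison, here is a sketch that only parses the string and never evaluates it, along the lines of the `literal_eval` idea discussed in the comments. It assumes Python 3.8+ (where literals parse as `ast.Constant`); `ALLOWED_NODES` and `is_valid_expression` are names made up for this example. Python's own grammar accepts more than the question asks for (for instance `1e5` or `0x10`), so treat it as an approximation:

```python
import ast

# Node types permitted in the parsed tree: + - * / (binary and unary),
# names, numeric literals and the wrapper nodes around them.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
    ast.Name, ast.Load, ast.Constant,
)

def is_valid_expression(text):
    """Return True if text parses as an arithmetic expression; never runs it."""
    try:
        tree = ast.parse(text.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Constant):
            # Accept only plain numbers, not strings, booleans, None, etc.
            value = node.value
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
    return True

print(is_valid_expression("a+b-1"))             # True
print(is_valid_expression("-3 * "))             # False
print(is_valid_expression("1000**1000**1000"))  # False
```

Since nothing is evaluated, undefined variables are fine and an expression like `1000**1000**1000` is rejected immediately (`**` is not in the allowed list) instead of being computed.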

Blender
  • 2
    This is much worse than the first (now deleted) answer, which at least checked if the answer consists of only numbers and operators. Yours allows arbitrary code :( –  Feb 03 '11 at 15:21
  • `literal_eval` is not the answer, as you want to allow math operators and parens. –  Feb 03 '11 at 15:30
  • One more update: I've changed the source of `literal_eval` so that it only accepts binary and unary operations (hopefully it's clean now). – Blender Feb 03 '11 at 15:34
  • So much work and code... ever thought about just going that other guy's way (checking for numbers+ops) or doing it properly and building a parser? –  Feb 03 '11 at 15:39
  • Your first sample is still wrong for my needs... the expression is valid syntax, if variables are not defined that's a NameError, not a SyntaxError... – neurino Feb 03 '11 at 15:42
  • Seven lines and -20 from the source? That's *nothing*! And who knows, maybe my solution will work better in the long run if you are planning on adding more complex mathematical syntax checking. – Blender Feb 03 '11 at 15:42
  • I'm working on it, sheesh. You're acting like I'm doing a job wrong. Be happy that you're even getting help. – Blender Feb 03 '11 at 15:47
  • 1
    @Blender: Even if requirements expand, a solution using a parsing library will be adjusted easily. No need to hand-roll the solution. –  Feb 03 '11 at 15:51
  • @Blender: I'm glad for everybody's help, but I take it as a given that feeding syntax to an _evaluator_ is the wrong and possibly insecure way. Thanks for your efforts anyway. – neurino Feb 03 '11 at 15:55
  • Okay, *this* update should work. Correct me if I'm wrong, but I think it's pretty safe to use an `eval()` in this case. – Blender Feb 03 '11 at 16:06
  • Perhaps safe, but still way too much code. Even debatable if we were going to evaluate the expression, but absolutely overkill (and also still dirty) if the problem is syntax checking. –  Feb 03 '11 at 16:11
  • I can format it a bit better, but why is it dirty? Find me a hole and I'll be happy. – Blender Feb 03 '11 at 16:14
  • You know that trying eval("1000**1000**1000") will hang your Python for a long time? I hope this code is not running on a web server... – neurino Feb 03 '11 at 16:15
  • I see what you mean... Let me see what I can do. – Blender Feb 03 '11 at 16:20
  • I think this would be vulnerable to race conditions if multithreaded. – user470379 Feb 03 '11 at 23:18
1

You could try building a simple parser yourself: tokenize the string of the arithmetic expression and then build an expression tree. If the tree is valid (the leaves are all operands and the internal nodes are all operators), then you can say that the expression is valid.

The basic concept is to make a few helper functions to create your parser.

def extract() gets the next character from the expression
def peek() is similar to extract() but only looks at the next character; use it when there is no whitespace separating tokens
get_expression()
get_next_token()

Alternatively if you can guarantee whitespace between characters you could use split() to do all the tokenizing.

Then you build your tree and check whether it's structured correctly.
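A rough sketch of how those helpers could fit together, using only the standard library. It works on tokens rather than single characters and checks structure without actually building a tree, so it is a simplification of the approach described above. `tokenize`, `get_operand` and `is_valid` are names invented for this example:

```python
import re

# One alternative per token kind; any other non-space character is "bad".
TOKEN_RE = re.compile(
    r"(?P<num>\d+(?:\.\d+)?)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/()])|(?P<bad>\S)"
)

def tokenize(text):
    """Return a list of (kind, text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def is_valid(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        # Look at the next token without consuming it.
        return tokens[pos] if pos < len(tokens) else (None, None)

    def extract():
        # Return the next token and move past it.
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def get_operand():
        # operand := ('+' | '-')* (number | name | '(' expression ')')
        while peek() in (("op", "+"), ("op", "-")):
            extract()
        kind, value = extract()
        if kind in ("num", "name"):
            return True
        if (kind, value) == ("op", "("):
            return get_expression() and extract() == ("op", ")")
        return False

    def get_expression():
        # expression := operand (('+' | '-' | '*' | '/') operand)*
        if not get_operand():
            return False
        while peek()[0] == "op" and peek()[1] in "+-*/":
            extract()
            if not get_operand():
                return False
        return True

    # Valid only if an expression was read and no tokens are left over.
    return get_expression() and pos == len(tokens)

print(is_valid("-3 * (2 + 1)"))  # True
print(is_valid("v2 - 2v"))       # False
```

Operator precedence is ignored on purpose: for a yes/no validity check it makes no difference whether `*` binds tighter than `+`.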

Try this for more info: http://effbot.org/zone/simple-top-down-parsing.htm

Jordan
  • Building all this yourself is sooo last century... these days, you use a parsing library which takes care of all the nasty bureaucracy. –  Feb 03 '11 at 15:25
  • @delnan I added the fact that if there is whitespace you can just use split(), also if there is no such library out there that meets your needs (functional but not too big etc...), what then? – Jordan Feb 03 '11 at 15:29
  • @Yoel: Then you're out of luck and probably have too high standards. –  Feb 03 '11 at 15:32
  • @delnan I don't understand; someone had to write the LEPL library that you said you like to use. I guess call me old-fashioned for wanting to do things that are "sooo last century" ;) – Jordan Feb 03 '11 at 15:36
  • 1
    @Yoel: I assume you already parsed something nontrivial (CSV is at the borderline between trivial and simple) by hand? I once was about to, but halfway through I realized that I was writing helper functions, utilities, etc. that parsing libraries already provide (not to mention that my code was still buggy while theirs worked flawlessly). –  Feb 03 '11 at 15:41
  • @delnan: +1 for not letting this chain of comments get aggressive. I understand where you're coming from, and yeah, using libraries that are tried and true is very useful. I just felt that for the specific application @neurino needed, it might make sense to go the custom route. I guess I was jumping to conclusions about what constraints there were on space and dependencies. – Jordan Feb 03 '11 at 15:47
  • @Yoel: Well, since OP presented a pyparsing solution he wanted to get working, dependencies seem fine. But never mind. –  Feb 03 '11 at 15:49
1

Adding parseAll=True to the call to parseString will convert this parser into a validator.
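As an illustration, here is a small validator for the grammar in the question, assuming pyparsing is installed. The grammar is a sketch written for this example, not the one from the pyparsing example script, and `is_valid` is a made-up helper name:

```python
from pyparsing import (Forward, ParseException, Regex, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, oneOf)

number = Regex(r"\d+(\.\d*)?")
variable = Word(alphas, alphanums)   # must start with a letter

expr = Forward()
atom = number | variable | Suppress("(") + expr + Suppress(")")
factor = ZeroOrMore(oneOf("+ -")) + atom          # unary + and -
term = factor + ZeroOrMore(oneOf("* /") + factor)
expr <<= term + ZeroOrMore(oneOf("+ -") + term)

def is_valid(text):
    """Return True only if the whole string matches the grammar."""
    try:
        expr.parseString(text, parseAll=True)
    except ParseException:
        return False
    return True

for sample in ["-3 * (2 + 1)", "-3 * ", "v1 + v2", "v2 - 2v"]:
    print(sample, is_valid(sample))
```

Without parseAll=True, the second and fourth samples would parse "successfully" up to the point where the grammar stops matching.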

PaulMcG
0

If you are interested in modifying a custom math evaluator engine written in Python so that it is a validator instead, you could start out with Evaluator 2.0 (Python 3.x) and Math_Evaluator (Python 2.x). They are not ready-made solutions but would allow you to fully customize whatever it is you are trying to do exactly using (hopefully) easy-to-read Python code. Note that "and" & "or" are treated as operators.

Noctis Skytower