9

I have 'if statements' from two different sources, which try to implement the same condition possibly in a different way. The 'if statements' are C. If at all possible I need a python script that can decide whether pairs of conditions are equivalent or not. A basic example:

source1: ((op1 != v1) || ((op2 != v2) || (op3 != v3)))

source2: ((op2 != v2) || (op1 != v1) || (op3 != v3))

Of course any operator is allowed, function calls and of course parentheses.

Any ideas are welcome.

Edit 1: The function calls have no side effects.

user1514631
  • 1,183
  • 1
  • 9
  • 14
  • 4
    This is one of those problems which seems easy at first, and then you look at it for a minute and can't help but saying "eeiiwww". – cwallenpoole Mar 06 '13 at 21:54
  • 3
    Write a parser in PLY that turns it into a truth table and compare the truth tables. http://www.dabeaz.com/ply/ – Patashu Mar 06 '13 at 21:54
  • 7
    If you are considering function calls as op1 and op2, , you should note that ((op1 != v1) || ((op2 != v2) || (op3 != v3))) is not necessarily equivalent to ((op2 != v2) || (op1 != v1) || (op3 != v3)). For example suppose op1 != v1 and op2 != v2, in the first statement op1 is only called, in the second statement only op2. This is due to short circuiting. – user23127 Mar 06 '13 at 21:54
  • I think @Patashu is right, you'll need to build an all-out parser for this. Just curious, though, what is the use case? Maybe there is an easier way? – cwallenpoole Mar 06 '13 at 21:55
  • 3
    Unless I'm mistaken, this problem is NP-hard, as it's equivalent to [Boolean satisfiability](http://en.wikipedia.org/wiki/Boolean_satisfiability). (There's a good chance I'm mistaken, though...) So if I'm not mistaken, the computational requirements grow very quickly (i.e. exponentially) with the number of terms. – Oliver Charlesworth Mar 06 '13 at 22:13
  • 6
    If function calls are allowed, then I can think of at least one case that would require solving the halting problem. You're probably going to need to narrow things down a bit. Something like "Functions can't have side effects and always return a result" – Pete Baughman Mar 06 '13 at 22:22
  • @user23127,@PeteBaughman: yes you're right, it is assumed that the function calls don't have any side effects. I'll update the question. – user1514631 Mar 07 '13 at 16:13
  • Every one of you contributed something meaningful until now. Thank you for your contribution. – user1514631 Mar 07 '13 at 16:23
  • @cwallenpoole 1. What is the use case? There are hundreds of pseudo-code-likish specification fragments (not formalized but neither free text) and the corresponding C implementation. Using layers upon layers of transformations and a crude parser I transform the specification to C code. At this stage I already attempt to "reorder" commutative operations but I have to consider existing parenthesis. The same goes for the actual C code and then they are compared (as strings). – user1514631 Mar 07 '13 at 16:25
  • @cwallenpoole 2. This sounds most likely a bit futuristic or barbaric, but it works quite well. The problem remains: there are many situations where the code conforms to the specification but it is written "too" differently, hence a direct match can't be confirmed. – user1514631 Mar 07 '13 at 16:26
  • @user1514631 Even after Edit 1 there's a problem: Consider the Function Forever() which enters an infinite loop: Is (TRUE || Forever()) equivalent to (Forever() || TRUE) ? Alternatively, substitute "Forever()" with "CrashProgram()". The latter example may be considered a side-effect, but certainly not the first. – Pete Baughman Mar 07 '13 at 18:09
  • @PeteBaughman, you're of course right, but I don't need or want to evaluate any of the expressions. There is a somewhat short description two posts above, where I try to answer the question: What is the use case? Lacking a better word, I would like to "transform" (not evaluate) source1 and source2 until I can say, that for the exact same inputs (any combination thereof) both expressions would evaluate to the same result (or not). _build the entire truth table!_ as suggested by Hooked one post below is not practicable, because any 'if condition' may have from 1 to about 30 sub-expressions. – user1514631 Mar 07 '13 at 19:16
  • @user1514631 So would you consider (TRUE || Function()) and (Function() || TRUE) to be equivalent then? If so, the problem becomes a lot easier. – Pete Baughman Mar 07 '13 at 22:41
  • @PeteBaughman I would consider ((a==1)||(Function()==2)) to be equivalent with ((2==Function())||(1==a)). These functions that may or may not be part of a condition, have no side effects, they always would return with the same value given the same environment, they don't return booleans, but their return value is always tested against something (Function() != something). I know that the problem is not easy, but I also don't pretend to know or need an academic solution to it, where every complicated situation is covered. I need a solution to a very specific/small subset of the whole problem. – user1514631 Mar 08 '13 at 06:24

2 Answers2

5

Here's the thing, the problem may (or may not) be NP-complete but unless this is within the inner-loop of something important (and the number of variable are small), build the entire truth table! It's extremely easy to do. It obviously grows as 2^n, but for small n this is completely feasible. Like the comments suggest, I'm assuming that the function calls have no side effects and simply output True or False.

I've posted a mockup example that solves your stated problem, adapt as needed. I rely on pythons parser to handle the evaluation of the expression.

import pyparsing as pypar
import itertools

def python_equiv(s):
    return s.replace('||',' or ').replace('&&',' and ')

def substitute(s,truth_table, VARS):
    for v,t in zip(VARS,truth_table):
        s = s.replace(v,t)
    return s

def check_statements(A1,A2):  
    VARS = set()
    maths    = pypar.oneOf("( ! = | & )")
    keywords = pypar.oneOf("and or")
    variable = pypar.Word(pypar.alphanums)
    variable.setParseAction(lambda x: VARS.add(x[0]))
    grammar  = pypar.OneOrMore(maths | keywords | variable)

    # Determine the variable names
    grammar.parseString(A1)
    grammar.parseString(A2)

    A1 = python_equiv(A1)
    A2 = python_equiv(A2)

    bool_vals = map(str, [False,True])

    for table in itertools.product(bool_vals,repeat=len(VARS)):
        T1 = substitute(A1,table,VARS)
        T2 = substitute(A2,table,VARS)
        if eval(T1) != eval(T2):
            print "FAIL AT ", table,
            return False

    print "Statements equiv:",

    return True


# Original example
A1 = '''((op1 != v1) || ((op2 != v2) || (op3 != v3)))'''
A2 = '''((op2 != v2) ||  (op1 != v1) || (op3 != v3))'''
print check_statements(A1,A2)

# Example with a failure
A1 = '''((op1 != v1) || ((op2 != v2) || (op3 != v3)))'''
A2 = '''((op2 != v2) ||  (op1 != v1) && (op3 != v3))'''
print check_statements(A1,A2)

Gives as output:

Statements equiv: True
FAIL AT  ('False', 'False', 'False', 'False', 'False', 'True') False
Hooked
  • 84,485
  • 43
  • 192
  • 261
  • My personal opinion: in example code and such, do not use `from foo import *`, that makes it hard for others to know which used symbols are from which import. – hyde Mar 07 '13 at 05:09
  • @hyde Generally I agree, though with pyparsing it gets a bit much. For clarity I've added in the import paths. – Hooked Mar 07 '13 at 05:12
  • This helps you decide if the boolean formulas are equivalent, but not if they compute the same thing. Consider: x = P>Q ; if (x && y) ... ; x = sin(Z)<3; if (x && y) ... The two formulas are algebraically identical, but they do NOT compute the same thing. You need to prove the two conditionals are evaluated with the same control dependence. – Ira Baxter Mar 07 '13 at 06:02
  • @IraBaxter I see, I think I interpreted the question at a level that was too shallow. The OP stated "[Check if] statements ... are equivalent", and I saw that as check if "the boolean formulas are equivalent". I'll leave my posting up since I think it still has some value as a subset of the original problem. – Hooked Mar 07 '13 at 14:45
  • @Hooked, this is a cool idea, and thank you for bringing it up, but there can be up to 30 subexpressions in a condition and as you mentioned this approach would be way to slow. – user1514631 Mar 07 '13 at 16:29
  • @user1514631 Like I said in regards to IraBaxter, I don't think this fully answers the problem. I do want to point out it's not the number of _subexpressions_, but the number of _unique variables_, as that determines the size of the truth table. Small is relative too, I can easily run a simulation with 20 variables 2^20 ~= 100K in no time, but I would get stuck at 30! – Hooked Mar 07 '13 at 20:36
  • @Hooked, thank you for your help, with a few changes the code you've provided worked out pretty well (much better than previous attempts). – user1514631 Mar 12 '13 at 13:17
2

To do this, you need control flow anlaysis to determine if the two conditions have the same control dependence (otherwise they don't execute in the same data context), full data flow analysis including points-to analysis of the C code, a side-effect analysis of functions, the ability to backslice from the root of the condition to the leaves of the expression across function calls, and then a boolean equivalence matcher that accounts for C semantics (e.g. short-circuiting, overflows, ...)

This is far beyond what you get from a C parser.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • 1
    Well, question specifies "simple", which could be interpreted to mean subexpressions do not have side-effects etc... – hyde Mar 07 '13 at 05:11
  • If you leave out side effects, you still have to determine what functions are called. This requires points-to analysis, unless you are going to eliminate this too. If you simplify it enough, yes he can get a simple answer. My point: won't be much left. – Ira Baxter Mar 07 '13 at 05:59
  • Well yeah, this rules out a large portion of common C conditions, for example `(fp && -1 != (ch = fgetc(fp)))`, or anything where order matters. But if functions called in the condition expression have no side-effects (which is actually a reasonable self-imposed rule in C, for those who prefer functional programming), then it doesn't matter which get called. – hyde Mar 07 '13 at 07:55