4

Are there any C++ code parsers that look for boolean expressions that can be simplified using boolean algebra?

I know that compilers do this already, but it would be nice to have a tool giving out such things so that one can actually improve the code readability.

Lightness Races in Orbit
blubberbernd
  • 1
    Can you give an example? – Lightness Races in Orbit Sep 11 '11 at 14:53
  • do you mean something that will change: (A&&B)||(A&&C) to (A&&(B||C)) ? – Roee Gavirel Sep 11 '11 at 14:55
  • 1
    @Tomalak For example, (!(!boolOne || boolTwo)) can be changed to (boolOne && !boolTwo). The second form is much more intuitive and has one boolean operation fewer (the "global" negation is gone). To all others: I don't see any reason why there shouldn't be tools to *suggest* these kinds of things. It's still up to the human to decide whether the *tool-suggested* optimisation makes more sense and/or is more readable. – blubberbernd Sep 11 '11 at 15:04
  • 3
    @Roee: If A or B has a side effect, such a transformation isn't safe. – Ira Baxter Sep 11 '11 at 15:33
  • @Ira: I was just trying to see what he meant, I didn't say it's good (: – Roee Gavirel Sep 12 '11 at 07:08

6 Answers

3

Humans.

You want to improve readability, and since readability is mostly a human thing it should be taught by a human.

Ask more experienced developers to review your expressions and give tips.

For example, see my answer here: What is the best way (performance-wise) to test whether a value falls within a threshold?

Community
orlp
0

Although it doesn't work directly on C/C++ boolean expressions, one tool I've found very useful for simplifying complex boolean logic is Logic Friday. Unfortunately it's Windows-only, but fortunately it's free.

Paul R
0

You can make the code more efficient by reducing the number of "if"s it needs to check, but simplifying it for better readability can't be done automatically.

Roee Gavirel
0

http://www.freewarepalm.com/educational/booleanfunctionsimplificationtool.shtml

might be worth a try.

However, it is usually better to do it yourself, since readability and understandability are more important, and let the compiler do the simplification.

Ed Heal
0

This is such a bad idea! Programmers write code that reflects their thought processes. So boolean expressions as written by humans are already automatically optimised for human comprehension. Any attempt to improve on this programmatically is doomed to failure. The only context in which it might make sense is post-processing of tool-generated source code.

TonyK
  • But in a very large open source code base, to which people of every possible experience level have contributed, there's a chance that some expressions are overly complex. Double negations, for example. Nobody talked about *automatic* conversions. Nobody talked about *forcing* new boolean expressions. All I was asking for is a tool that finds *possibly* inefficient or overly complex expressions and lets *me* think about them. – blubberbernd Sep 11 '11 at 15:32
  • 2
    @TonyK: Programmers often write code that _poorly_ reflects their thought processes. – Lightness Races in Orbit Sep 11 '11 at 16:51
  • @Tomalak: Speak for yourself! – TonyK Sep 11 '11 at 18:33
  • @TonyK: I'm speaking from years of experience in fixing code that others have written. – Lightness Races in Orbit Sep 11 '11 at 18:36
  • @Tomalak: But if you have years of experience, like me, don't you agree that an automated boolean-expression elucidator is the last thing that you need? – TonyK Sep 11 '11 at 18:59
  • @TonyK: Absolutely! I never disputed that. I merely dispute your assertion that humans are somehow infallible. – Lightness Races in Orbit Sep 11 '11 at 19:00
  • @Tomalak: Let me put it this way: if you had an automated boolean-expression elucidator, would it ever -- in all your years of experience -- have been able to help you in your work? – TonyK Sep 11 '11 at 19:23
  • @TonyK: I cannot answer that question. I can only say that humans do not always write code that strictly reflects their thought processes. To assert such a thing is to assert that we never make mistakes in our expression. – Lightness Races in Orbit Sep 12 '11 at 00:48
  • I can speak from experience. While a "boolean cleanup" might marginally help code written by hand (esp. code involving NOT applied across AND and OR), some conditional code is built by machines by composing other formulas. In this case boolean simplification can be an enormous win. We built one code generator that produced raw boolean expressions with hundreds of terms, which the simplifier often reduced in size by an order of magnitude. If it overflows the page, a human being hasn't got a chance of understanding it; if it fits in a few lines, he has a chance, and it's often sensible. – Ira Baxter Sep 12 '11 at 17:35
  • @Ira: Yes. As I wrote in my reply: "The only context in which it might make sense is post-processing of tool-generated source code." – TonyK Sep 13 '11 at 16:35
0

What you need is a tool that can parse C++, determine the meaning of its symbols, pick out boolean equations, and apply boolean simplification rules to them that don't violate the semantics.

A tool that can do this is our DMS Software Reengineering Toolkit with its C++ Front End. DMS is designed to carry out program analyses and source-to-source transformations on code. Using the C++ Front End, it can parse C++ to ASTs, build up symbol tables, infer the type of an expression, and apply rewrite rules.

One can code rewrite rules like this:

domain Cpp.  -- tell DMS to use the C++ front end

rule factor_common_and_term(e1: condition, e2:condition, e3: condition):
        disjunctive_expression -> disjunctive_expression =
 " \e1 && \e2 ||  \e1 && \e3 " ->  " \e1 && ( \e2 || \e3 ) "
 if no_side_effects(e1) /\ no_side_effects(e2);

to factor out a common condition. The rule is named "factor_common_and_term" to distinguish it from the often hundreds of other rules one might write (e.g., "distribute_term", etc.). The e1, e2, e3 are metavariables representing arbitrary subexpressions (of type "condition" according to the grammar rules). The rewrite operates only on a disjunctive_expression; you could make this be "just expression", but then you would not get disjunctions nested inside other conditional expressions. The rewrite has a pattern (left) and a replacement (right), both wrapped in meta-quotes " to distinguish the C++ code in the patterns from the rule-language syntax surrounding it. The \e1 are escapes from C++ syntax, and indicate where a metavariable can match. Metavariables match any syntax of the corresponding category, so what appears where \e1 is written can be an arbitrarily complicated "condition". The fact that e1 is mentioned twice in the pattern forces the two occurrences to be identical.

One can write a set of rewrite rules that encode knowledge about simplifying arbitrarily complex boolean equations; a few dozen rules roughly do it. We've applied these to systems of non-C++ boolean equations with hundreds of thousands of terms, and to C and C++ preprocessor conditionals.

For C++, you need a check that the rewrite is safe to do, which it is not if e1 has a side effect, or e2 has a side effect. This check is made with an auxiliary function call that has to determine this answer in a conservative way. The determination that there is no side effect is in fact pretty complex for C++: you have to know what all the elements of the expression are, and that none of them have side effects.
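To make the safety concern concrete, here is a minimal C++ sketch (not DMS code) of a case where factoring changes the result. `FirstCallOnly`, `original_form`, and `factored_form` are hypothetical names invented for this illustration; the condition deliberately has a side effect, returning true only the first time it is evaluated.

```cpp
// Hypothetical stateful condition: true only on its first evaluation.
struct FirstCallOnly {
    int calls = 0;
    bool operator()() { return calls++ == 0; }
};

// Shape of the rule's pattern: e1 && e2 || e1 && e3
// e1 may be evaluated twice, so the second evaluation can see a different value.
bool original_form(bool e2, bool e3) {
    FirstCallOnly e1;
    return (e1() && e2) || (e1() && e3);
}

// Shape of the rule's replacement: e1 && (e2 || e3)
// e1 is evaluated exactly once.
bool factored_form(bool e2, bool e3) {
    FirstCallOnly e1;
    return e1() && (e2 || e3);
}
```

With e2 false and e3 true, `original_form` yields false (the second evaluation of e1 is false) while `factored_form` yields true, which is why the rule is guarded by a side-effect check.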

One can do this check with DMS's attribute grammar (an organized tree crawl) that inspects all expression elements. Simple constants and variables (you need a symbol table to identify these) do not have side effects. Function calls (including constructors, etc.) may; their definitions have to be found (again the need for a symbol table) and processed similarly. It is possible that an expression element calls a separately compiled function; the conservative answer in this case is "don't know", therefore "assume it has a side effect". DMS can actually read multiple compilation units at the same time, so a separately compiled function can be found, parsed and symbol-resolved, and crawled if you want to go that far.
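The conservative tree crawl described above can be sketched in plain C++ over a toy expression tree. This is not DMS's attribute grammar or its API; `Expr`, `FunctionTable`, `make`, and `may_have_side_effects` are hypothetical names, the function table merely stands in for a symbol table, and real C++ concerns such as volatile reads or overloaded operators are ignored.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Constant, Variable, Not, And, Or, Assign, Call };

struct Expr {
    Kind kind;
    std::string name;                         // variable or callee name, if any
    std::vector<std::shared_ptr<Expr>> kids;  // operands or call arguments
};
using ExprPtr = std::shared_ptr<Expr>;

// Stand-in for a symbol table: function name -> body, when the definition is visible.
using FunctionTable = std::map<std::string, ExprPtr>;

ExprPtr make(Kind k, std::string name = "", std::vector<ExprPtr> kids = {}) {
    return std::make_shared<Expr>(Expr{k, std::move(name), std::move(kids)});
}

// Returns true unless the expression can be shown to be free of side effects.
bool may_have_side_effects(const ExprPtr& e, const FunctionTable& fns, int depth = 0) {
    if (depth > 32) return true;  // deep or recursive call chain: give up, assume the worst
    switch (e->kind) {
        case Kind::Constant:
        case Kind::Variable:
            return false;  // toy assumption: plain reads are harmless
        case Kind::Assign:
            return true;
        case Kind::Call: {
            auto it = fns.find(e->name);
            if (it == fns.end()) return true;  // definition not visible: "don't know"
            if (may_have_side_effects(it->second, fns, depth + 1)) return true;
            break;  // body is clean; still check the arguments below
        }
        default:
            break;  // Not, And, Or: decided by their operands
    }
    for (const auto& k : e->kids)
        if (may_have_side_effects(k, fns, depth)) return true;
    return false;
}
```

The design choice worth noting is that every uncertain case returns true, so a rewrite guarded by this check is skipped rather than applied unsafely.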

So the boolean rewrite part is pretty easy; the side effect analysis is not.

We've used DMS to carry out massive changes on C++ code; we often cheat a bit by making assumptions about complex analyses like this. Usually we get surprised the same ways programmers get surprised ("What do you mean, that has a side effect?"). Mostly it works pretty well. We have done side-effect analysis in detail on C systems of 25 million lines; we are not quite there for C++ yet.

The side effect analysis only matters if some subexpression might be evaluated more than once. OP's example, given in a comment, doesn't need it, and can be handled by the following rules:

rule not_on_disjunction(e1:condition, e2:condition):
    condition -> condition =
  " ! (\e1 || \e2) " ->  " !\e1 && !\e2";

rule double_not(e:condition):
    condition -> condition =
  " ! ! \e " ->  " \e ";
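For readers without DMS, the effect of these two rules on OP's expression can be imitated with a small hand-written rewriter in C++. This is only an illustrative sketch over a toy tree; `Node`, `var`, `not_`, `and_`, `or_`, `simplify`, and `to_string` are hypothetical names, and it works on a toy AST rather than parsed C++.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node;
using NodePtr = std::shared_ptr<const Node>;

enum class Op { Var, Not, And, Or };

struct Node {
    Op op;
    std::string name;  // used by Var only
    NodePtr lhs, rhs;  // Not uses lhs only
};

NodePtr var(std::string n) { return std::make_shared<const Node>(Node{Op::Var, std::move(n), nullptr, nullptr}); }
NodePtr not_(NodePtr e) { return std::make_shared<const Node>(Node{Op::Not, "", std::move(e), nullptr}); }
NodePtr and_(NodePtr a, NodePtr b) { return std::make_shared<const Node>(Node{Op::And, "", std::move(a), std::move(b)}); }
NodePtr or_(NodePtr a, NodePtr b) { return std::make_shared<const Node>(Node{Op::Or, "", std::move(a), std::move(b)}); }

// Bottom-up rewrite applying double_not and not_on_disjunction.
NodePtr simplify(const NodePtr& e) {
    assert(e != nullptr);
    if (e->op == Op::Var) return e;
    if (e->op == Op::Not) {
        NodePtr inner = simplify(e->lhs);
        if (inner->op == Op::Not) return inner->lhs;  // double_not: !!e -> e
        if (inner->op == Op::Or)                      // not_on_disjunction: !(a || b) -> !a && !b
            return and_(simplify(not_(inner->lhs)), simplify(not_(inner->rhs)));
        return not_(inner);
    }
    NodePtr l = simplify(e->lhs);
    NodePtr r = simplify(e->rhs);
    return e->op == Op::And ? and_(l, r) : or_(l, r);
}

// Prints the tree, parenthesizing nested binary operators.
std::string to_string(const NodePtr& e, bool nested = false) {
    std::string s;
    switch (e->op) {
        case Op::Var: return e->name;
        case Op::Not: return "!" + to_string(e->lhs, true);
        case Op::And: s = to_string(e->lhs, true) + " && " + to_string(e->rhs, true); break;
        case Op::Or:  s = to_string(e->lhs, true) + " || " + to_string(e->rhs, true); break;
    }
    return nested ? "(" + s + ")" : s;
}
```

Building `not_(or_(not_(var("boolOne")), var("boolTwo")))` and passing it through `simplify` gives a tree that prints as `boolOne && !boolTwo`, matching the simplification OP asked for. Neither rule duplicates a subexpression, so no side-effect guard is involved.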

A complete, but simple worked example with more detailed description is this example of algebraic simplification of conventional algebra and some calculus.

There's clearly controversy as to whether a particular code transformation will make code more readable. IMHO, that's because the shape of code is often an art judgement, and we all seem to disagree about art. This isn't any different than letting somebody else modify your code.

Ira Baxter