Which is more important, number of variables or subexpressions?

Question

I presume that most modern SMT solvers detect shared expressions, so performance should be very good when a solver processes a sequence of similar expressions. However, I got unexpected results when I ran Z3 on input1 and input2. In "input1", a single long constraint A is built; in "input2", intermediate variables are defined instead and mapped to the sub-expressions of A. Input1 therefore has fewer variables and should be solved faster than input2. I cannot find anything useful in the statistics, as they are exactly the same except for the solving time and the memory consumed:

[Image: Z3 statistics]

I would very much appreciate it if someone could explain what affects the performance of SMT solvers more: the number of variables or the number of subexpressions?

There is no generally valid answer to this. The heuristics may get lucky on a file with more variables, or they can get unlucky on a file with fewer. What happens on this particular file, I can't tell, because I can't access your input files. Could you make those publicly available? – Christoph Wintersteiger, May 29 '14 at 12:40
Thank you for the reminder! They are public now. I should not have forgotten to share them. Thanks in advance for your further help! – Tianhai Liu, May 29 '14 at 13:42

score 2 · Accepted Answer · answered May 29 '14 at 17:08


I've done some profiling, and both inputs appear to behave exactly the same in the solver: all (check-sat) commands take exactly the same time. Note that input2 is a 255KB file, whereas input1 is a 240MB file, i.e., about 1000 times larger. According to my profiler, all of the additional time required for input1 is spent in the parser. So it simply takes a long time to read and check the input; the actual queries are all easy.
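The input files are not reproduced here, so the following is only a hypothetical sketch of how such a size gap can arise. It assumes a made-up pattern in which each level of a term uses the previous level twice; the names `inlined_term`, `named_script`, `x` and `t1`, `t2`, ... are illustrative and not taken from the original inputs.

```cpp
#include <string>

// Fully inlined term: every level repeats the previous term twice,
// so the text roughly doubles in length with each level.
std::string inlined_term(int depth) {
    std::string term = "x";
    for (int i = 0; i < depth; ++i) {
        term = "(bvadd " + term + " " + term + ")";
    }
    return term;
}

// The same structure with one intermediate variable per level
// (t1, t2, ...), so the text grows linearly with the depth.
std::string named_script(int depth) {
    std::string script;
    std::string prev = "x";
    for (int i = 1; i <= depth; ++i) {
        const std::string name = "t" + std::to_string(i);
        script += "(declare-const " + name + " (_ BitVec 32))\n";
        script += "(assert (= " + name + " (bvadd " + prev + " " + prev + ")))\n";
        prev = name;
    }
    return script;
}
```

Under this assumed pattern, the inlined text grows exponentially with the depth while the named script gains two short lines per level, so a parser has far more characters to read for the inlined form even though both describe the same shared structure.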


Christoph Wintersteiger


Thank you, Christoph. That makes sense! May I assume that detecting shared expressions is the preprocessing step that accounts for most of the additional time when checking input1? – Tianhai Liu May 29 '14 at 17:54
Again, can I assume that Z3 creates temporary variables that map to the shared expressions during preprocessing? In other words, is there the same number of variables for input1 and input2 before the actual solver is invoked? – Tianhai Liu May 29 '14 at 18:24

Z3 uses hash consing to detect structurally equal expressions, so structurally equal expressions are only created once. It's true that this may impose some time overhead during construction. Z3 doesn't explicitly create new variables for every subexpression, but if you prefer, you can think of it like that (actually they are just pointers in memory). – Christoph Wintersteiger May 30 '14 at 11:57
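As a rough illustration of the idea of creating structurally equal expressions only once, here is a minimal sketch. It is not Z3's actual implementation: `ExprTable`, `mk` and the integer ids are hypothetical names, and an ordered `std::map` stands in for a real hash table to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical mini term table; an illustration of hash consing,
// not Z3's actual data structures.
class ExprTable {
public:
    using Id = int;                               // index of a node; -1 means "no child"
    using Key = std::tuple<std::string, Id, Id>;  // (operator or name, lhs, rhs)

    // Return the id of the node (op, lhs, rhs), creating it only if no
    // structurally equal node exists yet.
    Id mk(const std::string& op, Id lhs = -1, Id rhs = -1) {
        const Id count = static_cast<Id>(nodes_.size());
        assert(lhs < count && rhs < count);         // children must already exist
        const Key key{op, lhs, rhs};
        const auto it = index_.find(key);
        if (it != index_.end()) return it->second;  // reuse the existing node
        nodes_.push_back(key);
        index_.emplace(key, count);
        return count;
    }

    // Number of distinct nodes created so far.
    std::size_t size() const { return nodes_.size(); }

private:
    std::map<Key, Id> index_;  // structural key -> node id
    std::vector<Key> nodes_;   // node storage, indexed by id
};
```

In this sketch, calling `mk("bvadd", x, y)` twice returns the same id, so building `(bvadd (bvor (bvadd x y) z) (bvand (bvadd x y) z))` stores seven nodes in total, with `(bvadd x y)` stored once. The lookup on every `mk` call is the kind of construction-time overhead mentioned above.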
Thanks for your answer! Allow me to ask a naive question: does the detection of structurally equal expressions happen in Z3's parser or in the solving engine? – Tianhai Liu May 30 '14 at 18:43

It happens whenever expressions are created; this could be while the parser is running, or when any of the Z3_mk_* functions are called through the API. – Christoph Wintersteiger May 31 '14 at 17:32
Is there a way to print an expression so that common subexpressions are shown? From the previous answer, I would expect that `f` is detected to be shared (the output is `(bvadd (bvor (bvadd x y) z) (bvand (bvadd x y) z))`): `expr x = ctx.bv_const("x", 32); expr y = ctx.bv_const("y", 32); expr z = ctx.bv_const("z", 32); expr f = x + y; expr g = (f | z) + (f & z); cout << g << endl;` Thanks! – user882903 Dec 27 '17 at 19:32
Yes, Z3 automatically introduces let-expressions in the output for shared sub-expressions and it will automatically detect structurally equal sub-expressions in the input. The pretty-printer has a number of configuration settings to enable/disable all/some/etc, see the options that `z3.exe -pm:pp` lists. – Christoph Wintersteiger Dec 30 '17 at 15:53
