13

There is a strange bug that has been in Mathematica for years, at least since version 5.1, and persisting through version 7.


Module[{f, L}, L = f[];
  Do[L = f[L, i], {i, 10^4}]] // Timing
  {0.015, Null}
Module[{weirdness, L}, L = weirdness[];
  Do[L = weirdness[L, i], {i, 10^4}]] // Timing
  {2.266, Null}

  • What causes this? Is it a hashing problem?

  • Is it fixed in Version 8?

  • Is there a way to know what symbol names cause a slowdown, other than testing?

Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125

2 Answers2

19

What causes this? Is it a hashing problem?

Yes, more or less.

Is it fixed in Version 8?

Yes (also more or less). That is to say, it is not possible to fix in any "complete" sense. But the most common cases are much better handled.

Is there a way to know what symbol names cause a slowdown, other than testing?

No way of which I am aware.

In version 7 there is an earlier fix of a similar nature to the one in version 8. It was off by default (we'd not had adequate time to test it when we shipped, and it did not get turned on for version 7.0.1). It can be accessed as follows.

SetSystemOptions["NeedNotReevaluateOptions"->{"UseSymbolLists"->True}];

This brings your example back to the realm of the reasonable.

Module[{weirdness, L}, L = weirdness[];
  Do[L = weirdness[L, i], {i, 10^4}]] // Timing

Out[8]= {0.020997, Null}

---edit---

I can explain the optimization involved here in slightly more detail. First recall that Mathematica emulates "infinite evaluation", that is, expressions keep evaluating until they no longer change. This can be costly and hence requires careful short circuit optimizations to forestall it when possible.

A mechanism we use is a variant of hashing, that serves to indicate that symbols on which an expression might depend are unchanged and hence that expression is unchanged. It is here that collisions might occur, thus necessitating more work.

In a bad case, the Mathematica kernel might need to walk the entire expression in order to determine that it is unchanged. This walk can be as costly as reevaluation. An optimization, new to version 7 (noted above), is to record explicitly, for some types of expression, those symbols upon which it depends. Then the reevaluation check can be shortened by simply checking that none of these symbols has been changed since the last time the expression was evaluated.

The implementation details are a bit involved (and also a bit proprietary, though perhaps not so hard to guess). But that, in brief, is what is going on under the hood. Earlier versions sometimes did significant expression traversal just to discover that the expression needed no reevaluation. This can still happen, but it is a much more rare event now.

---end edit---

Daniel Lichtblau Wolfram Research

Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125
Daniel Lichtblau
  • 6,854
  • 1
  • 23
  • 30
  • 1
    Could you be more precise about the cause of the bug? – Alexey Popkov Mar 23 '11 at 17:25
  • It would take a bit of time and effort to describe the workings that lead to this behavior. I may do that if time permits (which seems unlikely). But whatever it is, it is not a bug. In particular, the word "fix" was a bad choice on my part. – Daniel Lichtblau Mar 23 '11 at 17:46
  • 1
    It would be valuable if you explain the mechanism at least in general. What shatters the mystic is always valuable… – Alexey Popkov Mar 24 '11 at 00:06
10

As to version 8: I tried 100,000 random strings of various lengths and didn't find anything out of the ordinary.

chars = StringCases[CharacterRange["A", "z"], WordCharacter] //Flatten;
res = Table[
       ToExpression[
        StringReplace[
         "First[AbsoluteTiming[Module[{weirdness,L},L=weirdness[];\
    \[IndentingNewLine]Do[L=weirdness[L,i],{i,10^4}]]]]", 
         "weirdness" -> 
          StringJoin[
           RandomChoice[chars, RandomInteger[{1, 20}]]]]], {100000}];

enter image description here

Sjoerd C. de Vries
  • 16,122
  • 3
  • 42
  • 94