32

I get a StackOverflowException when I run the following code:

private void MyButton_Click(object sender, EventArgs e) {
  MyButton_Click_Aux();
}

private static volatile int reportCount;

private static void MyButton_Click_Aux() {
  try { /*remove because stack overflows without*/ }
  finally {
    var myLogData = new ArrayList();
    myLogData.Add(reportCount);
    myLogData.Add("method MyButtonClickAux");
    Log(myLogData);
  }
}

private static void Log(object logData) {
  // my log code is not matter
}

What could be causing the StackOverflowException?

H H
  • 263,252
  • 30
  • 330
  • 514
PRASHANT P
  • 1,527
  • 1
  • 16
  • 28
  • 34
    "it must therefore be bug in the .net" Probably not. – jason Feb 02 '11 at 21:05
  • 8
    Please post the code for `MyButton_Click_Aux` – Dave Mateer Feb 02 '11 at 21:06
  • 7
    Please include code in MyButton_Click_Aux. – Andreas Vendel Feb 02 '11 at 21:06
  • 7
    By the way, I should point out that you can't catch a `StackOverflowException`. A `StackOverflowException` is a deadly situation as you've blown up the execution stack because of too many nested calls. It means your program is terminally ill, and should have the plug pulled. And that's exactly what the CLR does, it kills your process. – jason Feb 02 '11 at 21:35
  • 11
    -9? He's just asking for help! – Greg D Feb 02 '11 at 21:44
  • @PRASHANT P **The stack-trace of the exception will [likely] reveal exactly what causes the exception.** (It is likely because of unbound recursion.) As others have said, the chances of this being a bug in .NET are slim-to-none. It is [usually] never good to catch an exception without explicit knowledge of *why* is occurs and a justified reason for *correctly handling* the exception. This code shows neither. –  Feb 02 '11 at 22:14
  • i am going to reform question and edit but i will be taking time for improving translate – PRASHANT P Feb 02 '11 at 23:12
  • 5
    @Pershant Why not just post the code for `MyButton_Click_Aux();`? That'll tell us if it's causing the problem. – George Stocker Feb 02 '11 at 23:23
  • @Greg D: Asking for help is what this site's all about. But there's no reason for the OP to show what he's doing if he really wants people to provide feedback on it. – Jonathan Wood Feb 02 '11 at 23:52
  • 3
    @PRASHANT P Stop guessing. Post/analyze the stack-trace :) –  Feb 03 '11 at 05:08
  • Re the updated version: Is the Messagbox ever being shown? I can't believe this is real code because it would just show `System.Collections.ArrayList` in a popup window. – H H Feb 03 '11 at 14:36
  • @Henk - looking through the edit history, it looks like there's some information missing in the most recent edit, but I am not brave enough to even attempt to try and edit this myself! – John Rasch Feb 03 '11 at 14:52
  • @Henk Holterman john rash is not incorrect i had more explanations but some edits removed all – PRASHANT P Feb 03 '11 at 20:22
  • @prashant, the error is in code we can't see. The only hope would be something from a stacktrace. You've heard that from multiple users now. – H H Feb 03 '11 at 20:32
  • @Henk Holterman stack trace has no uses it has only .net privates try block is never runs – PRASHANT P Feb 03 '11 at 20:38
  • 4
    I'll give you this: George Stocker is over-editing everything in sight. Reviewing the edits, he's taking too much out. I assume he's trying to clarify the question be removing unneeded text but that is too much because he's removing some of the comments he feels do not help the question. This is not respecting the original question. Sheesh George, cool it. (How can I tell him?) – Jonathan Wood Feb 03 '11 at 20:41
  • 4
    @Jason, Henk, and many many others - I'm afraid you all need to eat some humble pie... – Andras Zoltan Feb 03 '11 at 21:04
  • 2
    @Andras: It turns out interesting but the call for legit and complete info was justified. – H H Feb 03 '11 at 21:16
  • 1
    @Henk - I've looked through the revision history of the question and, yes, it certainly didn't start out as a shining example of a good SO question. – Andras Zoltan Feb 03 '11 at 21:18
  • 1
    @Andras Zoltan: So grab yourself a piece. – jason Feb 03 '11 at 21:55
  • 1
    @Jason - LOL touché - It tastes so good – Andras Zoltan Feb 03 '11 at 22:06
  • So @Jason, Henk, and many many others - my apologies. Got caught up in the fact that we have one of those rarest of occasions where a bad question that seemed completely implausible actually ends up being a legitimate error in the runtime. In the end the SO model has won out and provided the OP with the answer - I suppose I just think the history on this occasion is somewhat painful reading :) – Andras Zoltan Feb 03 '11 at 22:30
  • @Andras Zoltan: Yeah, given the initial post, it really was a less than 0.01% chance this was a legitimate bug somewhere in the compiler/Framework/BCL/JIT compiler. I'm glad I hedged myself and only said "Probably not" and not "Definitely not." :-) Kudos to you for digging into it once we received sufficient information to actually look into the problem. – jason Feb 03 '11 at 22:33
  • @Jason - yep 'definitely' is one of those words that, ahem, definitely ends up biting us on the arse eventually! . Mine's like swiss cheese. – Andras Zoltan Feb 03 '11 at 22:39
  • @andras I think it is worth mentioning that meta played an important part too as the OP was quite dismayed and posted there re. this question. Quite a ride. http://meta.stackexchange.com/questions/77601/people-left-mean-comments-on-my-question-how-do-i-resolve-this/77605#77605 – Tim Lloyd Feb 03 '11 at 22:39
  • @chibacity - incredible. Proof-positive that the SO really does work (ouch that sounds like an advert for Head and Shoulders) - if you engage with it properly (and rinse well). I love this site, it's like a good joke - it works on so many levels. – Andras Zoltan Feb 03 '11 at 22:51
  • 1
    @andras if you're willing to put the effort in and stick with it, it's jolly good. Now here comes the science... :) – Tim Lloyd Feb 03 '11 at 22:56
  • @chibacity - ha ha - (last one promise) - `SO - because you're code's worth it` – Andras Zoltan Feb 03 '11 at 23:00
  • @chibacity - now that's a keeper :) – Andras Zoltan Feb 03 '11 at 23:19

4 Answers4

44

I know how to stop it from happening

I just don't know why it causes it (yet). And it would appear you have indeed found a bug either in the .Net BCL or, more likely, in the JIT.

I just commented out all the lines in the MyButton_Click_Aux method and then started bringing them back in, one by one.

Take off the volatile from the static int and you'll no longer get a StackOverflowException.

Now to research why... Clearly something to do with Memory Barriers is causing an issue - perhaps somehow forcing the MyButton_Click_Aux method to call itself...

UPDATE

Okay so other people are finding that .Net 3.5 is not an issue.

I'm using .Nt 4 as well so these comments relate to that:

As I said, take the volatile off and it works.

Equally, if you put the volatile back on and remove the try/finally it also works:

private static void MyButton_Click_Aux()
{
  //try { /*remove because stack overflows without*/ }
  //finally
  //{
    var myLogData = new ArrayList(); 
    myLogData.Add(reportCount); 
    //myLogData.Add("method MyButtonClickAux");
    //Log(myLogData);
  //}
}  

I also wondered if it was something to do with the uninitialised reportCount when the try/finally is in. But it makes no difference if you initialise it to zero.

I'm looking at the IL now - although it might require someone with some ASM chaps to get involved...

Final Update As I say, this really is going to require analysis of the JIT output to really understand what's happening and whilst I find it fun to analyse assembler - I feel it's probably a job for someone in Microsoft so this bug can actually be confirmed and fixed! That said - it appears to be a pretty narrow set of circumstances.

I've moved over to a release build to get rid of all the IL noise (nops etc) for analysis.

This has, however, had a complicating impact on the diagnosis. I thought I had it but didn't - but now I know what it is.

I tried this code:

private static void MyButton_Click_Aux()
{
  try { }
  finally
  {
    var myLogData = new ArrayList();
    Console.WriteLine(reportCount);
    //myLogData.Add("method MyButtonClickAux");
    //Log(myLogData);
  }
}

With the int as volatile. It runs without fault. Here's the IL:

.maxstack 1
L_0000: leave.s L_0015
L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
L_0007: pop 
L_0008: volatile. 
L_000a: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
L_000f: call void [mscorlib]System.Console::WriteLine(int32)
L_0014: endfinally 
L_0015: ret 
.try L_0000 to L_0002 finally handler L_0002 to L_0015

Then we look at the minimum code required to get the error again:

private static void MyButton_Click_Aux()
{
  try { }
  finally
  {
    var myLogData = new ArrayList();
    myLogData.Add(reportCount);
  }
}

And it's IL:

.maxstack 2
.locals init (
    [0] class [mscorlib]System.Collections.ArrayList myLogData)
L_0000: leave.s L_001c
L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
L_0007: stloc.0 
L_0008: ldloc.0 
L_0009: volatile. 
L_000b: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
L_0010: box int32
L_0015: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
L_001a: pop 
L_001b: endfinally 
L_001c: ret 
.try L_0000 to L_0002 finally handler L_0002 to L_001c

The difference? Well there's two that I spotted - boxing of the volatile int, and a virtual call. So I setup these two classes:

public class DoesNothingBase
{
  public void NonVirtualFooBox(object arg) { }
  public void NonVirtualFooNonBox(int arg) { }

  public virtual void FooBox(object arg) { }
  public virtual void FooNonBox(int arg) { }
}

public class DoesNothing : DoesNothingBase
{
  public override void FooBox(object arg) { }
  public override void FooNonBox(int arg) { }
}

And then tried each of these four versions of the offending method:

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.FooNonBox(reportCount);
}

Which works.

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.NonVirtualFooNonBox(reportCount);
}

Which also works.

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.FooBox(reportCount);
}

Oops - StackOverflowException.

And:

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.NonVirtualFooBox(reportCount);
}

Oops again! StackOverflowException!

We could go further with this - but the issue is, I feel, clearly caused by the boxing of the volatile int whilst inside the finally block of a try/catch... I put the code inside the try, and no problem. I added a catch clause (and put the code in there), also no problem.

It could also apply to the boxing of other value types I guess.

So, to summarise - in .Net 4.0 - in both debug and release builds - the boxing of a volatile int in a finally block appears to cause the JIT to generate code that ends up filling the stack. The fact that the stack trace simply shows 'external code' also supports this proposition.

There's even a possibility that it can't always be reproduced and might even depend on the layout and size of the code that is generated by the try/finally. It's clearly something to do with an errant jmp or something similar being generated to the wrong location which eventually repeats one or more push commands to the stack. The idea that that is being caused actually by a box operation is, frankly, fascinating!

Final Final Update

If you look at the MS Connect bug that @Hasty G found (answer further down) - you see there that the bug manifests in a similar fashion, but with a volatile bool in a catch statement.

Also - MS queued a fix for this after getting it to repro - but no hotfix available yet after 7 months. I've gone on record before as being in support of MS Connect, so I'll say no more - I don't think I need to!

Final Final Final Update (23/02/2011)

It is fixed - but not yet released. Quote from the MS Team on the MS Connect bug:

Yes, it's fixed. We're in the process of figuring out how best to ship a fix. It is already fixed in 4.5, but we'd really like to fix a batch of code generation bugs prior to 4.5 release.

Andras Zoltan
  • 41,961
  • 13
  • 104
  • 160
  • @Andras Zoltan i have not considering volatile keyword it was formerly in use with many threads but has no longer any purpose so i can remove very strange problem – PRASHANT P Feb 03 '11 at 21:06
  • @PRASHANT P - there's more. I've also removed the try\finally from around the block and left the volatile on - and hey presto - no more StackOverflow... – Andras Zoltan Feb 03 '11 at 21:08
  • Yeah, I noticed that removing the try/finally got rid of it. I didn't suggest that as I figured there was a reason you had that in there in the first place. – dotalchemy Feb 03 '11 at 21:10
  • Heh, you beat me to the volatile thing :) +1 for you :p – dotalchemy Feb 03 '11 at 21:11
  • @dotalchemy - got it - it's to do with boxing the volatile int. – Andras Zoltan Feb 03 '11 at 21:56
  • 2
    @Andras: Good job. (Apparently, you have a little more free time today than I do :) – Jonathan Wood Feb 04 '11 at 00:13
  • 1
    @Jonathan Wood - thanks. If only that were it, though; my work day finished 6 hours ago - it's Midnight Friday here in the UK! This was some light relief from unpaid overtime :) Do I love the job anyway, though? You bet. – Andras Zoltan Feb 04 '11 at 00:25
12

The bug is in your code. Presumably, MyButton_Click_Aux() causes some method to be re-entered. However, you've inexplicably omitted that code from your question and so no one can comment on it.

George Stocker
  • 57,289
  • 29
  • 176
  • 237
Jonathan Wood
  • 65,341
  • 71
  • 269
  • 466
  • @Jonathan Wood i have no re-entry i know of recursion i have much c coding experience this is the forms only i think must be bug – PRASHANT P Feb 03 '11 at 20:27
  • @PRASHANT: No, you have not established that. I'm glad you've updated the code in your question but what is in your `try` block that isn't shown? Do I understand from the comment that the missing code is what causes the overflow? At any rate, Visual Studio has a wonderful debugger. Step through the code. You can then determine for sure there is no re-entry, and if not you can see exactly where the error occurs. – Jonathan Wood Feb 03 '11 at 20:34
  • @Johnathan Wood i have done this code in try does not matters i have only simply form with 1 button now and stack trace has only .net privates – PRASHANT P Feb 03 '11 at 20:36
  • 4
    @Jonathan Wood - I think an apology is in order to @PRASHANT P - he has indeed found a very very interesting bug... – Andras Zoltan Feb 03 '11 at 21:04
  • @Andras: Well, my problem was largely to do with poor communication and the lack of information provided in the original post. For these things, I see no reason to apologize. In short, he was saying it was a bug when he had not established that. Of course, that doesn't mean it can't be a bug--and I'll accept that once it has been explored a little. But the original post was missing critical information. (BTW, I only have C# 2008 Express Edittion here on my laptop and the code works fine.) – Jonathan Wood Feb 03 '11 at 21:10
  • 1
    Agreeing with JW - I considered downvoting this answer, but then I realised it was to do with the distinct lack of information originally provided. I don't think any foul has been committed here. – dotalchemy Feb 03 '11 at 21:13
  • @dotalchemy and @jonathan - yes I also considered downvoting but for the same reason have not. – Andras Zoltan Feb 03 '11 at 21:16
  • @Andras Zoltan: Have to disagree with owing him an apology; the initial post was terrible and gave everyone every reason to believe it was the OP and not Microsoft to blame. Agree it is a very interesting bug. – jason Feb 03 '11 at 22:36
  • @Jason - yes you're right I do agree as well, I succumbed to dark side there for a moment. Even I'm starting to wonder now if an answer is going to be accepted now that's been solved... doesn't have to be mine - just let's get the question closed and mark it down in SO history. 'Lessons to be learned' by all, certainly - but everyone lives happily ever after. Well, until the next dodgy question anyway :) – Andras Zoltan Feb 03 '11 at 22:45
  • @Andras Zoltan i am not in having dodgy question,, i am try to increase in quality.. i read all even with no understand – PRASHANT P Feb 23 '11 at 23:18
  • 1
    @PRASHANT P - Sorry for any offence - I have been keeping up with your questions since this one and can see that you are trying harder, and appreciate that it's also hard if English isn't your first language - keep up the good work, and the community here will look after you :) – Andras Zoltan Feb 24 '11 at 00:02
0

Does Log call back to Log? This would also cause a SO.

Mark Avenius
  • 13,679
  • 6
  • 42
  • 50
  • @Mark Avenius this is obvious i have edit and am removing all code from Log it does not even get the call – PRASHANT P Feb 03 '11 at 20:33
  • @Prashant: it may seem obvious, but it is nonetheless a place where a SO could occur, so I mentioned it because I cannot see what `Log` is doing. Just trying to help... – Mark Avenius Feb 03 '11 at 20:45
  • 2
    @Mark Avenius i am sorry i have receive much sarcastic in many responses i do never intend insult – PRASHANT P Feb 03 '11 at 21:00
  • 3
    @PRASHANT P - you have nothing to apologise for - I'm afraid apart from myself and John you were treated very harshly on this occasion. Shame on such SO heavyweights... – Andras Zoltan Feb 03 '11 at 21:09
  • 1
    @Andras: I agree. @Prashant, you have done nothing wrong here. – Mark Avenius Feb 03 '11 at 21:10
  • @Mark Avenius - that said - you didn't deserve a smackdown either as you were only trying to help!. My my, I am often amazed by how SO makes the emotions flow! – Andras Zoltan Feb 03 '11 at 22:33
0

When that exception happens, why not check what was recorded in call stack panel? The call stack itself can tell a lot.

Besides, low level debugging using SOS.dll and WinDbg can also tell you a lot.

Lex Li
  • 60,503
  • 9
  • 116
  • 147