Does creating new Processes help me for Traversing a big tree?

Question

Let's think of it as a family tree, a father has kids, those kids have kids, those kids have kids, etc... So I have a recursive function that gets the father uses Recursion to get the children and for now just print them to debug output window...But at some point ( after one hour of letting it run and printing like 26000 rows) it gives me a StackOverFlowException.

So Am really running out of memory? hmmm? then shouldn't I get an "Out of memory exception"? on other posts I found people were saying if the number of recursive calls are too much, you might still get a SOF exception...

Anyway, my first thought was to break the tree into smaller sub-strees..so I know for a fact that my root father always has these five kids, so Instead of Calling my method one time with root passed to it, I said ok call it five times with Kids of root Passes to it.. It helped I think..but still one of them is so big - 26000 rows when it crashes - and still have this issue.

How about Application Domains and Creating new Processes at run time at some certain level of depth? Does that help?

How about creating my own Stack and using that instead of recursive methods? does that help?

here is also a high-level of my code, please take a look, maybe there is actually something silly wrong with this that causes SOF error:

private void MyLoadMethod(string conceptCKI)
{

// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node. 

            // when this is zero, it means its a leaf. 
            int numberofKids = moTargetConceptList2.ConceptReltns.Count();
            if (numberofKids == 0)
                return;
            for (int i = 1; i <= numberofKids; i++)
            {
                oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);

                //Get the concept linked to the relation concept
                if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
                {
                    oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
                }
                else
                {
                    oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
                }

                //builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
                Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);

                MyLoadMethod(oConcept.ConceptCKI);
            }
        }

Mark Byers · Answer 1 · 2012-08-11T21:39:58.003

10

How about creating my own Stack and using that instead of recursive methods? does that help?

Yes!

When you instantiate a Stack<T> this will live on the heap and can grow arbitrarily large (until you run out of addressable memory).

If you use recursion you use the call stack. The call stack is much smaller than the heap. The default is 1 MB of call stack space per thread. Note this can be changed, but it's not advisable.

edited Aug 11 '12 at 21:39

answered Aug 11 '12 at 21:31

Mark Byers

811,555
193
1,581
1,452

Thanks, I didn't know about 1MB limit?..Could you please show me a quick high-level example? I can research it myself too but if now that you know what is my question so you better know what we need to do..thanks – Bohn Aug 11 '12 at 21:33
7

@BDotA Use the built in C# data-type Stack. Start by pushing the root of your tree onto the stack. Then, do a while loop (while the stack still contains items). If the item at the top of the stack has children, push them all onto the stack. If the item at the top of the stack contains zero children, pop it off. Continue until the stack contains 0 items (all items have been processed/counted) – Pete Baughman Aug 11 '12 at 21:37

Jon Hanna · Accepted Answer · 2012-08-12T15:46:57.770

StackOverflowException is quite different to OutOfMemoryException.

OOME means that there is no memory available to the process at all. This could be upon trying to create a new thread with a new stack, or in trying to create a new object on the heap (and a few other cases).

SOE means that the thread's stack - by default 1M, though it can be set differently in thread creation or if the executable has a different default; hence ASP.NET threads have 256k as a default rather than 1M - was exhausted. This could be upon calling a method, or allocating a local.

When you call a function (method or property), the arguments of the call are placed on the stack, the address the function should return to when it returns are put on the stack, then execution jumps to the function called. Then some locals will be placed on the stack. Some more may be placed on it as the function continues to execute. stackalloc will also explicitly use some stack space where otherwise heap allocation would be used.

Then it calls another function, and the same happens again. Then that function returns, and execution jumps back to the stored return address, and the pointer within the stack moves back up (no need to clean up the values placed on the stack, they're just ignored now) and that space is available again.

If you use up that 1M of space, you get a StackOverflowException. Because 1M (or even 256k) is a large amount of memory for these such use (we don't put really large objects in the stack) the three things that are likely to cause an SOE are:

Someone thought it would be a good idea to optimise by using stackalloc when it wasn't, and they used up that 1M fast.
Someone thought it would be a good idea to optimise by creating a thread with a smaller than usual stack when it wasn't, and they use up that tiny stack.
A recursive (whether directly or through several steps) call falls into an infinite loop.
It wasn't quite infinite, but it was large enough.

You've got case 4. 1 and 2 are quite rare (and you need to be quite deliberate to risk them). Case 3 is by far the most common, and indicates a bug in that the recursion shouldn't be infinite, but a mistake means it is.

Ironically, in this case you should be glad you took the recursive approach rather than iterative - the SOE reveals the bug and where it is, while with an iterative approach you'd probably have an infinite loop bringing everything to a halt, and that can be harder to find.

Now for case 4, we've got two options. In the very very rare cases where we've got just slightly too many calls, we can run it on a thread with a larger stack. This doesn't apply to you.

Instead, you need to change from a recursive approach to an iterative one. Most of the time, this isn't very hard thought it can be fiddly. Instead of calling itself again, the method uses a loop. For example, consider the classic teaching-example of a factorial method:

private static int Fac(int n)
{
  return n <= 1 ? 1 : n * Fac(n - 1);
}

Instead of using recursion we loop in the same method:

private static int Fac(int n)
{
  int ret = 1;
  for(int i = 1; i <= n, ++i)
    ret *= i;
  return ret;
}

You can see why there's less stack space here. The iterative version will also be faster 99% of the time. Now, imagine we accidentally call Fac(n) in the first, and leave out the ++i in the second - the equivalent bug in each, and it causes an SOE in the first and a program that never stops in the second.

For the sort of code you're talking about, where you keep producing more and more results as you go based on previous results, you can place the results you've got in a data-structure (Queue<T> and Stack<T> both serve well for a lot of cases) so the code becomes something like):

private void MyLoadMethod(string firstConceptCKI)
{
  Queue<string> pendingItems = new Queue<string>();
  pendingItems.Enqueue(firstConceptCKI);
  while(pendingItems.Count != 0)
  {
    string conceptCKI = pendingItems.Dequeue();
    // make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node. 
    // when this is zero, it means its a leaf. 
    int numberofKids = moTargetConceptList2.ConceptReltns.Count();
    for (int i = 1; i <= numberofKids; i++)
    {
        oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);

        //Get the concept linked to the relation concept
        if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
        {
            oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
        }
        else
        {
            oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
        }

        //builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
        Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);

        pendingItems.Enque(oConcept.ConceptCKI);
    }
  }
}

(I haven't completely checked this, just added the queuing instead of recursing to the code in your question).

This should then do more or less the same as your code, but iteratively. Hopefully that means it'll work. Note that there is a possible infinite loop in this code if the data you are retrieving has a loop. In that case this code will throw an exception when it fills the queue with far too much stuff to cope. You can either debug the source data, or use a HashSet to avoid enqueuing items that have already been processed.

Edit: Better add how to use a HashSet to catch duplicates. First set up a HashSet, this could just be:

HashSet<string> seen = new HashSet<string>();

Or if the strings are used case-insensitively, you'd be better with:

HashSet<string> seen = new HashSet<string>(StringComparison.InvariantCultureIgnoreCase) // or StringComparison.CurrentCultureIgnoreCase if that's closer to how the string is used in the rest of the code.

Then before you go to use the string (or perhaps before you go to add it to the queue, you have one of the following:

If duplicate strings shouldn't happen:

if(!seen.Add(conceptCKI))
  throw new InvalidOperationException("Attempt to use \" + conceptCKI + "\" which was already seen.");

Or if duplicate strings are valid, and we just want to skip performing the second call:

if(!seen.Add(conceptCKI))
  continue;//skip rest of loop, and move on to the next one.

wow! Thanks Sir for all the time you spent on this question. I am gonna read it very carefully, hope it solves my frustrating problem. Thanks. — Bohn, Aug 11 '12 at 23:21
quick question: Do I still this: " if (numberofKids == 0) return; " I had written it in my recursive method so it goes out when it sees a leaf with no more children. — Bohn, Aug 11 '12 at 23:31
Ooops. No, since you've only one function call, that should just skip the loop, which the loop conditional will do anyway (hence making it redundant in the original too). — Jon Hanna, Aug 11 '12 at 23:33

Vsevolod · Answer 3 · 2012-08-11T22:57:01.677

2

I think you have a recursion's ring (infinite recursion), not a really stack overflow error. If you are got more memory for stack - you will get the overflow error too.

For test it:

Declare a global variable for storing a operable objects:

private Dictionary<int,object> _operableIds = new Dictionary<int,object>();

...
private void Start()
{
    _operableIds.Clear();
    Recurtion(start_id);
}
...
private void Recurtion(int object_id)
{
   if(_operableIds.ContainsKey(object_id))
      throw new Exception("Have a ring!");
   else
     _operableIds.Add(object_id, null/*or object*/);

   ...
   Recurtion(other_id)
   ...
   _operableIds.Remove(object_id);
}

edited Aug 11 '12 at 22:57

answered Aug 11 '12 at 21:55

Vsevolod

365
4
11

recursion ring? what is it? in VisualStudio it stops and shows a StackOverFlow exception dialog – Bohn Aug 11 '12 at 22:07
1

I mean that in your data you has something like "John has 2 childs - Tony and Blare, and Tony has a child John and John has 2 childs - ..." – Vsevolod Aug 11 '12 at 22:10
hmm...intresting...but in the MyLoadMethod() I posted above, the ID of the kids I am passing to the method, they are unique...each node has a unique key I am using...so do you think this still happens? – Bohn Aug 11 '12 at 22:25
1

Infinite recursion is indeed really a stackoverflow error. It's the largest cause of infinite recursion by a massive margin. – Jon Hanna Aug 11 '12 at 22:33
Just test it - add to global `var operableIds = Dictionary`- before call recursion's method and remove after. And if we have a ring when we add a object id into `operableIds` it is in dictionary. – Vsevolod Aug 11 '12 at 22:34
well the first bad thing I see is that your _operableIDs variable is Not local to the recursion method. That will cause issues. The variables we use in a recursion method should be in the scope of that method, not out of it, that will cause unpredictable behavior. – Bohn Aug 11 '12 at 22:50
How in local variable we can add a value from recursion? That's impossible! Wrap by class if you worried about this. – Vsevolod Aug 11 '12 at 22:53

Does creating new Processes help me for Traversing a big tree?

3 Answers3