Early finalization and memory leaks in C++/CLI library

Question

I'm having issues with finalizers seemingly being called early in a C++/CLI (and C#) project I'm working on. This seems to be a very complex problem and I'm going to be mentioning a lot of different classes and types from the code. Fortunately it's open source, and you can follow along here: Pstsdk.Net (mercurial repository) I've also tried linking directly to the file browser where appropriate, so you can view the code as you read. Most of the code we deal with is in the pstsdk.mcpp folder of the repository.

The code right now is in a fairly hideous state (I'm working on that), and the current version of the code I'm working on is in the Finalization fixes (UNSTABLE!) branch. There are two changesets in that branch, and to understand my long-winded question, we'll need to deal with both. (changesets: ee6a002df36f and a12e9f5ea9fe)

For some background, this project is a C++/CLI wrapper of an unmanaged library written in C++. I am not the coordinator of the project, and there are several design decisions that I disagree with, as I'm sure many of you who look at the code will, but I digress. We wrap much of the layers of original library in the C++/CLI dll, but expose the easy-to-use API in the C# dll. This is done because the intention of the project is to convert the entire library to managed C# code.

If you're able to get the code to compile, you can use this test code to reproduce the problem.

The problem

The latest changeset, entitled moved resource management code to finalizers, to show bug, shows the original problem I was having. Every class in this code is uses the same pattern to free the unmanaged resources. Here is an example (C++/CLI):

DBContext::~DBContext()
{
    this->!DBContext();
    GC::SuppressFinalize(this);
}

DBContext::!DBContext()
{
    if(_pst.get() != nullptr)
        _pst.reset();            // _pst is a clr_scoped_ptr (managed type)
                                 // that wraps a shared_ptr<T>.
}

This code has two benefits. First, when a class such as this is in a using statement, the resources are properly freed immediately. Secondly, if a dispose is forgotten by the user, when the GC finally decides to finalize the class, the unmanaged resources will be freed.

Here is the problem with this approach, that I simply cannot get my head around, is that occasionally, the GC will decide to finalize some of the classes that are used to enumerate over data in the file. This happens with many different PST files, and I've been able to determine it has something to do with the Finalize method being called, even though the class is still in use.

I can consistently get it to happen with this file (download)¹. The finalizer that gets called early is in the NodeIdCollection class that is in DBAccessor.cpp file. If you are able to run the code that was linked to above (this project can be difficult to setup because of the dependencies on the boost library), the application would fail with an exception, because the _nodes list is set to null and the _db_ pointer was reset as a result of the finalizer running.

1) Are there any glaring problems with the enumeration code in the `NodeIdCollection` class that would cause the GC to finalize this class while it's still in use?

I've only been able to get the code to run properly with the workaround I've described below.

An unsightly workaround

Now, I was able to work around this problem by moving all of the resource management code from the each of the finalizers (!classname) to the destructors (~classname). This has solved the problem, though it hasn't solved my curiosity of why the classes are finalized early.

However, there is a problem with the approach, and I'll admit that it's more a problem with the design. Due to the heavy use of pointers in the code, nearly every class handles its own resources, and requires each class be disposed. This makes using the enumerations quite ugly (C#):

   foreach (var msg in pst.Messages)
   {
      // If this using statement were removed, we would have
      // memory leaks
      using (msg)  
      {
             // code here
      }
   }

The using statement acting on the item in the collection just screams wrong to me, however, with the approach it's very necessary to prevent any memory leaks. Without it, the dispose never gets called and the memory is never freed, even if the dispose method on the pst class is called.

I have every intention trying to change this design. The fundamental problem when this code was first being written, besides the fact that I knew little to nothing about C++/CLI, was that I couldn't put a native class inside of a managed one. I feel it might be possible to use scoped pointers that will free the memory automatically when the class is no longer in use, but I can't be sure if that's a valid way to go about this or if it would even work. So, my second question is:

2) What would be the best way to handle the unmanaged resources in the managed classes in a painless way?

To elaborate, could I replace a native pointer with the clr_scoped_ptr wrapper that was just recently added to the code (clr_scoped_ptr.h from this stackexchange question). Or would I need to wrap the native pointer in something like a scoped_ptr<T> or smart_ptr<T>?

Thank you for reading all of this, I know it was a lot. I hope I've been clear enough so that I might get some insight from people a little more experienced than I am. It's such a large question, I intend on adding a bounty when it allows me too. Hopefully, someone can help.

Thanks!

¹This file is part of the freely available enron dataset of PST files

I seriously doubt that the finalizer is being invoked by the .NET finalization thread *if the object is still in use*. Can you narrow the code down to a very simple example that shows that behavior? — Lasse V. Karlsen, Oct 15 '11 at 21:09
@LasseV.Karlsen - I can certainly try, though I'm not sure how simple it will be due to the wrapped code making heavy use of the boost library, I imagine I may have to include that as well to get this problem to reproduce itself. I will try my best though. — Christopher Currens, Oct 15 '11 at 22:28
@LasseV.Karlsen - I'm working on trying to reproduce it (I've been unsuccessful so far), but I wanted to address one thing. The code above will show that the finalization does happen when the object is still in use. I can place a breakpoint in the finalizer of the collection as I am enumerating over it. About half way through, with more to go, the breakpoint in the finalizer gets hit. The interesting part is that I can still access the object, but as the finalizer is run, the internal objects are deleted as per my code. I would expect an ObjectDisposedException? — Christopher Currens, Oct 15 '11 at 23:33
You appear to be in violation of my copyright, because you haven't followed my (very generous) license conditions. That could be cured by editing the copyright statement at http://pstsdknet.codeplex.com/SourceControl/changeset/view/a12e9f5ea9fe#trunk%2fpstsdk.mcpp%2fAssemblyInfo.cpp — Ben Voigt, Oct 16 '11 at 03:28
@BenVoigt - I will add that in. I made sure the copyright was left in the source, but I neglected to do it for the binary. It's in a new changeset. — Christopher Currens, Oct 16 '11 at 06:44
@ChristopherCurrens: Thanks, getting at least some recognition for the code is much appreciated, since it's the only form of compensation I'll get. — Ben Voigt, Oct 16 '11 at 14:42

score 2 · Accepted Answer · edited Apr 13 '17 at 12:40

2

The clr_scoped_ptr is mine, and comes from here.

If it has any errors, please let me know.

Even if my code isn't perfect, using a smart pointer is the correct way to deal with this issue, even in managed code.

You do not need to (and should not) reset a clr_scoped_ptr in your finalizer. Each clr_scoped_ptr will itself be finalized by the runtime.

When using smart pointers, you do not need to write your own destructor or finalizer. The compiler-generated destructor will automatically call destructors on all subobjects, and every subobject finalizer will run when it is collected.

Looking closer at your code, there is indeed an error in NodeIdCollection. GetEnumerator() must return a different enumerator object each time it is called, so that each enumeration would begin at the start of the sequence. You're reusing a single enumerator, meaning that position is shared between successive calls to GetEnumerator(). That's bad.

edited Apr 13 '17 at 12:40

Community

1
1

answered Oct 15 '11 at 22:57

Ben Voigt

277,958
43
419
720

@BenVoight - No idea about the downvote, but thank you! It was indeed an issue with the enumerator. The `NodeIdCollection` was sharing the enumerator, it was going out of scope and collected by the GC, and deleted the enumerator even though I was still using it. For some reason I had expected the `^enumerator` reference to prevent this from happening, but as you pointed out it wasn't proper to do it that way anyway. – Christopher Currens Oct 17 '11 at 15:07
@BenVoight - I've decided in lieu of the downvote, the time you've spent looking over the code to help, and the fact your clr_scoped_ptr works so well, I'd like to give you additional 100 rep, which I will do when it lets me in 24 hours. – Christopher Currens Oct 17 '11 at 19:20
@ChristopherCurrens: Don't sweat it. People who leave downvotes without giving a reason aren't worth worrying about. I just leave that comment hoping the voter will come and explain themselves. – Ben Voigt Oct 17 '11 at 20:46

score -1 · Answer 2 · edited Jun 20 '20 at 09:12

-1

Refreshing my memory of destructors/finalalisers, from some Microsoft documentation, you could at least simplify your code a little, I think.

Here's my version of your sequence:

DBContext::~DBContext()
{
    this->!DBContext();
}

DBContext::!DBContext()
{
    delete _pst;
    _pst = NULL;
}

The "GC::SupressFinalize" is automatically done by C++/CLI, so no need for that. Since the _pst variable is initialised in the constructor (and deleting a null variable causes no problems anyway), I can't see any reason to complicate the code by using smart pointers.

On a debugging note, I wonder if you can help make the problem more apparent by sprinkling in a few calls to "GC::Collect". That should force finalization on dangling objects for you.

Hope this helps a little,

edited Jun 20 '20 at 09:12

Community

1
1

answered Oct 15 '11 at 19:43

Adrian Conlon

3,941
1
21
17

I have no idea what a clr_scope_ptr is. But this code readily deletes the same object more than once when the client code calls Dispose() more than once. Not helpful. – Hans Passant Oct 15 '11 at 20:43
Good point, does the edit help? My original post was based on the uncertainty of the clr_scope_ptr and trying to remove it... – Adrian Conlon Oct 15 '11 at 20:58
Not really, it doesn't compile. Use nullptr. – Hans Passant Oct 15 '11 at 21:11
Hmm I'd assumed that _pst was a pointer to an *unmanaged* resource. Do you think it's a managed resource? – Adrian Conlon Oct 15 '11 at 21:24
No, I'm sure it is unmanaged. The code is compiled with /clr though. Which forbids NULL, channeling C++11 6 years too soon. Use 0 or nullptr. It's always best trying to compile the code you post yourself first :) – Hans Passant Oct 15 '11 at 21:42
@HansPassant: `#define NULL (0)`. So if you say `0` or `nullptr` works, then clearly `NULL` works as well. – Ben Voigt Oct 16 '11 at 00:41

Early finalization and memory leaks in C++/CLI library

The problem

1) Are there any glaring problems with the enumeration code in the NodeIdCollection class that would cause the GC to finalize this class while it's still in use?

An unsightly workaround

2) What would be the best way to handle the unmanaged resources in the managed classes in a painless way?

2 Answers2

1) Are there any glaring problems with the enumeration code in the `NodeIdCollection` class that would cause the GC to finalize this class while it's still in use?