C++ How to debug a dangling pointer in production

Question

We have a C++ program that occasionally crashes when it accesses an already-freed object (in a Debug build, the memory being pointed to is filled with the usual "cdcdcdcd" pattern). We traced every place where the object is destroyed and could not find one where the known pointers to it are not properly set to null.

There are two main issues. The first is that the code is extremely large and convoluted. It was written over at least a decade by several developers, some of whom can no longer be located and some of whom are known to be deceased, so we cannot ask the people who originally wrote it. The complexity of the code makes it impractical to determine manually how many pointers to the object exist and which functions use or copy them.

The second big issue is that we have no reliable way to reproduce the crash. We know that the production system, which has hundreds of concurrent users, crashes about twice a day, but all attempts to reproduce the crash in a test environment have failed. We can inspect the production environment for a few minutes after a crash, but eventually we have to bring it back up. The environment is Windows Server 2019 and the program is compiled with Visual Studio 2019. There is a copy of Visual Studio and the program source code on the server. We have already tried analyzing DMP files, but that failed because a dump only shows where the dangling pointer was used; it does not tell us where the pointed-to object was freed.

I would appreciate any advice because I'm pretty much out of ideas.

Thanks.

I'd start by hunting for [Rule of Three](https://en.cppreference.com/w/cpp/language/rule_of_three) violators. — user4581301, Nov 15 '21 at 06:45
Not much we can do without a [mre]. Replacing raw pointer use with `shared_ptr` or `unique_ptr` might fix the issue. You could add a print statement to the destructor of the object causing the crash and print its address and maybe even a stack trace. Writing lots of unit tests and using a test coverage tool to make sure they test everything might help too. — Alan Birtles, Nov 15 '21 at 07:29
The object, its type, and even where it is accessed post mortem are known. What remains is the usual tedious work of figuring out where the pointer you dereferenced was copied from and where the other copy was deleted. If no one on your team can do that, then you have to hire someone who can. — Öö Tiib, Nov 15 '21 at 07:47
*I would appreciate any advice because I'm pretty much out of ideas.* -- Slowly but surely, rewrite the parts of the code that can be rewritten using modern C++ using containers, smart pointers, `std::string` instead of character pointers, etc. and observe if the new program behaves correctly. If the new program mimics the old program, and the new program crashes, you now have a much better chance of debugging the issue. — PaulMcKenzie, Nov 15 '21 at 08:10
MSVS now supports [Address Sanitizer](https://devblogs.microsoft.com/cppblog/address-sanitizer-for-msvc-now-generally-available/) . Start by building a release build with this on and testing it. Make sure to also generate the necessary .pdb files for any dump analysis. — Richard Critten, Nov 15 '21 at 08:44
Setting a pointer you delete to null is a red herring. What about all the copies of that pointer? — Caleth, Nov 15 '21 at 10:10
Replace every definition of a `T *` with a definition of a `std::unique_ptr`, and the compiler will tell you where the copies are — Caleth, Nov 15 '21 at 10:14
@Caleth: In fact, almost all pointers deleted should be in destructors, so the pointer ceases to exist. Setting it to null there is rather pointless. If anything, for debugging purposes you could try to `VirtualAlloc2` a unique pointer with `MEM_RESERVE_PLACEHOLDER`. Using it will still crash, but you can log the allocated pointer and then match the crash address with the allocation. — MSalters, Nov 15 '21 at 10:17
@MSalters indeed, and the best destructors are implicitly generated ones. — Caleth, Nov 15 '21 at 10:20
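
The destructor-logging idea from the comments above could be sketched roughly as follows. This is a minimal illustration, not code from the thread: `FreeLog`, `FreeRecord`, `Widget`, and `demoFreeLog` are hypothetical names, and the stack capture assumes the Win32 `CaptureStackBackTrace` function, which is only compiled in on Windows builds. The idea is to keep a bounded in-memory log of destroyed addresses so that, after a crash, the faulting address from the dump can be matched against the most recent destruction of that address.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>
#ifdef _WIN32
#include <windows.h>  // CaptureStackBackTrace
#endif

// One entry per destroyed object: its address and the return addresses of
// the call stack that destroyed it (left empty on non-Windows builds).
struct FreeRecord {
    std::uintptr_t address = 0;
    std::vector<void*> stack;
};

// Hypothetical process-wide log, bounded so it cannot grow without limit.
class FreeLog {
public:
    static FreeLog& instance() {
        static FreeLog log;
        return log;
    }

    void record(const void* object) {
        FreeRecord rec;
        rec.address = reinterpret_cast<std::uintptr_t>(object);
#ifdef _WIN32
        void* frames[32];
        // Skip this frame, then capture up to 32 caller frames.
        const unsigned short count =
            CaptureStackBackTrace(1, 32, frames, nullptr);
        rec.stack.assign(frames, frames + count);
#endif
        std::lock_guard<std::mutex> lock(mutex_);
        records_.push_back(std::move(rec));
        if (records_.size() > kMaxRecords) {
            records_.pop_front();  // drop the oldest entry
        }
    }

    // Copies the most recent record for this address into `out`.
    bool findLast(std::uintptr_t address, FreeRecord& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto it = records_.rbegin(); it != records_.rend(); ++it) {
            if (it->address == address) {
                out = *it;
                return true;
            }
        }
        return false;
    }

private:
    static constexpr std::size_t kMaxRecords = 4096;  // arbitrary cap
    mutable std::mutex mutex_;
    std::deque<FreeRecord> records_;
};

// Stand-in for the class whose instances are being used after free.
struct Widget {
    ~Widget() { FreeLog::instance().record(this); }
};

// Destroys a Widget and checks that its address can be looked up afterwards.
bool demoFreeLog() {
    Widget* w = new Widget;
    const auto address = reinterpret_cast<std::uintptr_t>(w);
    delete w;
    FreeRecord found;
    return FreeLog::instance().findLast(address, found) &&
           found.address == address;
}
```

Because the allocator may reuse addresses, only the most recent record for a given address is meaningful, which is why the lookup walks the log from newest to oldest. The captured frames are raw return addresses; resolving them to function names would still need the matching .pdb files.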

score 1 · Answer 1 · answered Nov 15 '21 at 10:01


You've already made progress by identifying this as a "use-after-free" problem. The next step is figuring out which half of that is wrong. Should the object not have been freed, or should it not have been used?

With smart pointers this is still relevant to know - should you use a shared_ptr or a weak_ptr? These classes are not magic - you can implement the same behavior in many other ways. But they're sure convenient; you just need to figure out which one to use. The other big advantage is that future readers will see a weak_ptr and can reverse your logic - that's a non-owning pointer, so that implements the "don't use after free" instead of "don't free while in use".
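
As a rough illustration of the "don't use after free" side, here is a minimal sketch using `std::weak_ptr`. The names `Session`, `SessionObserver`, and `demoWeakObserver` are hypothetical and only stand in for the real object and its users; the sketch assumes the object is owned through a `std::shared_ptr` somewhere.

```cpp
#include <memory>
#include <string>

// Hypothetical type standing in for the object that gets freed too early.
struct Session {
    std::string user;
};

// Non-owning observer: it holds a weak_ptr, so it never extends the
// Session's lifetime, but it can detect that the Session is gone.
class SessionObserver {
public:
    explicit SessionObserver(const std::shared_ptr<Session>& session)
        : session_(session) {}

    // Returns the user name, or an empty string if the Session was freed.
    std::string userOrEmpty() const {
        // lock() yields an owning shared_ptr, or an empty one after the
        // Session has been destroyed.
        if (std::shared_ptr<Session> alive = session_.lock()) {
            return alive->user;
        }
        return {};
    }

private:
    std::weak_ptr<Session> session_;
};

// Exercises both outcomes; returns true if each behaves as described.
bool demoWeakObserver() {
    auto owner = std::make_shared<Session>(Session{"alice"});
    SessionObserver observer(owner);
    const bool seenWhileAlive = (observer.userOrEmpty() == "alice");
    owner.reset();  // the only owner lets go, so the Session is destroyed
    const bool emptyAfterFree = observer.userOrEmpty().empty();
    return seenWhileAlive && emptyAfterFree;
}
```

If the right answer is instead "don't free while in use", the observer would hold a `std::shared_ptr<Session>` and the object would stay alive for as long as any user still refers to it.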

MSalters

Well, I agree that a shared_ptr would've been a good idea had it been used everywhere, but unfortunately that's not the case (I think shared_ptr did not even exist in the STL when that code was first written). It's legacy code and coding standards were hardly enforced in the past, so tracing and finding every single use is pretty much impossible with our current staffing. I was hoping there's some sort of debugging utility that can keep track of all pointers to a specific address. – Dan Nov 27 '21 at 16:34
@Dan You need a way to track down invalid pointers, so I found a [solution](https://stackoverflow.com/questions/19865574/how-to-track-a-invalid-pointer-in-c) for you. Hope it helps. – Yujian Yao - MSFT Nov 30 '21 at 06:58
