Simple answer: don't use C++. Sorry, joke.
But if you want to take this kind of absolute control over memory management in C++, across library/module boundaries, and in a completely generalized way, you can be in for some terrible grief. I'd suggest that most people look for reasons not to do it rather than ways to do it.
I've gone through many iterations of this same basic idea over the years (decades, actually), from naively trying to overload operator new/new[]/delete/delete[] at a global level, to linker-based solutions, to platform-specific solutions, and I've actually reached the point you're aiming for: I have a system that allows me to see the amount of memory allocated per plugin. But I didn't reach it through the kind of generalized mechanism you desire (and which I originally desired as well).
C++ complicates things a bit because allocation and initialization are usually fused.
I would offer a slight twist to this statement: C++ complicates things because initialization and allocation are usually fused. All I did was swap the order, but the most complicating part is not that allocation wants to initialize, but that initialization often wants to allocate.
Take this basic example:
struct Foo
{
std::vector<Bar> stuff;
};
In this case, we can easily allocate Foo through a custom memory allocator:
void* mem = custom_malloc(sizeof(Foo));
Foo* foo = new(mem) Foo;
...
foo->~Foo();
custom_free(foo);
... and of course we can wrap this all we like to conform to RAII, achieve exception-safety, etc.
Except now the problem cascades. That stuff member using std::vector will want to use std::allocator, and now we have a second problem to solve. We could use a template instantiation of std::vector with our own allocator, and if you need runtime information passed to the allocator, you can override Foo's constructors to pass that information along with the allocator to the vector constructor.
But what about Bar? Its constructor may also want to allocate memory for a variety of disparate objects, and so the problem cascades and cascades and cascades.
Given the difficulty of this problem, the alternative generalized solutions I've tried, and the grief associated with porting them, I've settled on a completely de-generalized, somewhat pragmatic approach.
The solution I settled on is to effectively reinvent the entire C and C++ standard library. Disgusting, I know, but I had a bit more of an excuse to do it in my case. The product I'm working on is effectively an engine and software development kit, designed to allow people to write plugins for it using any compiler, C runtime, C++ standard library implementation, and build settings they desire. To allow things like vectors or sets or maps to be passed through these central APIs in an ABI-compatible way required rolling our own standard-compliant containers in addition to a lot of C standard functions.
The entire implementation of this devkit then revolves around these allocation functions:
EP_API void* ep_malloc(int lib_id, int size);
EP_API void ep_free(int lib_id, void* mem);
... and the entirety of the SDK revolves around these two, including memory pools and "sub-allocators".
For third party libraries outside of our control, we're just SOL. Some of those libraries have equally ambitious things they want to do with their memory management, and to try to override that would just lead to all kinds of clashes and open up all kinds of cans of worms. There are also very low-level drivers when using things like OGL that want to allocate a lot of system memory, and we can't do anything about it.
Yet I've found this solution works well enough to answer the basic question, "who/what is hogging all this memory?", very quickly. That question is often much more difficult to answer than the analogous one about clock cycles, for which we can just fire up any profiler. The accounting only applies to code under our control that uses this SDK, but within that scope we can get a very thorough memory breakdown on a per-module basis. We can also set superficial caps on memory use to make sure that out-of-memory errors are actually handled correctly, without actually trying to exhaust all the contiguous pages available in the system.
So in my case this problem was solved via policy: by building a uniform coding standard and a central library conforming to it that's used throughout the codebase (and by third parties writing plugins for our system). It's probably not the answer you are looking for, but this ended up being the most practical solution we've found yet.