
If we could design smart pointers that know when to destroy/delete heap memory based on scope, why couldn't we have just engineered the compiler to flag when heap memory was going out of scope without being deleted?

Why was it more practical to create smart pointers?

I know that that is not the only reason for and benefit of smart pointers, but why were these improvements not practically implementable via changes to the compiler?

Thank you

Seth Van
  • *Why couldn't we have just engineered the compiler to have flagged when heap memory was going out of scope without being deleted?* Compilers can't tell what the run time behavior of the code will be, which makes this a very hard problem for the compiler to deal with. – NathanOliver Mar 15 '21 at 15:41
  • How do you express ownership transfer/copy with a raw pointer? – Jarod42 Mar 15 '21 at 15:42
  • Which changes are you thinking of? – user253751 Mar 15 '21 at 15:42
  • The C++ language allows for compilers to perform garbage collection, but as far as I know, none have implemented it. *"Why couldn't we have just engineered"* probably because you can't *just choose* to solve something like this, it is a very very hard problem to solve. I'm not sure, but I think it may be one of those provably unsolvable problems like the halting problem which mathematically cannot be solved. Edit: languages that implement garbage collection invariably force constraints on what you can do with pointers to simplify the problem. – François Andrieux Mar 15 '21 at 15:42
  • @FrançoisAndrieux I think they did, and called it C++/CLI :-) – Ted Lyngmo Mar 15 '21 at 15:43
  • @FrançoisAndrieux: Engineering-wise it's easier than the halting problem because deleting memory a bit too late is not a hard error. Most garbage collectors in other languages run periodically exactly because it's not a big deal if it takes a few more milliseconds. – MSalters Mar 15 '21 at 15:45
  • @MSalters The question asks about catching pointer issues at compile-time. – nanofarad Mar 15 '21 at 15:46
  • @NathanOliver. Ok, focusing on scope then. If the compiler can tell that things are in or out of scope, why can't it tell you that heap memory going out of scope has not been deleted? – Seth Van Mar 15 '21 at 15:47
  • Let's say you exit the scope with an exception. How is the compiler supposed to deal with that? – NathanOliver Mar 15 '21 at 15:48
  • @nanofarad To be fair, my comment introduces runtime cleanup to the discussion. So their reply is appropriate. – François Andrieux Mar 15 '21 at 15:48
  • @MSalters I probably should have made that two comments. The first sentence is an interesting fact, the rest addresses the question. And the edit refers to the first sentence. Sorry for the confusion. – François Andrieux Mar 15 '21 at 15:50
  • How about pointers to heap memory that should survive the end of the current scope? Scope-local heap-allocations are rather rare I would say. – Some programmer dude Mar 15 '21 at 15:50
  • @Someprogrammerdude So how do smart pointers detect when to destruct? Is that not something the compiler could imitate by detecting it and flagging it for the programmer to check? – Seth Van Mar 15 '21 at 15:55
  • @SethVan `unique_ptr` is not copyable so when the smart pointer is destroyed and it points to an object, it knows it has to destroy it. `shared_ptr` uses a shared counter to know how many copies of the pointer still exist. Edit : `unique_ptr` is basically pointer-checking at compile time, but `shared_ptr` only works at run-time. You would need to predict the behavior of the program to make it work at compile time, but that requires you to know all the user inputs too. That's basically what running a program is, so it wouldn't be compile time anymore. – François Andrieux Mar 15 '21 at 15:57
  • @SethVan I think you confuse object destruction with heap deallocation. The two don't have to be linked. Take the shared pointer, for example: it contains a shared (private) structure that holds information about the allocation, like a usage counter. When a new shared pointer object is created the counter is increased, and when a shared pointer object is destructed the counter is decreased. Once the counter reaches zero the memory managed by the shared pointer is deallocated. – Some programmer dude Mar 15 '21 at 15:57
  • @Someprogrammerdude So is that the reason smart pointers were more practical - that the behavior of shared pointers could not be practically matched by anything a compiler could provide? – Seth Van Mar 15 '21 at 16:01
  • @FrançoisAndrieux So is one thing I am not getting here that smart pointers add value at run time, where a compiler cannot do anything beyond compile time? – Seth Van Mar 15 '21 at 16:08
  • @SethVan Basically, yes. The compiler can't predict what the user input will be, so it can't know for sure whether a pointer will be leaked or not. It can't "track" the copies of a `shared_ptr`, this is something that is possible while the program is running. – François Andrieux Mar 15 '21 at 16:10
  • @FrançoisAndrieux ok, Thank you and everybody else as well. This is very helpful. I was feeling shy about asking it but I really wanted to know :) – Seth Van Mar 15 '21 at 16:12
  • @SethVan smart pointers are more practical because they are explicit. Developers are aware of their benefits and limitations. One example - every experienced C++ developer is aware (or at least should be) that you cannot have cycles in smart pointer ownership, so you have to decide what ownership model you would use in such a case. That is impractical for the compiler to decide; it simply does not have enough information. – Slava Mar 15 '21 at 16:17
  • Doing this kind of static lifetime analysis is possible in both theory and practice, and there are other languages with type systems that handle it. (See for instance [Rust](https://www.rust-lang.org/).) Making such a radical change to C++ would be impossible, though. – molbdnilo Mar 15 '21 at 16:20
  • @molbdnilo: Rust has lifetimes in its type system and so imposes some extra rules. But it also has smart pointers. – Jarod42 Mar 16 '21 at 11:56

2 Answers


When your code is sufficiently complex, deciding whether heap memory is always freed before the last pointer to it goes out of scope becomes equivalent to the Halting Problem - an undecidable problem for a compiler. It's not that the problem is solvable but impractical; a computer program that decides whether an arbitrary program halts literally cannot exist.

A trivial example of this equivalence is the following pseudocode:

Allocate x;
Do arbitrary tasks using x as storage;
Print x;
Deallocate x;
Do other tasks;

x is deallocated if and only if "Do arbitrary tasks using x as storage" halts, so deciding whether x leaks is deciding whether that arbitrary code halts - the Halting Problem.

If you throw in additional considerations such as multithreaded/concurrent execution, the problem gets even nastier.

As Nicol Bolas' answer brings up, there are also ways to hide pointers that cannot be easily instrumented by the compiler, e.g. by round-tripping a pointer through a uintptr_t, perhaps with some bijective function obfuscating it.

On the other hand, this is much easier to do at runtime. Garbage collection is a pretty mature technology, seen in runtimes like the Java Virtual Machine.

Furthermore, there is compiler assistance for detecting leaks and other memory issues in C++ -- clang++ and g++ include a runtime sanitizer known as AddressSanitizer (ASan), which will warn about illegal accesses at run time and report leaks at shutdown, although it does not warn when an allocation is unreachable/no longer used while the program has not yet terminated.

nanofarad
  • I would advise writing "if and only if" instead of "iff", this is SO, not math exchange. – Hi - I love SO Mar 15 '21 at 15:46
  • Just because a problem is undecidable, that doesn't mean it is even close to anything equivalent to the halting problem. – Hi - I love SO Mar 15 '21 at 15:47
  • @Hi-IloveSO It's undecidable and I show it **specifically** by a reduction of a trivial program to HP. I'm sure there are other ways you can prove that it's undecidable. I'm still working on another edit so iff will be updated at the same time as that is finished. – nanofarad Mar 15 '21 at 15:48

I'm going to ignore the broader C++ issue that the language has holes in it that let you hide pointers inside of non-pointer-like things. Yes, many of these are UB, but there are APIs that basically require these shenanigans. Such things make automatic GC impossible from a practical perspective. Instead, we'll assume the compiler has a perfect way to instrument pointers to do this. So I'll focus on the more obvious issues:

Backwards compatibility and performance.

Let's assume you can do this while solving the C/C++ interop problem (i.e., your C++ pointers still need to be the same size and store the same information as C pointers). Even so, most people don't write their code expecting garbage collection. You have decades of code out there written to explicitly destroy objects after their creation.

So what would a GC-based C++ do with such code? If it sees a pointer to an object outlive an explicit deallocation of the object, when should it be destroyed? When the user said to do it, or when the last pointer goes away? If you pick the former answer, then you haven't gained anything, since you just broke GC. And if you pick the latter, then you've broken your covenant with the user, since the user explicitly said "destroy this object and free this memory" and you didn't.

So a codebase has to be written expecting GC; you can't just give it to them behind the scenes.

Also, a common philosophy of C++ is "pay only for what you use". Garbage collection is not free. Even lifetime-scope-based GC isn't free, especially of the shared_ptr variety. But you're forcing this cost on everyone even if they didn't ask for it and don't need it.

Not having automatic memory management is a feature of C++, not a bug. It allows users to have the freedom to decide for themselves what the best form of memory management will be.

Nicol Bolas