
In the countless arguments about the superiority of C++-style deterministic destruction (RAII) versus garbage collection, proponents of the former often suggest that it can do everything garbage collection can do. There is, however, a pattern which is used frequently in Java and .NET languages, but for which I know of no good equivalent in C++. Consider the following class [could be Java or C#; using public fields for brevity]:

public class MaxItemIdentifier
{
  public String maxItemName = null;
  public long maxItemValue = -0x7FFFFFFFFFFFFFFF-1;
  public void checkItem(String name, long value)
  {
    if (value > maxItemValue)
    {
      maxItemName = name;
      maxItemValue = value;
    }
  }
}

In both Java and C#, the above method may safely be passed a string that was created on any thread. Further, while the method is not thread-safe, even improper threading usage will not jeopardize memory safety; maxItemName would still be guaranteed to either be null or identify one of the strings passed to checkItem. Additionally, the method never actually has to copy (or even look at) the contents of any string; all it acts upon are string references. Since no string object whose reference has been exposed to the outside world will ever be modified, a reference to a string may be considered synonymous with the sequence of characters identified thereby, and copying the reference is equivalent to copying the text.

Would there be any way to write an equivalent class in C++ or a similar RAII-based language which would guarantee memory safety regardless of threading usage, but would not be needlessly inefficient when run from a single thread? The only approaches I'm aware of that could make such a method workable in C++ are:

  1. Whenever encountering an item whose "value" is larger than the previous maximum, the method copies the contents of the string; this would be slower than simply copying a reference. Further, I don't know how well this could maintain memory safety in the presence of improper threading usage.

  2. Have the method receive a reference to a reference-counted pointer, and hold a variable of that type; when receiving a value which is larger than the previous maximum, atomically increment the reference count on the received pointer and atomically decrement the reference count on the previous maximum-item name; if the latter yields zero, release that name. This approach would seem safe, but on many platforms atomic increments and decrements would be excessively expensive for single-threaded usage, yet would be necessary for memory safety in multi-threading scenarios.
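For concreteness, approach 2 might look roughly like the sketch below. It is only an illustration, not code from the question: it assumes `std::shared_ptr<const std::string>` as the reference-counted pointer, and the alias `SharedName` is made up for the example. Note that although `std::shared_ptr` updates its reference count atomically, unsynchronized writes to the same `maxItemName` object from two threads are still a data race in C++, so this does not reproduce the Java/.NET guarantee under improper threading usage.

```cpp
#include <cstdint>
#include <limits>
#include <memory>
#include <string>

// Hypothetical alias: a shared, immutable string.
using SharedName = std::shared_ptr<const std::string>;

class MaxItemIdentifier {
public:
    SharedName maxItemName;  // null until the first item is accepted
    std::int64_t maxItemValue = std::numeric_limits<std::int64_t>::min();

    // Taking the pointer by const reference means the reference count is
    // left untouched for items that do not become the new maximum.
    void checkItem(const SharedName& name, std::int64_t value) {
        if (value > maxItemValue) {
            // Copy-assignment increments the count of `name` and decrements
            // the count of the previous maximum, freeing it if it hits zero.
            maxItemName = name;
            maxItemValue = value;
        }
    }
};
```

The string contents are never copied or inspected; the cost in question is the pair of reference-count updates performed on each new maximum.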

Personally, I believe a good language/framework should support both RAII and GC, since each can handle very easily and efficiently some things which the other really can't handle at all. It's possible, though, that there would be some other approaches to handling such things in RAII that I'm unfamiliar with. Is there any way when using RAII to make a method like the above work efficiently when used in single-threading scenarios but also be usable in scenarios where a reference to a string created on one thread might then be exposed to other threads?

Note that unlike some other multi-threading RAII scenarios in which objects have a predictable lifetime, being consistently created in a producer thread and destroyed in a consumer thread (the subject of a related post), references to immutable objects like Strings are often shared without any reference-holder being identifiable as an "owner", and without any way of knowing if or when any particular reference-holder might overwrite the last surviving reference to a string.

supercat
  • 1
    RAII is just an idiom/pattern for aiding resource management. Garbage collection is a strategy for memory management. They don't contrast against another and are not mutually exclusive. – Rufflewind Feb 15 '15 at 00:03
  • Are you asking "how does one deal with memory allocations shared across multiple threads in a language without garbage collection?" – Rufflewind Feb 15 '15 at 00:06
  • 2
    I think it's fair to say that neither RAII nor Garbage Collection is the "reason" that the above code may or may not work in some particular environment. To have "references for everything" is an overhead. C++ following the principle of "you only pay for what you use" doesn't enforce that, but you could easily design something where strings are ALWAYS references. – Mats Petersson Feb 15 '15 at 00:07
  • 2
    As soon as you have a data race (or "improper threading usage"), you are in undefined behavior land. Discussing whether memory leaks or not in a program with no defined behavior seems rather...pointless. – T.C. Feb 15 '15 at 00:10
  • @T.C.: Immutable objects can't have data races, because while actions are *potentially concurrent*, they cannot be *conflicting*. (Of course, mutating the references to said immutable objects can race) – Ben Voigt Feb 15 '15 at 00:33
  • @BenVoigt: The difficulty in C++ as I see it is that, unless I'm missing something, ownerless objects, even if they're immutable, need to use mutable state to track when the last reference is destroyed. – supercat Feb 15 '15 at 01:14
  • 1
    Same problem exists in garbage collection. Often addressed via "stop the world" technique, but concurrent collectors do exist. – Ben Voigt Feb 15 '15 at 01:16
  • @Rufflewind: Not so much the allocations as the cleanup, in cases where it's impossible to predict which thread will end up destroying the last reference to an object. – supercat Feb 15 '15 at 01:17
  • @T.C.: If something like an interpreter for a domain-specific language is given a program which doesn't obey the rules of the language, there's a big difference between saying the program might yield arbitrary results, versus saying that it might have arbitrary effects upon the target system. Although in this particular example it wouldn't make sense for multiple threads to use a `MaxItemIdentifier` simultaneously, there are a number of cases in .NET where it would be perfectly reasonable for one thread to read a `String` field set by another, with no synchronization, if... – supercat Feb 15 '15 at 01:31
  • ...in cases where the reading thread might see the old or new value, either would be fine. For example, `FileCopier` class might have a `String` field which reports its current status for purposes of e.g. a progress display. If the UI thread reads the variable while it's being written, it doesn't really matter whether it shows the old or new string, provided that it shows a valid one. – supercat Feb 15 '15 at 01:33

1 Answer


This is certainly a known concern.

You can't get deterministic destruction of shared objects without some overhead. The reason that destruction comes out on top of finalization in so many cases is that:

  1. Most objects are not shared.
  2. Many shared objects require deterministic destruction, and the costs of providing that on top of finalization are even higher than the costs of using RAII to implement reference counting.

And clearly RAII does a great job of managing destruction.

Finalization really only outshines destruction for objects that are shared among multiple unsynchronized users and are then simply abandoned, never to be used again. An example would be zero-copy multicast sockets (or at least O(1) copy).

The tradeoff is still between determinism (with some counting overhead) and non-determinism. Because C and C++ don't enforce a single resource management method, it's actually easier to mix destructor-based deterministic cleanup with efficient non-deterministic cleanup than it would be on top of the .NET or Java runtimes where everything undergoes non-deterministic deallocation.

An example of non-deterministic cleanup in the native world is the RCU method used in the Linux kernel. It's C code, but applies equally well to C++.

So even here, the advantage goes to RAII: you just use a different set of smart pointers for non-deterministic RCU than the ones you use for local-scope deterministic release, or for reference counting with thread synchronization and deterministic release.

Really, that's where your thinking went wrong. RAII is able to provide deterministic lifetime, but it is not limited to deterministic lifetime.
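As a rough illustration of a handle whose lifetime is not deterministic per object, here is a minimal sketch in which the handle's destructor does nothing and all strings are released together when their arena goes away. This is not RCU, and the names `StringRef`, `StringArena` and `ArenaMaxItemIdentifier` are hypothetical. It assumes every handle is dropped before its arena is destroyed. Copying a handle is a plain pointer copy with no reference count, but racing writes to the same field are still formally a data race in C++ unless the field is made atomic.

```cpp
#include <cstdint>
#include <deque>
#include <limits>
#include <mutex>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical handle: wraps a raw pointer and owns nothing.
class StringRef {
public:
    StringRef() = default;
    const std::string* get() const { return ptr_; }
    const std::string& operator*() const { return *ptr_; }
    explicit operator bool() const { return ptr_ != nullptr; }
private:
    friend class StringArena;
    explicit StringRef(const std::string* p) : ptr_(p) {}
    const std::string* ptr_ = nullptr;
};

// Copying and destroying a handle involves no bookkeeping at all.
static_assert(std::is_trivially_copyable<StringRef>::value, "plain copy");
static_assert(std::is_trivially_destructible<StringRef>::value, "no-op dtor");

// Hypothetical arena: strings pile up and are freed all at once when the
// arena is destroyed. Only allocation takes a lock.
class StringArena {
public:
    StringRef add(std::string text) {
        std::lock_guard<std::mutex> lock(mutex_);
        // std::deque::push_back never invalidates references to existing
        // elements, so earlier handles remain valid.
        storage_.push_back(std::move(text));
        return StringRef(&storage_.back());
    }
private:
    std::mutex mutex_;
    std::deque<std::string> storage_;
};

class ArenaMaxItemIdentifier {
public:
    StringRef maxItemName;
    std::int64_t maxItemValue = std::numeric_limits<std::int64_t>::min();

    void checkItem(StringRef name, std::int64_t value) {
        if (value > maxItemValue) {
            maxItemName = name;  // pointer copy only; no count to adjust
            maxItemValue = value;
        }
    }
};
```

The trade-off is the one described above: no string is reclaimed until the whole arena is, in exchange for handles that cost nothing to copy or drop.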

Ben Voigt
  • I agree with you that RAII is as good or better than GC for almost anything other than immutable-object references which are passed around as proxies for the contents of those objects; I consider it unfortunate that .NET and Java didn't invest the time they spent on finalizers on providing RAII support instead. On the other hand, just as a GC doesn't eliminate the need for deterministic cleanup, I don't see C++ deterministic cleanup support as eliminating the benefits GC can offer, especially with immutable types like `String`. – supercat Feb 15 '15 at 01:59
  • @supercat: I think you missed my point. You can write a smart pointer with non-deterministic cleanup (perhaps using one of the modes of garbage collection) following the RAII pattern. RAII is good at managing deterministic cleanup. It is also suitable for managing non-deterministic cleanup. – Ben Voigt Feb 15 '15 at 02:07
  • How efficient could a smart pointer be if used in a method as shown above? Could the same method efficiently handle the scenarios where the pointer targets were all owned by the same thread, and where they might be owned by arbitrary threads? – supercat Feb 15 '15 at 02:35
  • @supercat: One of the major problems with garbage collection in an "unmanaged" environment is that the gc doesn't know whether any particular byte pattern matching an object address is actually a pointer to that data (see "conservative" vs "precise" gc). Also, you probably have one memory area for garbage collection, and another for explicit deallocation. Smart pointers can help with those issues. Or, you can use a smart pointer in conjunction with the RCU pattern I linked in my answer. It's not necessary for the destructor to actually DO anything at all. – Ben Voigt Feb 15 '15 at 03:04
  • @supercat: Finally, another option is not freeing individual objects ever, just letting them pile up and then freeing a big memory block containing an unknown number of objects, all at once. – Ben Voigt Feb 15 '15 at 03:05
  • Adobe Postscript used the latter approach, as do some embedded systems. I guess my real question, which perhaps I should edit to phrase it better, is how one should write code that passes around potentially-large immutable objects in such a way as to be agnostic to threading issues--not wasting time on synchronization for objects used only on a single thread, but still maintaining correctness for objects used on multiple threads. Are there any patterns for efficiently using handles (with the mechanics that handles must be explicitly created and destroyed, but multiple handles may identify... – supercat Feb 15 '15 at 17:32
  • @supercat: It sounds like you've already decided to use garbage collection, because you've set a requirement of *no synchronization* rather than reduced cost of synchronization. Is your real question "What good are smart pointers with garbage collected data, when the smart pointer destructor can't free the data?" ? The advantages of smart pointers (RAII) are that they can enable precise gc and compacting. Conservative garbage collection cannot compact, ever, so you need a heap and miss out on cheap allocation, etc. – Ben Voigt Feb 15 '15 at 18:04
  • My impression--correct me if I'm wrong--is that the relative cost of memory synchronization operations is apt to increase as core counts increase. What I'd like to see would be an efficient means by which each thread could have its own pool of unused GC handles, so that creating a new handle would only require memory synchronization if a thread didn't have any in its pool (at which point multiple handles could be created, with one being used to satisfy an immediate need and the rest added to the thread's pool), and by which when the GC ran it could force other threads to flush their caches. – supercat Feb 17 '15 at 17:03
  • Given such abilities, I would think the cost of constructing and destructing GC handles could be kept minimized even in platforms where synchronization would be costly. I don't know how well such a thing can be accomplished in C++, though. – supercat Feb 17 '15 at 17:05