8

I've been experimenting with programming language design, and have come to the point of needing to implement a garbage collection system. Now the first thing that came to mind was reference counting, but this won't handle reference loops. Most of the pages that I come across when searching for algorithms are references on tuning the garbage collectors in existing languages, such as Java. When I do find anything describing specific algorithms, I'm not getting enough detail for implementation. For example, most of the descriptions include "when your program runs low on memory...", which isn't likely to happen anytime soon on a 4 GB system with plenty of swap. So what I'm looking for are some tutorials with good implementation details, such as how to decide when to kick off the garbage collector (e.g., collect after X memory allocations, every Y minutes, etc.).

To give a couple more details on what I'm trying to do: I'm starting off by writing a stack-based interpreter similar to PostScript, and my next attempt will probably be an S-expression language based on one of the Lisp dialects. I am implementing in straight C. My goal is both self-education and to document the various stages into a "how to design and write an interpreter" tutorial.

As for what I've done so far, I've written a simple interpreter which implements a C-style imperative language, which gets parsed and processed by a stack-machine-style VM (see lang2e.sourceforge.net). But that language doesn't allocate new memory on entering any function and doesn't have any pointer data types, so there wasn't really a need at the time for any kind of advanced memory management. For my next project I'm thinking of starting off with reference counting for non-pointer-type objects (integers, strings, etc.), and then keeping track of any pointer-type object (which can generate circular references) in a separate memory pool. Then, whenever the pool grows by more than X allocation units beyond its size at the end of the previous garbage-collection cycle, kick off the collector again.
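
To make that trigger concrete, here's a rough sketch in C of what I have in mind; the names (gc_alloc, gc_collect, GC_GROWTH_THRESHOLD) and the counting scheme are just placeholders, not a finished design:

```c
#include <stddef.h>
#include <stdlib.h>

#define GC_GROWTH_THRESHOLD 1024      /* allocation units allowed between collections */

static size_t pool_size = 0;          /* objects currently in the pointer-object pool  */
static size_t pool_size_after_gc = 0; /* pool size when the last collection finished   */

/* Placeholder: a real collector would mark from the VM stack and globals,
 * then sweep the pool, decrementing pool_size for every object it frees. */
static void gc_collect(void)
{
    /* ... mark ... sweep ... */
    pool_size_after_gc = pool_size;
}

void *gc_alloc(size_t nbytes)
{
    /* Trigger: the pool grew by more than X allocation units since the last GC. */
    if (pool_size - pool_size_after_gc > GC_GROWTH_THRESHOLD)
        gc_collect();

    void *obj = malloc(nbytes);
    if (obj != NULL)
        pool_size++;
    return obj;
}
```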

My requirements are that it not be too inefficient, yet be easy to implement and document clearly (remember, I want to develop this into a paper or book for others to follow). The algorithm currently at the front of my list is tri-color marking; it looks like a generational collector would be a bit better, but also harder to document and understand. So I'm looking for some clear reference material (preferably available online) that includes enough implementation details to get me started.

vainolo
Derek Pressnall
  • I should add that I have seen descriptions of several garbage collectors, such as the variations on mark-and-sweep, but most of the pages I've run across weren't much better than the Wikipedia article. For example, as I mentioned in the question, they say to kick it off when memory gets low. Well, that isn't likely to happen on modern systems during the runtime of most lightweight scripts, and even if it did, it wouldn't be good to use up all system memory before kicking off the collector. Details like that are what I'm looking for. – Derek Pressnall May 18 '12 at 01:38
  • http://doc.cat-v.org/inferno/concurrent_gc/ - this should give more than enough detail for implementing it. – SK-logic May 22 '12 at 09:23
  • @DerekPressnall If you're running a lightweight script, then it's likely to be best if GC doesn't run, because it will simply waste time. The memory will be freed when your process exits. – Marcin May 22 '12 at 10:06

2 Answers

7

There's a great book about garbage collection: Garbage Collection: Algorithms for Automatic Dynamic Memory Management. I've read it, so I'm not recommending it just because you can find it with Google. Look at it here.

For simple prototyping, use mark-and-sweep or any simple non-generational, non-incremental compacting collector. Incremental collectors are good only if you need to provide "real-time" response from your system. As long as your system is allowed to lag arbitrarily much at any particular point in time, you don't need an incremental one. Generational collectors reduce average garbage-collection overhead at the expense of assuming something about the life cycles of objects.

I have implemented all of these (generational/non-generational, incremental/non-incremental), and debugging garbage collectors is quite hard. Because you want to focus on the design of your language, and maybe not so much on debugging a more complex garbage collector, you could stick to a simple one. I would most likely go for mark-and-sweep.
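
To give an idea of how small the simple case can be, here's a bare-bones mark-and-sweep skeleton in C. The object layout and the names (Obj, gc_mark, gc_collect) are invented for illustration and would have to be adapted to your interpreter's actual object representation:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Obj {
    struct Obj *next;        /* intrusive list linking every allocated object   */
    struct Obj **children;   /* NULL-terminated array of references this holds  */
    int marked;
} Obj;

static Obj *all_objects = NULL;  /* head of the allocation list */

static void gc_mark(Obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = 1;
    if (o->children != NULL)
        for (Obj **c = o->children; *c != NULL; c++)
            gc_mark(*c);     /* naive recursion; a real collector would use
                                an explicit mark stack to bound the depth    */
}

void gc_collect(Obj **roots, size_t nroots)
{
    /* Mark phase: everything reachable from the roots (VM stack, globals). */
    for (size_t i = 0; i < nroots; i++)
        gc_mark(roots[i]);

    /* Sweep phase: unlink and free whatever stayed unmarked. */
    Obj **link = &all_objects;
    while (*link != NULL) {
        Obj *o = *link;
        if (o->marked) {
            o->marked = 0;   /* clear the mark for the next cycle */
            link = &o->next;
        } else {
            *link = o->next;
            free(o);         /* any payload the object owns would be released here too */
        }
    }
}
```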

When you use garbage collection, you do not need reference counting. Throw it away.

Antti Huima
  • Regarding throwing out reference counting, there will be a number of objects that are highly transient, mostly ones that are temporarily on the stack -- for example, "2 * 3 + 5" (or, in RPN order, "2 3 * 5 +") will leave "6" on the stack until the add operator consumes it. With just mark-and-sweep, it seems like the GC will be kicking off quite frequently. Yet is this still more efficient than the overhead of ref counting? Or is there another optimization I should look at for these temporary objects? Thanks. – Derek Pressnall May 18 '12 at 14:50
  • Garbage collection is faster than reference counting if you can do GC seldom enough. As a matter of fact, garbage collection is even faster than explicit memory management (e.g. new/delete or malloc/free) if you have enough extra heap space. Reference counting also consumes extra memory because you need to store the reference count field itself. Note, however, that normally you wouldn't heap-allocate integers at all, but would represent them in place, where a pointer to a GC'd object would otherwise go. – Antti Huima May 18 '12 at 14:58
  • Ok, so I'll settle on using either mark & sweep, or stop & copy, and maybe expand it to a simple generational collector. I can see how a two-generation collector could help with the transient objects, especially with stop & copy. Now, since I plan on documenting every stage of my progress to form a tutorial, do you think I should start with reference counting, and demonstrate the deficiencies of it, then evolve it to something better in the writeup? I'm assuming that I can have the same generic interface exposed to the rest of the interpreter, and plug in different GCs as needed. – Derek Pressnall May 19 '12 at 01:37
  • You can do that, but I wouldn't. Cyclic data structures are prevalent in all programming paradigms (both functional and non-functional) and reference counting in my opinion should be restricted to acyclic data structures only. – Antti Huima May 19 '12 at 15:24
  • @DerekPressnall, if you're expecting a GC load typical of functional languages, you'd be better off with a combination of stop-and-copy (it will wipe out all the short-lived temporary stuff) and mark-and-sweep for the rest of the heap. – SK-logic May 22 '12 at 09:10
  • @SK-logic I think Derek mentions in his original post that he has been working so far with "imperative" languages, so the GC load might not be typical for a functional language but have more emphasis on direct mutations. – Antti Huima May 22 '12 at 20:11
  • @antti.huima, I'm referring to his comment ("*there will be a number of objects that are highly transient, mostly ones that are temporarily on the stack*") - it is pretty much the same kind of load as for the functional languages, and therefore, the same approach might be used. – SK-logic May 22 '12 at 21:58
  • @SK-logic I understand, but what matters is whether they are actually garbage-collected objects or atomic values. If most of the transient values are [relatively small] integers, they should not be handled by GC at all because they are simple values. It is the concept of "unboxed integers". – Antti Huima May 23 '12 at 07:29
  • @antti.huima, what if he's implementing an RxRS-like numeric tower, for example? In such a case, boxing is unavoidable. – SK-logic May 23 '12 at 08:43
  • @SK-logic sure but only for large numbers. On a 32-bit architecture you can easily store 30-bit numbers in place, just reserve two topmost bits (say) to distinguish heap pointers from numbers and e.g. for GC mark. – Antti Huima May 23 '12 at 11:03
  • @antti.huima, yep, but all the numbers (including primitive types) have to be boxed if your arithmetic operations are polymorphic and have to be dispatched over the argument types. That's one of the biggest disadvantages of the standard Scheme, for example. – SK-logic May 23 '12 at 11:20
  • @SK-logic Maybe we use the terminology differently. Small integers do not need to be heap-allocated but can be represented directly in the same space a heap pointer would take. They still need to have a type tag, though. In at least one of the four Scheme implementations I've created over the years, I represented small integers by 32-bit words whose LSB bits were 11, and heap pointers by words whose LSB bits were zero. The actual integers were obtained by shifting left by 2, and heap pointers could just be cast into pointers as they were 4-aligned anyway. So there was a type tag but no heap objects (a small sketch of this scheme follows the comments below). – Antti Huima May 23 '12 at 14:31
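
A small sketch of the tagging scheme discussed in the last few comments; the Value type, tag bits, and helper names are illustrative only, and decoding is done here with an arithmetic right shift:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical "small integer vs. heap pointer" encoding: heap objects are at
 * least 4-byte aligned, so a pointer's two low bits are 00; a word whose two
 * low bits are 11 carries a small integer in its upper bits. */
typedef uintptr_t Value;

#define TAG_MASK 0x3u
#define INT_TAG  0x3u

static int value_is_int(Value v)
{
    return (v & TAG_MASK) == INT_TAG;
}

static Value int_to_value(intptr_t i)
{
    return ((uintptr_t)i << 2) | INT_TAG;  /* only "small" integers fit:
                                              the top two bits are lost   */
}

static intptr_t value_to_int(Value v)
{
    return (intptr_t)v >> 2;  /* arithmetic shift recovers the sign on
                                 common compilers                        */
}

static Value ptr_to_value(void *p)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);  /* relies on 4-byte alignment */
    return (uintptr_t)p;
}
```
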
1

When to kick off the collector is probably wide open -- you could GC when a memory allocation would otherwise fail, or you could GC every time a reference is dropped, or anywhere in between.

Waiting until you've got no choice may mean you never GC, if the running code is fairly well contained. Or, it may introduce a gigantic pause into your environment and demolish your response time or animations or sound playback completely.

Running the full GC on every free() could amortize the cost across more operations: pauses become more predictable, but the system as a whole runs slower.
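
A middle ground between those extremes is to collect on a fixed allocation budget and only fall back to an emergency collection when the underlying allocator actually fails. A rough sketch, where vm_alloc, gc_collect, and the 1 MiB budget are arbitrary placeholders:

```c
#include <stddef.h>
#include <stdlib.h>

void gc_collect(void);              /* whatever collector the interpreter uses */

static size_t bytes_since_gc = 0;   /* bytes handed out since the last collection  */
static size_t gc_budget = 1u << 20; /* collect after roughly 1 MiB of allocation   */

void *vm_alloc(size_t nbytes)
{
    /* Planned collection: amortized, independent of how much RAM the box has. */
    if (bytes_since_gc + nbytes > gc_budget) {
        gc_collect();
        bytes_since_gc = 0;
    }

    void *p = malloc(nbytes);
    if (p == NULL) {
        /* Emergency collection: only when the allocator itself fails. */
        gc_collect();
        p = malloc(nbytes);
    }
    if (p != NULL)
        bytes_since_gc += nbytes;
    return p;
}
```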

If you'd like to test the thing by artificially limiting memory, you can simply run with very restrictive resource limits in place. Run ulimit -v 1024 and every process spawned by that shell will only ever have one megabyte of virtual memory to work with.

sarnold