2

I have a short-lived (fast to complete) Perl script that nevertheless uses enough memory to trigger the garbage collector. In turn, collection takes more time than the rest of the processing.

Is there a way to disable garbage collection and let the OS do it when the script exits?

Edit: The GC pauses the script in the middle of its run, not at the very end. KILLing it does not help.

Sam
  • Perl uses reference counting, not garbage collection (Though IIRC there might be an actual mark and sweep GC done at program exit to catch circular references, in which case... don't have any?)... – Shawn Mar 04 '20 at 13:28
  • 4
    Can we see your code? There might be a way to keep objects alive to prevent their reference count from dropping to 0. Also, how exactly do you know that the GC is triggered? – Dada Mar 04 '20 at 13:48
  • 1
    @Shawn where’s the sense in performing a garbage collection at program exit? – Holger Mar 04 '20 at 13:56
  • 1
    @Holger Think closing open files (And flushing buffers to disk), and other cleanup stuff. – Shawn Mar 04 '20 at 13:58
  • 1
    @Shawn A garbage collection is an operation that tells apart unused or unreachable objects from those still in use resp. reachable, whereas for closing *all* files or whatever resources at program exit, their reachability status is entirely irrelevant. – Holger Mar 04 '20 at 14:02
  • @Holger Perl has destructors that get called when an object is destroyed that need to be invoked on normal exit. (Though I don't know if lexical filehandle cleanup is implemented the same way as object destructors, it's the same concept.) – Shawn Mar 04 '20 at 14:23
  • 1
    @Shawn sounds like a concept that doesn’t really work with cyclic references. But for filehandles, I wouldn’t expect cyclic references at all, so closing them when leaving the scope should already be sufficient and not require an additional mark & sweep. But even when dangling references are possible, those objects would be recognizable by their nonzero counter, so you’d only need a “sweep without mark”… – Holger Mar 04 '20 at 14:29
  • This phase is called [global destruction](https://perldoc.perl.org/perlglossary.html#global-destruction). – Shawn Mar 04 '20 at 14:29
  • @Holger If you don't like the terminology, take it up with the perl devs, not me. – Shawn Mar 04 '20 at 14:35
  • 1
    @Shawn the terminology, i.e. “global destruction”, is fine. The question is whether it truly runs a “mark & sweep” or just a linear “sweep & destroy” pass. That would be interesting in the context of the OP’s question, as there would be a significant performance difference between them. – Holger Mar 04 '20 at 14:38
  • @Holger https://stackoverflow.com/questions/2972021/garbage-collection-in-perl – Shawn Mar 04 '20 at 14:42
  • 1
    @Shawn So it does “an ‘expensive mark and sweep’ to reclaim circular references” at the end, followed by destroying the still-reachable objects, like globals, anyway? Well, I’m still not convinced by this strategy, but if that’s how Perl works, then that’s the way it is. But since the OP already stated that the problem is not a GC at the exit, but in the middle of the program, I think we can (and should) stop this discussion at this point. – Holger Mar 04 '20 at 15:23
  • 1
    Re "*that nevertheless uses enough memory to trigger the garbage collector.*", That's not how Perl works. Variables are freed as soon as they are no longer referenced, no matter how much memory is being used. – ikegami Mar 04 '20 at 16:12
  • 1
    @Shawn, Re "*Perl uses reference counting, not garbage collection*", Perl most definitely uses garbage collection (you don't have to manually deallocate memory), and reference counting is the mechanism it uses to perform its garbage collection. – ikegami Mar 04 '20 at 16:12
  • 1
    @Holger, Re "*where’s the sense in performing a garbage collection at program exit?*", Because of destructors. The destructors of objects that survive to the end of the program should be called. Furthermore, they should be called in an orderly fashion: Object A should be destroyed after Object B if Object A references Object B. – ikegami Mar 04 '20 at 16:15
  • 1
    @ikegami I already made clear in the other comments, that my question was specifically, why does it perform a *mark/sweep* kind of garbage collection, whose purpose is to tell reachable and unreachable objects apart, when Perl calls the destructors of *all* objects, whether reachable or not, anyway. When there are circular references, there is no canonical order for destruction anyway. – Holger Mar 04 '20 at 16:18
  • 1
    @ikegami I suppose we all now understand that Perl’s intention is to destroy *all* objects which haven’t been destroyed yet. The only missing bit is the insight that an expensive mark/sweep is entirely unnecessary to achieve that. You can’t bring a meaningful order into circular loops or unreachable independent subgraphs (and a mark/sweep isn’t even trying), while on the other hand, for the still reachable objects, the already existing reference counting mechanism would be sufficient; you only have to behave as if all globals are now going out of scope. – Holger Mar 04 '20 at 16:32
  • @Holger, Misread. Deleted comment. Composing new comment – ikegami Mar 04 '20 at 16:32
  • @Holger Re "*why does it perform a mark/sweep kind of garbage collection*", First of all, it doesn't have to distinguish between reachable and unreachable objects. When it's finished freeing the symbol table, it's only left with unreachable objects (by definition, since reference counting is used exclusively). It's my understanding that the actual mechanism is chosen to provide the best timeliness (trying to free containers before contained objects) – ikegami Mar 04 '20 at 16:36
  • 1
    @Holger, On second thought, that can't be right. It has to perform global garbage collection before freeing the symbol table so that subs/methods and package vars such as `@ISA`, `STDOUT` and `STDERR` remain available to the destructors. That's where the mark and sweep comes into play. And as I alluded to above, the actual process is quite complicated in order to provide the best timeliness. – ikegami Mar 04 '20 at 16:40
  • 1
    @ikegami Yes, that concern also overlaps with the actual rule that the mark/sweep runs at *every thread exit* rather than application exit, which makes it comprehensible, even when a single threaded run is very common. – Holger Mar 04 '20 at 16:56

2 Answers

6

Two approaches to exiting your program quickly without executing global destruction:

  1. POSIX::_exit

This is identical to the C function _exit(). It exits the program immediately, which means, among other things, that buffered I/O is not flushed.

  2. Kill your main process with SIGKILL.

    kill 'KILL', $$;
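
The first approach can be sketched as follows. This is only an illustration under assumptions: `Noisy`, `fast_exit` and `child_output` are hypothetical names, not part of the answer, and the demonstration forks a child so that the effect of skipping destructors can be observed from the parent. Because `POSIX::_exit` does not flush buffered I/O, the helper flushes `STDOUT` and `STDERR` explicitly first.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Handle;

# Hypothetical class whose destructor reports when it runs.
{
    package Noisy;
    sub new     { return bless {}, shift }
    sub DESTROY { print "destructor ran\n" }
}

# Hypothetical helper: flush the handles we care about, then leave
# without running END blocks, destructors, or global destruction.
sub fast_exit {
    my ($status) = @_;
    STDOUT->flush;
    STDERR->flush;
    POSIX::_exit($status);
}

# Run the demonstration in a child process and return what it printed.
sub child_output {
    my ($use_fast_exit) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    STDOUT->flush;    # do not let the child inherit unflushed parent output
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        open STDOUT, '>&', $writer or POSIX::_exit(1);
        our $survivor = Noisy->new;    # package variable, lives until global destruction
        print "work done\n";
        fast_exit(0) if $use_fast_exit;
        exit 0;                        # normal exit: global destruction runs
    }
    close $writer;
    my $output = do { local $/; <$reader> };
    waitpid $pid, 0;
    return defined $output ? $output : '';
}

print "with _exit:\n",       child_output(1);
print "with normal exit:\n", child_output(0);
```

With `fast_exit` the child's output should contain only the `work done` line; with a normal `exit` the destructor line appears as well.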
    
mob
5

As mentioned in the comments, garbage collection in Perl is a reference-counting mechanism: a value is freed as soon as nothing references it any longer, whether that was a variable holding it (which may go out of scope or be assigned a different value), an operation it is part of, a subroutine call stack it is being passed around in, or an actual reference.

So to prevent a value from being cleaned up until program exit, the easiest way is to do the opposite of the conventional memory-conscious wisdom: reference the value from the global stash.

    our $foo = \$something_to_keep_alive;
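
A minimal sketch of the effect, assuming a hypothetical `Tracked` class and a package array `@keep_alive` (neither name comes from the answer): an object that is also referenced from a package variable outlives the scope that created it, while an unpinned one is destroyed immediately.

```perl
use strict;
use warnings;

our @destroyed;     # records destructor calls, for the demonstration only
our @keep_alive;    # package variable: holds references until global destruction

{
    package Tracked;    # hypothetical class for illustration
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { push @main::destroyed, $_[0]{name} }
}

sub build {
    my ($name, $pin) = @_;
    my $obj = Tracked->new($name);
    push @keep_alive, $obj if $pin;    # extra reference from the global stash
    return;                            # the lexical reference in $obj ends here
}

build('temporary', 0);
build('pinned',    1);

# Only the unpinned object has been destroyed at this point.
print "destroyed so far: @destroyed\n";
```

The pinned object is only torn down during global destruction, which is the cost the first answer avoids by exiting through `POSIX::_exit`.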

Alternatively, you can (ab)use the fact that circular references will prevent refcounts from reaching zero until global destruction.

    $something->{self} = $something;

This causes the value to reference itself (the cycle may also pass through another layer), which keeps it alive until one of the references in the cycle is weakened or removed, or global destruction is reached. Again, this is certainly something to avoid in normal circumstances, as it is a memory leak by design.
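
To illustrate the cycle and the effect of weakening it, here is a small sketch. The `Node` class and the counter `$destroyed` are hypothetical names introduced for the example; `weaken` is the real function from `Scalar::Util`.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

our $destroyed = 0;    # counts destructor calls, for the demonstration only

{
    package Node;      # hypothetical class for illustration
    sub new     { return bless {}, shift }
    sub DESTROY { $main::destroyed++ }
}

{
    my $node = Node->new;
    $node->{self} = $node;      # cycle: the refcount stays above zero at scope exit
}
my $after_cycle = $destroyed;   # the object has leaked, no destructor call yet

{
    my $node = Node->new;
    $node->{self} = $node;
    weaken($node->{self});      # a weak reference does not contribute to the refcount
}
my $after_weak = $destroyed;    # this object was destroyed at scope exit

print "after cyclic scope:   $after_cycle\n";
print "after weakened scope: $after_weak\n";
```

The first object is only destroyed at global destruction, so the first count is 0 and the second is 1.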

Grinnz