4

Suppose I have this in an long run ml file:

let l = [1;2;3]

let l = [1;2;3;4]

let _ = ...

Will the first l = [1;2;3] be GCed sometime?


What if the code is like this:

let l = [1;2;3]

let l = [1;2;3;4]

let l = [1;2;3]

let _ = ...

There are three l. 1st is shadowed by 2nd, and then 2nd is shadowed by 3rd.

Are the following situations possible, since GC's schedule is not determined?

  1. when reaching 3rd l, the GC has not collected the 1st [1;2;3], so the same memory was reused or re-referenced

  2. Immediately after 2nd l, GC collected 1st [1;2;3], then the 3rd l create new memory for [1;2;3]

Benoît Guédas
  • 801
  • 7
  • 25
Jackson Tale
  • 25,428
  • 34
  • 149
  • 271

2 Answers2

4

Not in the OCaml toplevel, defining a new value l does not free the previous l, which (as far as I remember the implementation) lives forever. It doesn't matter because it is a constant and only takes space proportional to the source code that engendered it, like binary code does.

$ rlwrap ocaml
        OCaml version 4.00.1

# let l = [ 1 ] ;;
val l : int list = [1]
# let w = Weak.create 1 ;;
val w : '_a Weak.t = <abstr>
# Weak.set w 0 (Some l) ;;
- : unit = ()
# Gc.full_major () ;;
- : unit = ()
# Weak.check w 0 ;;
- : bool = true
# 

This true means that l still lives in memory.

# let l = [ 2 ] ;;
val l : int list = [2]
# Weak.check w 0 ;;
- : bool = true
# Gc.full_major () ;;
- : unit = ()
# Weak.check w 0 ;;
- : bool = true
# 

And it still does not, although it is not “reachable” for a fine definition of reachable (not the definition the GC uses).

Neither compiler frees the original l either:

$ cat t.ml
let l = [ 1 ] ;;
let w = Weak.create 1 ;;
Weak.set w 0 (Some l) ;;
Gc.full_major () ;;
Printf.printf "%B\n" (Weak.check w 0) ;;
let l = [ 2 ] ;;
Printf.printf "%B\n" (Weak.check w 0) ;;
Gc.full_major () ;;
Printf.printf "%B\n" (Weak.check w 0) ;;
$ ocamlc t.ml
$ ./a.out 
true
true
true
$ ocamlopt t.ml
$ ./a.out 
true
true
true

Another example of the GC's definition of “reachability” being more approximative than the definition one might like is:

let g () = Gc.full_major ()

let f () = let l = [ 1 ] in (* do something with l; *) g(); 1

At the point when g is executed (called from f), the value l is no longer reachable (for a fine definition of reachable) and could be garbage-collected. It won't be because it is still referenced from the stack. The GC, with its coarse notion of reachable, will only be able to free it after f has terminated.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • If the context is in a ml file, compiled and run, then is what @blueberryfields answered correct? – Jackson Tale Mar 25 '14 at 21:42
  • @PascalCuoq that looks like an implementation/version specific quirk – blueberryfields Mar 25 '14 at 21:45
  • 1
    @blueberryfields It has been this way since caml-light, before there were weak pointers to observe that that was what happened. Which “implementation/version” do you claim does not have the “quirk”? – Pascal Cuoq Mar 25 '14 at 21:47
  • In your test, is that possible that GC hasn't got chance to free it yet? I wrote in my question, the context is `in an long run ml file` – Jackson Tale Mar 25 '14 at 21:47
  • @JacksonTale Please see the definition of `Gc.full_major`. http://caml.inria.fr/pub/docs/manual-ocaml/libref/Gc.html – Pascal Cuoq Mar 25 '14 at 21:47
  • ah, I didn't know you can force GC to free immediately. – Jackson Tale Mar 25 '14 at 21:49
  • @PascalCuoq if this behavior exists in the latest release of the core ocaml implementation, i would open a bug against it. – blueberryfields Mar 25 '14 at 21:52
  • @blueberryfields You might get a better explanation by asking in a mailing list why things are this way. Your bug will only be closed as “Works as designed / No change required”. – Pascal Cuoq Mar 25 '14 at 21:58
  • oh wow. interesting. this is only at the top (global) level, right? – blueberryfields Mar 25 '14 at 22:30
  • 1
    @blueberryfields Yes. A function's local variables are allocated to registers and the register allocation mechanism in the native compiler re-uses registers, thus making it visible to the GC that some values are no longer reachable. I temporarily had an example in my answer that showed one couldn't be too impatient for local variables, but I removed it since it only applied to bytecode compilation. – Pascal Cuoq Mar 25 '14 at 22:53
3

that depends on whether you still have references for it elsewhere - if l is the only reference, then yes, that reference is released on the second assignment, and will get GCed when appropriate.


lists are immutable in ocaml; list literals return new list instances.

from a programming stand-point, you can treat your "third" list as an entirely new one (even though it contains the same elements as the first).

from an under-the-cover implementation standpoint, it may be that the system is smart enough to know to reuse memory from the first list on the third assignment; i can't immediately think of an efficient way that might work, though, with the code as it's currently written.

so, at the third assignment:

  • the first list might still be around
  • it's still going to be, from a programming point of view, presented to you as an entirely new list
  • it's possible, but very unlikely, that the system, under the covers, knows/is able to optimize memory use well enough to reuse the memory from the first assignment. most likely, unless referenced elsewhere, that first list is garbage collected/will be garbage collected.

NOTE

Based on @PascalCuoq's answer below, and some experimentation:

The behavior is quite different if you're defining the variables as top-level variables. The OCaml garbage collector treats top level declarations as (permanent?) garbage collection roots - so they won't be GCed even if there are no longer reachable from running code.

So, if the above example is executed at the top level, there will be three different "l"'s in memory, none of which will be garbage collected.

blueberryfields
  • 45,910
  • 28
  • 89
  • 168