
The introduction

The following code shows that, when it is run with runhaskell, the Haskell garbage collector releases the memory of a once a is no longer used. The program dumps core while a is being released. This is deliberate: to make the release observable, a has nullFunPtr as its finalizer.

module Main where

import Foreign.Ptr 
import Foreign.ForeignPtr


main :: IO ()
main = do
    -- nullFunPtr as the finalizer: calling it crashes, which makes the release visible
    a <- newForeignPtr nullFunPtr nullPtr
    putStrLn "Hello World"
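Because nullFunPtr makes the release visible only by crashing, a gentler way to observe the same thing is to attach a Haskell finalizer that records that it ran. This is an illustrative sketch, not the asker's code: it swaps in Foreign.Concurrent.newForeignPtr (which takes an IO action as the finalizer) for Foreign.ForeignPtr.newForeignPtr, and the helper names `trackedPtr` and `finalizerRanAfterGC` and the 100 ms delay are hypothetical choices. GHC does not promise that finalizers run promptly, so the result may vary between runs.

```haskell
module Main where

import Control.Concurrent (threadDelay)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.ForeignPtr (ForeignPtr)
import Foreign.Ptr (nullPtr)
import qualified Foreign.Concurrent as FC
import System.Mem (performGC)

-- Hypothetical helper: wrap nullPtr in a ForeignPtr whose Haskell finalizer
-- sets a flag instead of jumping to nullFunPtr and crashing.
trackedPtr :: IORef Bool -> IO (ForeignPtr ())
trackedPtr flag = FC.newForeignPtr nullPtr (writeIORef flag True)

-- Allocate a tracked pointer, drop every reference to it, request a major GC,
-- and report whether the finalizer has run yet.
finalizerRanAfterGC :: IO Bool
finalizerRanAfterGC = do
  flag <- newIORef False
  _ <- trackedPtr flag   -- result discarded, so nothing keeps the ForeignPtr alive
  performGC              -- request a major collection
  threadDelay 100000     -- Haskell finalizers run in their own thread; give it time
  readIORef flag

main :: IO ()
main = do
  ran <- finalizerRanAfterGC
  putStrLn ("finalizer ran after performGC: " ++ show ran)
  putStrLn "Hello World"
```

If the finalizer has already run, the first line reports True. Since finalizer timing is not guaranteed, a False here is inconclusive rather than proof that the memory was kept.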

The problem

When I run the same code in ghci, the memory is not released. How can I force ghci to release variables that are no longer used?

$ ghci
> import Foreign.Ptr
> import Foreign.ForeignPtr
> import System.Mem
> a <- newForeignPtr nullFunPtr nullPtr
> a <- return () -- rebinding variable a to show gc that I'm no longer using it
> performGC
> -- did not crash - GC didn't release memory
> ^D
Leaving GHCi.
[1]    4396 segmentation fault (core dumped)  ghci

Memory was released on exit, but that is too late for me. I'm extending GHCi and using it for another purpose, and I need the memory to be released earlier: on demand, or as soon as possible, would be ideal.

I know that I can call finalizeForeignPtr, but I'm using ForeignPtr only for debugging. In general, how can I release a in the last example?
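For comparison, this is roughly what the finalizeForeignPtr route looks like: it runs a ForeignPtr's finalizers immediately, independent of the garbage collector. A minimal sketch, assuming a Haskell finalizer attached with Foreign.Concurrent so that nothing crashes; `finalizeNow` is a hypothetical name. Note that this only runs the finalizer; the ForeignPtr value itself, and any ghci binding that refers to it, stays reachable.

```haskell
module Main where

import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import Foreign.Ptr (nullPtr)
import qualified Foreign.Concurrent as FC

-- Create a ForeignPtr whose finalizer sets a flag, finalize it explicitly,
-- and return the flag. finalizeForeignPtr runs the finalizer synchronously,
-- so no performGC and no delay are needed.
finalizeNow :: IO Bool
finalizeNow = do
  flag <- newIORef False
  p <- FC.newForeignPtr nullPtr (writeIORef flag True) :: IO (ForeignPtr ())
  finalizeForeignPtr p
  readIORef flag

main :: IO ()
main = finalizeNow >>= print
```

Run as a program, this prints True.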

If this is not possible from the ghci prompt, I can also modify the ghci code. Maybe I can release a by modifying the ghci InteractiveContext or DynFlags? So far my research has had no luck.

remdezx
  • Are you sure the memory is not released? I don't think there's a guarantee that finalizers run promptly when a variable is GC'd. – Daniel Wagner Nov 03 '14 at 11:01
  • Fairly sure, yes. I ran similar tests with large arrays and monitored them with `ekg`. Nothing was released. – remdezx Nov 03 '14 at 11:16
  • Why should `a` be garbage collected after rebinding it to `()`? How could ghci know (from inside a kind of IO monad) that it won't be needed? – David Unric Nov 03 '14 at 12:34
  • If the end of scope can be determined, it works as you would expect, without the pointless 'rebinding': `GHCi> let testNull = do { a <- newForeignPtr nullFunPtr nullPtr; return () }` `GHCi> performGC` results in an immediate SIGSEGV. – David Unric Nov 03 '14 at 12:45
  • I thought that if a variable is rebound, the old value would be released. Using scopes is a nice idea, but unfortunately my code has many variables like this `a`, and I cannot release only a few of them using scoping like that... – remdezx Nov 03 '14 at 13:10
  • @DavidUnric: How could ghci know it won't be needed? Simply: if no other variable "points" to that value, nothing can access it, so it could be released. Where is the problem with this way of thinking? – Wojciech Danilo Nov 03 '14 at 13:43
  • Did you try reloading the .hs file? – viorior Nov 03 '14 at 14:31
  • I'm not loading any file here. `:r` has no effect. – remdezx Nov 03 '14 at 14:40
  • @remdezx I suppose another wrinkle is that memory being "released" is only released back to GHC, not back to the OS. Is it possible that this thing is getting garbage collected, the memory is being released to GHC, but your reporting tool is reporting how much memory has been reserved from the OS? – Daniel Wagner Nov 03 '14 at 18:34
  • No, I'm using `ekg` which gets its metrics from RTS statistics – remdezx Nov 03 '14 at 18:37
  • @DanielWagner, if you find any way to declare a variable in a ghci session that can be released on demand, it would be much appreciated. – remdezx Nov 03 '14 at 18:41
  • You have to remember that `a <- expression` does _not_ assign a value to a variable, it introduces a _new local variable_ which has that value. If you do `a <- return 2` and then `a <- return ()`, it's not a type error. We haven't reinvented untyped programming, we've just added a more local scope where the old `a` is shadowed by the new `a`. The old `a` is still in scope, just shadowed by a more local scope. It's bad practice for naming variables, not overwriting `a`; pure functional programming doesn't do this, by design. For the garbage collector to collect it it needs to go out of scope. – AndrewC Nov 03 '14 at 20:16
  • @remdezx I am trying to claim that what you have already done is enough for that, and that you are misinterpreting your monitoring tools (or asking the wrong question -- e.g. maybe your question is "how do I get GHC to release memory to the OS" or "how do I get a finalizer to run quickly" or something similar). – Daniel Wagner Nov 04 '14 at 02:58
  • @AndrewC, I know how `do` blocks work. As I mentioned before, I'm trying to make variable `a` go out of scope and somehow force the GC to release it. – remdezx Nov 04 '14 at 08:08
  • I would say this is a bug, you should file a ticket at http://hackage.haskell.org/trac/ghc/, including your very good minimal example. – Joachim Breitner Nov 04 '14 at 11:42
  • @remdezx It's in scope to the end of the do block. Haskell is lexically, not dynamically scoped, so you _can't_ take it out of scope until the end of the do block. You have two problems. 1. If you start a new do block in ghci, nothing will happen until you close the do block, which limits interactivity. 2. If you use the implicit do block ghci provides, you can't go out of scope until you quit ghci. The only thing I can think of is using indirect references by using >>= to put the output of your handle generator into an MVar, which you then edit later to make the handle unreferenced. Dunno. – AndrewC Nov 04 '14 at 16:59
  • @AndrewC: None of this would prevent an implementation to let the scope stop earlier if it knows that the value cannot be referenced from the GHCi command line any more – and I believe it can safely say so. The GC doesn’t know anything about scopes, only about references! – Joachim Breitner Nov 05 '14 at 09:43
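AndrewC's indirect-reference idea can be sketched roughly as follows. The names `Slot`, `newSlot`, `withSlot` and `dropSlot` are hypothetical. The ghci binding holds an MVar rather than the value, and emptying the MVar removes that reference, so the GC is free to collect the value once nothing else points to it. This does not remove ghci's binding itself, only what the binding can reach.

```haskell
module Main where

import Control.Concurrent.MVar (MVar, newMVar, readMVar, swapMVar)

-- Hypothetical wrapper: the interpreter binds the MVar, not the value itself.
type Slot a = MVar (Maybe a)

newSlot :: a -> IO (Slot a)
newSlot x = newMVar (Just x)

-- Run an action on the value if it is still present.
withSlot :: Slot a -> (a -> IO b) -> IO (Maybe b)
withSlot s k = readMVar s >>= traverse k

-- Drop the slot's reference. If nothing else refers to the value,
-- it becomes garbage and can be reclaimed at a later collection.
dropSlot :: Slot a -> IO ()
dropSlot s = () <$ swapMVar s Nothing

main :: IO ()
main = do
  s <- newSlot (replicate 1000000 'x')
  withSlot s (return . length) >>= print   -- Just 1000000
  dropSlot s
  withSlot s (return . length) >>= print   -- Nothing
```

In the ghci session from the question, this would mean binding `a <- newSlot =<< newForeignPtr nullFunPtr nullPtr`, then later calling `dropSlot a` followed by `performGC`.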

1 Answer


Tracing through the code we find that the value is stored in the field closure_env of the data type PersistentLinkerState, which is a ClosureEnv, i.e. a mapping from name to HValues. The relevant function in Linker.hs is

extendLinkEnv :: [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv new_bindings =
  modifyPLS_ $ \pls ->
    let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
    in return pls{ closure_env = new_closure_env }

and although the comment indicates that it should remove the shadowed binding, it does not, at least not the way you want it to.

The reason is, as AndrewC writes correctly: Although both variables have the same source code name, they are different to the compiler (they have a different Unique attached). We can observe this after adding some tracing to the function above:

*GHCiGC> a <- newForeignPtr nullFunPtr nullPtr
extendLinkEnv [a_azp]
*GHCiGC> a <- return ()
extendLinkEnv [a_aF0]
*GHCiGC> performGC
extendLinkEnv [it_aFL]

Removing bindings with the same source-name at this point should solve your GC problem, but I don’t know the compiler well enough to tell what else would break. I suggest you open a ticket, hopefully someone will know.
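The difference between matching on the full Name and matching on its source name can be illustrated with a toy model. This is not GHC's code: `Name`, `Env`, `extendKeepShadowed` and `extendDropShadowed` are hypothetical stand-ins for GHC's Name, ClosureEnv and extendClosureEnv, with a plain association list instead of GHC's name environment.

```haskell
module Main where

-- Toy stand-in for GHC's Name: a source-level name plus a compiler unique.
data Name = Name { occName :: String, nameUnique :: Int }
  deriving (Eq, Show)

-- Toy stand-in for the closure environment: name -> value.
type Env v = [(Name, v)]

-- Roughly the observed behaviour: an old binding is replaced only when the
-- whole Name (including the unique) matches, so a shadowed `a` survives.
extendKeepShadowed :: Env v -> [(Name, v)] -> Env v
extendKeepShadowed env new =
  new ++ [b | b@(n, _) <- env, n `notElem` map fst new]

-- The suggested behaviour: also drop older bindings with the same source name.
extendDropShadowed :: Env v -> [(Name, v)] -> Env v
extendDropShadowed env new =
  new ++ [b | b@(n, _) <- env, occName n `notElem` map (occName . fst) new]

main :: IO ()
main = do
  let a1 = Name "a" 1      -- a <- newForeignPtr nullFunPtr nullPtr
      a2 = Name "a" 2      -- a <- return ()
      step ext = ext (ext [] [(a1, "ForeignPtr")]) [(a2, "()")]
  print (map fst (step extendKeepShadowed))   -- both a's remain
  print (map fst (step extendDropShadowed))   -- only the newer a remains
```

In the first environment the old `a` (and whatever it points to) stays reachable; in the second it is gone, which is what would let the runtime collect it.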

Confusion on binding vs. value

In the comments there seems to be some confusion about bindings and values. Consider this code:

> a <- return something
> b <- return somethingelse
> a <- return (b+b)
> b <- return anewthing

With the current implementation, the heap will contain:

  • something
  • somethingelse
  • a thunk referencing the (+) operator and somethingelse
  • anewthing.

Furthermore the environment of the interpreter has references to all four heap values, so nothing can be GC’ed.

What remdezx rightly expected is that GHCi would drop the reference to something and somethingelse. This, in turn, would allow the run time system to garbage collect something (we assume no further references). GHCi still references the thunk, which in turn references somethingelse, so this would not be garbage collected.
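The same argument can be written as a toy reachability computation. The object names mirror the example above; the `Obj` type and the functions `pointsTo`, `reachable` and `garbage` are hypothetical and only model which objects a tracing collector could reclaim from a given set of roots.

```haskell
module Main where

-- The four heap objects from the example above.
data Obj = Something | SomethingElse | Thunk | ANewThing
  deriving (Eq, Show, Enum, Bounded)

-- Outgoing references: the thunk for (b+b) captures the old b.
pointsTo :: Obj -> [Obj]
pointsTo Thunk = [SomethingElse]
pointsTo _     = []

-- Depth-first traversal from the roots, as a tracing GC would do.
reachable :: [Obj] -> [Obj]
reachable = go []
  where
    go seen [] = reverse seen
    go seen (o : os)
      | o `elem` seen = go seen os
      | otherwise     = go (o : seen) (pointsTo o ++ os)

-- Everything not reachable from the roots can be collected.
garbage :: [Obj] -> [Obj]
garbage roots = [o | o <- [minBound .. maxBound], o `notElem` reachable roots]

-- Current GHCi: all four bindings, shadowed or not, act as roots.
currentRoots :: [Obj]
currentRoots = [Something, SomethingElse, Thunk, ANewThing]

-- Expected: only the latest a (the thunk) and the latest b (anewthing).
expectedRoots :: [Obj]
expectedRoots = [Thunk, ANewThing]

main :: IO ()
main = do
  print (garbage currentRoots)   -- []
  print (garbage expectedRoots)  -- [Something]
```

With the current roots nothing is garbage; with only the latest bindings as roots, something becomes collectable while somethingelse stays alive through the thunk.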

Clearly the question was very implementation specific, and so is this answer :-)

Joachim Breitner
  • a binding can be shadowed but still referenced indirectly: `a <- return 1; f <- return (const a); a <- return (); print $ f undefined`. – Will Ness Nov 04 '14 at 13:16
  • Thank you for taking the time to dig further into this! I already reported a similar bug (https://ghc.haskell.org/trac/ghc/ticket/9765), but I will report another one to separate the problems. – remdezx Nov 04 '14 at 13:49
  • @Will Ness: The binding cannot be referenced any more. Of course the ''value'' can, but it is safely referenced by the closure, there is no need for the “environment of named things” to keep hold onto it! – Joachim Breitner Nov 04 '14 at 13:55
  • But until it is resolved, I need a workaround, or I have to track these variables in GHC internals and clean them up manually. I tried removing variables from `InteractiveContext` (they are stored in `ic_tythings`), but with no effect yet. – remdezx Nov 04 '14 at 13:56
  • `ic_tythings` just contains information about names and types; the `HValue` is in the ClosureEnv – Joachim Breitner Nov 04 '14 at 14:00
  • @JoachimBreitner in Lisp, the *binding* is what's shared, not the value; in Haskell it's implementation-dependent of course, but -- two functions can reference same "outer" `a` which is later shadowed; I thought GHC prefers not to duplicate values in such situations (i.e. not to "unshare" the value of the binding - which can be a progressively instantiated list etc. so unsharing might cause re-calculations). Only if GC can prove there's no more references to it, can the shadowed binding be destroyed, I think. Just it being shadowed promises nothing. – Will Ness Nov 04 '14 at 15:21
  • (contd.) I was once told (here on SO) that this "unsharing" is almost certainly never done by GHC, IIRC. – Will Ness Nov 04 '14 at 15:26
  • @JoachimBreitner But Will's example demonstrates that the fact that the variable is shadowed does not mean you can garbage collect there, which is the point. (+1 for nice evidence of internals of shadowing) – AndrewC Nov 04 '14 at 16:45
  • @AndrewC: Of course you cannot indiscriminately GC it. My point is that you can remove *one* *unnecessary* reference to the value, so that it can be collected *if* it is not otherwise used. That is the point of the original question. – Joachim Breitner Nov 04 '14 at 19:59
  • I tried to explain the difference between the binding in the interpreter’s environment and the value on the heap in the answer; I hope that clears up the confusion. – Joachim Breitner Nov 04 '14 at 20:16
  • @JoachimBreitner it seems that calling `Linker.deleteFromLinkEnv` and also cleaning entries from `ic_tythings` removes the bindings, and the memory is released. I'm still not sure if it is done correctly, but it helps me make some progress on this issue. – remdezx Nov 05 '14 at 09:57
  • @AndrewC, I know that when I want to release data that way I also need to track all other places where it is referenced, and I'm already doing that. The problem was how to remove this one additional reference. – remdezx Nov 05 '14 at 10:08
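Will Ness's example from the comments, written out as a compiled program, shows why shadowing alone cannot free a value: the closure `f` still holds the first `a` after the name has been reused. Only the environment's reference can be dropped, not every reference. The function name `shadowedButReachable` is just for illustration.

```haskell
module Main where

-- The first `a` is shadowed, but `f` captured its value and keeps it alive.
shadowedButReachable :: IO Int
shadowedButReachable = do
  a <- return 1
  f <- return (const a)
  a <- return ()            -- shadows the name `a`; the value 1 is still held by f
  a `seq` return (f undefined)

main :: IO ()
main = shadowedButReachable >>= print   -- prints 1
```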