5

I've been trying to ensure the strictness of an in-memory model of a Haskell program using ghc-heap-view package and the utils it provides when I noticed that my HashMaps don't seem to be in NF upon a series on inserts. I tried printing Heap tree and indeed it shows some thunks. I then tried another way of inserting elements (using union and singleton) and this time it comes out strict.

Could somebody please explain why is this so and advise if there's anything I can do to make insert behave the same way as the other method?

Here's my test code:

module Main where

import           Control.Exception           (evaluate)
import           Data.Foldable
import           Data.HashMap.Strict         (HashMap)
import qualified Data.HashMap.Strict         as HM
import           GHC.HeapView

test1 :: HashMap Int Int
test1 = foldl' (\m v -> HM.insert v v m) HM.empty [0..5]

test2 :: HashMap Int Int
test2 = foldl' (\m v -> HM.union (HM.singleton v v) m) HM.empty [0..5]

main :: IO ()
main = do
  putStrLn "HeapTree for test1"
  t1 <- evaluate test1
  buildHeapTree 10 (asBox t1) >>= print . ppHeapTree

  putStrLn "HeapTree for test2"
  t2 <- evaluate test2
  buildHeapTree 10 (asBox t2) >>= print . ppHeapTree

And here's the output:

HeapTree for test1
"BitmapIndexed ([ (_thunk (I# 0) (I# 0) 0), (_thunk (I# 1) (I# 1) 1), (Leaf (I# 2) (I# 2) 2), (Leaf (I# 3) (I# 3) 3), (Leaf (I# 4) (I# 4) 4), (Leaf (I# 5) (I# 5) 5) ]) 63"
HeapTree for test2
"BitmapIndexed ([ (Leaf (I# 0) (I# 0) 0), (Leaf (I# 1) (I# 1) 1), (Leaf (I# 2) (I# 2) 2), (Leaf (I# 3) (I# 3) 3), (Leaf (I# 4) (I# 4) 4), (Leaf (I# 5) (I# 5) 5) ]) 63"
(0.02 secs, 1,067,672 bytes)
duplode
  • 33,731
  • 7
  • 79
  • 150
MantasG
  • 117
  • 1
  • 5
  • 3
    Those thunks seem to be for internal nodes of the map. Looking at the documentation for the strictness properties (version 0.2.9.0 http://hackage.haskell.org/package/unordered-containers-0.2.9.0/docs/Data-HashMap-Strict.html#g:1) it only says that *keys* and *values* are reduced to WHNF, but says nothing about the internal nodes. – danidiaz May 05 '19 at 20:04
  • 2
    If there are really thunks anywhere in a `HashMap` constructed using only strict operations, I'd call that a bug; for performance, we don't want that. What version of `unordered-containers` are you using? Please open a ticket. – dfeuer May 06 '19 at 01:26
  • 1
    I've tested this with tip of `unorder-containers` master branch (0.2.10.0). I have created a simple project to demonstrate this behaviour: https://github.com/mantasg/hashmap-test. I also created a ticket: https://github.com/tibbe/unordered-containers/issues/232. – MantasG May 06 '19 at 16:37
  • 1
    I believe the problem is with the helper function `two` in `Data.HashMap.Base`, which isn't strict in its key arguments. But I'm still surprised, because I'd expect inlining to reveal that those arguments have been forced. – dfeuer May 06 '19 at 17:21
  • Ah.... I guess probably because `two` is defined as a (recursive) `go` function.... – dfeuer May 06 '19 at 17:37
  • You seem to be right. Adding bangs to the arguments of `go` inside `two` fixes it. – MantasG May 06 '19 at 17:41

1 Answers1

3

When inserting a new, non-colliding key into a Leaf node, insert uses a helper function called two to generate a two-element map. The two function is lazy in the values of the keys, which leads to GHC creating thunks to create the two new Leaf nodes. This whole thing is pretty silly, because the keys are actually certain to be in WHNF by then. But (presumably because of a recursive go function) GHC doesn't realize that. The problem should be fixed in the next version of unordered-containers.

dfeuer
  • 48,079
  • 5
  • 63
  • 167