3

all.

While trying to solve some programming quiz: https://www.hackerrank.com/challenges/missing-numbers , I came across with space leak.

Main function is difference, which implements multi-set difference. I've found out that List ':' and Triples (,,) kept on heaps with -hT option profiling. However, only big lists are difference's two arguments, and it shrinks as difference keeps on tail recursion. But the memory consumed by lists keeps increasing as program runs.

Triples is ephemeral array structure, used for bookkeeping the count of multiset's each element. But the memory consumed by triples also keeps increasing, and I cannot find out why.

Though I've browsed similar 'space leak' questions in stackoverflow, I couldn't grasp the idea. Surely I have much to study.

I appreciate any comments. Thank you.

p.s) executable is compiled with -O2 switch.

$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3

.

import Data.List
import Data.Array

-- array (non-zero-count, start-offset, array_data)
array_size=101

myindex :: Int -> Int -> Int
myindex key offset
    | key >= offset = key - offset
    | otherwise     = key - offset + array_size

mylookup x (_,offset,arr) = arr ! idx
    where idx = myindex x offset

addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
    where idx = myindex key offset
          (_,prev_value) = arr ! idx
          count' = case (prev_value, value) of
                      (0,0) -> count
                      (0,_) -> count + 1
                      (_,0) -> count - 1
                      otherwise -> count

difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
    | count == 0 = []
    | otherwise  = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
    where (_,v) = mylookup x m
          new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
    where (_,v) = mylookup y m
          new_m = if v == 0
            then m
            else addOrReplace y (v - 1) m

main = do
    n <- readLn :: IO Int
    pp <- getLine
    m <- readLn :: IO Int
    qq <- getLine
    let p = map (read :: String->Int) . words $ pp
        q = map (read :: String->Int) . words $ qq
        startArray = (0,head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
    putStrLn . unwords . map show . sort $ difference startArray q p

[EDIT] I seq'ed value and Array thanks to Carl's advice. I attach heap diagram.

[original heap profiling] [original heap]1

[after seq'ing value v]

difference m (x:xs) y = difference new_m xs y
    where (_,v) = mylookup x m
          new_m = v `seq` addOrReplace x (v + 1) m

heap after seq'ing v

[after seq'ing value v and Array]

difference m (x:xs) y = new_m `seq` difference new_m xs y
    where (_,v) = mylookup x m
          new_m = v `seq` addOrReplace x (v + 1) m

heap after seq'ing v and Array

Chul-Woong Yang
  • 1,223
  • 10
  • 17

1 Answers1

1

I see three main problems with this code.

First (and not the cause of the memory use, but definitely the cause of generally poor performance) Array is horrible for this use case. O(1) lookups are useless when updates are O(n).

Speaking of, the values being stored in the Array aren't forced while difference is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference loops over its second input, it does this accidentally, in fact, by comparing the value against 0.

Third, difference doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.

Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.

Carl
  • 26,500
  • 4
  • 65
  • 86
  • Would you tell me why Array update `arr // [(idx,(key,value))]` are O(n)? I've slept on your comments but couldn't find why. – Chul-Woong Yang Jan 22 '16 at 02:02
  • Also, to your guidance, I've `seq`ed value and array. But heap profiling shows lists (:) and mysterious MUT_ARR_PTRS_FROZEN keep increasing. – Chul-Woong Yang Jan 22 '16 at 02:04
  • I managed to find out that list leak is related to mis-handling input arguments. But still I don't know how to properly handle arguments (I posted separate question on this). Things on MUT_ARR. and O(n) updates are still mystery. – Chul-Woong Yang Jan 22 '16 at 04:59