
I have a server process implemented in Haskell that acts as a simple in-memory DB. Client processes can connect, then add and retrieve data. The service uses more memory than I would expect, and I'm attempting to work out why.

The crudest metric I have is Linux "top". When I start the process I see a "VIRT" image size of ~27MB. After running a client to insert 60,000 data items, I see an image size of ~124MB.

Running the process to capture GC statistics (+RTS -S), I see initially

Alloc    Copied     Live    GC    GC     TOT     TOT  Page Flts
bytes     bytes     bytes  user  elap    user    elap
28296      8388      9172  0.00  0.00    0.00    0.32    0    0  (Gen:  1)

and on adding the 60k items I see the live bytes grow smoothly to

   ...
   532940     14964  63672180  0.00  0.00   23.50   31.95    0    0  (Gen:  0)
   532316      7704  63668672  0.00  0.00   23.50   31.95    0    0  (Gen:  0)
   530512      9648  63677028  0.00  0.00   23.50   31.95    0    0  (Gen:  0)
   531936     10796  63686488  0.00  0.00   23.51   31.96    0    0  (Gen:  0)
   423260  10047016  63680532  0.03  0.03   23.53   31.99    0    0  (Gen:  1)
   531864      6996  63693396  0.00  0.00   23.55   32.01    0    0  (Gen:  0)
   531852      9160  63703536  0.00  0.00   23.55   32.01    0    0  (Gen:  0)
   531888      9572  63711876  0.00  0.00   23.55   32.01    0    0  (Gen:  0)
   531928      9716  63720128  0.00  0.00   23.55   32.01    0    0  (Gen:  0)
   531856      9640  63728052  0.00  0.00   23.55   32.02    0    0  (Gen:  0)
   529632      9280  63735824  0.00  0.00   23.56   32.02    0    0  (Gen:  0)
   527948      8304  63742524  0.00  0.00   23.56   32.02    0    0  (Gen:  0)
   528248      7152  63749180  0.00  0.00   23.56   32.02    0    0  (Gen:  0)
   528240      6384  63756176  0.00  0.00   23.56   32.02    0    0  (Gen:  0)
   341100  10050336  63731152  0.03  0.03   23.58   32.35    0    0  (Gen:  1)
     5080  10049728  63705868  0.03  0.03   23.61   32.70    0    0  (Gen:  1)

This appears to be telling me that the heap has ~63MB of live data. That could well be consistent with the numbers from top, once you add on stack space, code space, GC overhead, etc.

So I attempted to use the heap profiler to work out what's making up this 63MB. The results are confusing. Running with "+RTS -h", and looking at the generated .hp file, the last and largest snapshot has:

containers-0.3.0.0:Data.Map.Bin 1820400
bytestring-0.9.1.7:Data.ByteString.Internal.PS  1336160
main:KV.Store.Memory.KeyTree    831972
main:KV.Types.KF_1  750328
base:GHC.ForeignPtr.PlainPtr    534464
base:Data.Maybe.Just    494832
THUNK   587140

All of the other numbers in the snapshot are much smaller than this. Adding these up gives a peak memory usage of ~6MB, as reflected in the chart output:

[heap profile chart from hp2ps]

Why is this inconsistent with the live bytes as shown in the GC statistics? It's hard to see how my data structures could be requiring 63MB, and the profiler says they are not. Where is the memory going?

Thanks for any tips or pointers on this.

Tim


3 Answers


I have a theory: your program is using a lot of something like ByteStrings. Because the main content of a ByteString is malloc-allocated, it is not displayed while profiling. Thus you could run out of heap without the largest content of your heap showing up on the profiling graph.

To make matters even worse, when you grab substrings of ByteStrings, they by default retain the pointer to the originally allocated block of memory. So even if you are trying to store only a small fragment of some ByteString, you could end up retaining the whole of the originally allocated ByteString, and this won't show up on your heap profile.
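As a sketch of that substring-retention behaviour (the sizes here are arbitrary, not taken from the question), compare a shared slice with an explicit copy:

```haskell
-- Sketch of substring retention (illustrative sizes, not from the question).
-- A slice produced by take/drop shares the original buffer, so the whole
-- buffer stays live; Data.ByteString.copy gives the slice its own buffer,
-- allowing the original to be garbage collected.
import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
  let big   = B.replicate 1000000 'x'  -- ~1MB buffer
      frag  = B.take 10 big            -- shares big's buffer: retains all 1MB
      frag' = B.copy frag              -- fresh 10-byte buffer; big can be freed
  print (B.length big, B.length frag, B.length frag')
```

If you are storing small fragments of large inputs long-term, forcing a copy like this is the usual way to avoid the retention.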

That is my theory anyways. I don't know enough facts about how GHC's heap profiler works nor about how ByteStrings are implemented to know for certain. Maybe someone else can chime in and confirm or dispute my theory.

Edit2: tibbe notes that the buffers used by ByteStrings are pinned. So if you are allocating/freeing lots of small ByteStrings, you can fragment your heap, meaning you run out of usable heap with half of it unallocated.

Edit: JaffaCake tells me that sometimes the heap profiler will not display the memory allocated by ByteStrings.

Russell O'Connor

You should use, e.g., hp2ps to get a graphical view of what's going on. Looking at the raw hp file is difficult.
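For reference, a typical workflow might look like this (the binary and file names are illustrative, not from the question; newer GHCs spell the first flag -fprof-auto):

```shell
# Build with profiling enabled, run with heap profiling, then render the
# generated .hp file as a PostScript chart. Names are illustrative.
ghc -prof -auto-all -rtsopts Server.hs
./Server +RTS -h -RTS       # writes Server.hp as the program runs
hp2ps -c Server.hp          # writes Server.ps (-c selects colour output)
```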

augustss
  • Thanks - I did use hp2ps. As a new Stack Overflow user, I was not allowed to paste the resulting image :-(. But the lines pasted above from the hp file reflect the graphical view shown by hp2ps. – timbod Mar 15 '11 at 03:32
  • @timbod: You should now have enough rep to include the image; that'd probably be helpful to do. – Antal Spector-Zabusky Mar 15 '11 at 04:25

Not everything is included in the profile by default, for example threads and stacks. Try with +RTS -xT.

Simon Marlow
  • I tried with `+RTS -h -S -xt`. This still gives me substantially different results in the live bytes count (63MB) versus the peak data as shown in the profile (approx 6MB). Hence I'm still at a loss as to how the memory is being used. – timbod Mar 22 '11 at 01:06