I have a program that listens on UDP for log traffic, attempts to parse it, and then inserts it into Redis. At a certain level of traffic, memory usage seems to "explode" (rapidly increasing from a few hundred megabytes to gigabytes).

I grabbed a heap profile shortly after this happens, and it shows the following:

(pprof) top100 -cum
Total: 1731.3 MB
     0.0   0.0%   0.0%   1731.3 100.0% gosched0
  1162.5  67.1%  67.1%   1162.5  67.1% newdefer
     0.0   0.0%  67.1%   1162.5  67.1% runtime.deferproc
     0.0   0.0%  67.1%   1162.0  67.1% main.TryParse
     0.0   0.0%  67.1%    438.0  25.3% runtime.main
   301.5  17.4%  84.6%    437.5  25.3% main.main
   136.0   7.9%  92.4%    136.0   7.9% runtime.malg
     0.0   0.0%  92.4%    136.0   7.9% runtime.newproc
     0.0   0.0%  92.4%    136.0   7.9% runtime.newproc1
     1.5   0.1%  92.5%    131.3   7.6% main.RedisCuller
     0.0   0.0%  92.5%    108.5   6.3% github.com/garyburd/redigo/redis.(*conn).Do
     0.0   0.0%  92.5%    108.5   6.3% github.com/garyburd/redigo/redis.(*conn).readReply
     0.0   0.0%  92.5%    108.5   6.3% github.com/garyburd/redigo/redis.(*pooledConnection).Do
    95.8   5.5%  98.0%     95.8   5.5% cnew
     0.0   0.0%  98.0%     95.8   5.5% runtime.cnewarray
    34.0   2.0% 100.0%     34.0   2.0% runtime.convT2E
     0.0   0.0% 100.0%      0.5   0.0% main.init
     0.0   0.0% 100.0%      0.5   0.0% net/http/pprof.init
     0.0   0.0% 100.0%      0.5   0.0% sync.(*Once).Do
     0.0   0.0% 100.0%      0.5   0.0% syscall.Getenv
     0.0   0.0% 100.0%      0.5   0.0% time.init

When the program is "healthy" the profile looks more like:

(pprof) top20 -cum
Total: 186.7 MB
     0.0   0.0%   0.0%    186.7 100.0% gosched0
     0.5   0.3%   0.3%    122.7  65.7% main.RedisCuller
     0.0   0.0%   0.3%    103.5  55.4% github.com/garyburd/redigo/redis.(*pooledConnection).Do
     0.0   0.0%   0.3%    103.0  55.2% github.com/garyburd/redigo/redis.(*conn).Do
     0.0   0.0%   0.3%    103.0  55.2% github.com/garyburd/redigo/redis.(*conn).readReply
    88.2  47.2%  47.5%     88.2  47.2% cnew
     0.0   0.0%  47.5%     88.2  47.2% runtime.cnewarray
     0.0   0.0%  47.5%     57.0  30.5% main.TryParse
    57.0  30.5%  78.0%     57.0  30.5% newdefer
     0.0   0.0%  78.0%     57.0  30.5% runtime.deferproc
    34.0  18.2%  96.3%     34.0  18.2% runtime.convT2E
     1.5   0.8%  97.1%      6.5   3.5% main.main
     0.0   0.0%  97.1%      6.5   3.5% runtime.main
     5.0   2.7%  99.7%      5.0   2.7% runtime.malg
     0.0   0.0%  99.7%      5.0   2.7% runtime.newproc
     0.0   0.0%  99.7%      5.0   2.7% runtime.newproc1
     0.0   0.0%  99.7%      0.5   0.3% bufio.NewWriter
     0.0   0.0%  99.7%      0.5   0.3% bufio.NewWriterSize
     0.0   0.0%  99.7%      0.5   0.3% github.com/garyburd/redigo/redis.(*Pool).get
     0.0   0.0%  99.7%      0.5   0.3% github.com/garyburd/redigo/redis.(*pooledConnection).get

The only defer I have in my code surrounds the parsing function (since parsing can frequently fail):

    for {
            rlen, _, err := sock.ReadFromUDP(buf[0:])
            checkError(err) 
            raw := logrow.RawRecord(string(buf[:rlen]))
            go TryParse(raw, c)
    }

    ...

    func TryParse(raw logrow.RawRecord, c chan logrow.Record) {
        defer func() {
                if r := recover(); r != nil {
                        //log.Printf("Failed Parse due to panic: %v", raw)
                        return
                }
        }()
        rec, ok := logrow.ParseRawRecord(raw)
        if !ok {
                return
                //log.Printf("Failed Parse: %v", raw)
        } else {
                c <- rec
        }
    }

Does anyone see anything obvious I am doing wrong that could cause memory to suddenly balloon? Or maybe provide some direction in pinning it down?

Edit (More code around the logrow.Record Channel):

    c := make(chan logrow.Record)
    ...
    go RedisInserter(c, bucket, retention, pool)

    func RedisInserter(c chan logrow.Record, bucket, retention int, p *redis.Pool) {
            for rec := range c {
                    logrow.SendToRedis(rec, bucket, retention, p)
            }
    }
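
One rough way to check whether `TryParse` goroutines are piling up behind the channel send (a minimal sketch, assuming the standard `log`, `runtime`, and `time` packages are imported; this isn't in the program yet) would be to watch the goroutine count:

    go func() {
            for {
                    // If logrow.SendToRedis ever stalls, every pending TryParse
                    // parks on `c <- rec`, so this count should climb with traffic.
                    log.Printf("goroutines: %d", runtime.NumGoroutine())
                    time.Sleep(5 * time.Second)
            }
    }()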
  • Are you closing `c` (`chan` of `logrow.Record`)? If so, are you deferring the closing of `c`? Not sure what else to suggest without seeing more code sorry. – Intermernet Sep 24 '13 at 12:40
  • @Intermernet: Added a little more code around that channel. I never close it because what "should" happen is that the channel is constantly being filled with logrows by the `go TryParse` calls and that `RedisInserter` loop should be consuming those rows. Looking at it now, I can see that maybe logrow.SendToRedis could get blocked somewhere and the channel fills up - but I don't see how that would result in high newdefer memory usage? – Kyle Brandt Sep 24 '13 at 12:49
  • There is a memory leak in a closured defer function, which would be the case if you uncommented the Printf: https://codereview.appspot.com/10784043/ Have you tried with tip? – ClojureMostly Sep 24 '13 at 13:13
  • @Aralo: The printf is commented out in the current version, so I don't really believe that's the source. Also, the memory increase happens pretty fast. Doing some higher resolution monitoring to try to find out how fast currently... – Kyle Brandt Sep 24 '13 at 13:16
  • What do you send as parameter to `panic`? Isn’t that big? – Mostafa Sep 24 '13 at 13:22
  • @Mostafa: I never call panic – Kyle Brandt Sep 24 '13 at 13:24
  • @Mostafa: What is happening is that TryParse is doing a bunch of string slicing, so the panic happens inside of that due to the slice being out of bounds on the RawRecord. So maybe those raw records never get cleaned up or something? – Kyle Brandt Sep 24 '13 at 13:27
  • @KyleBrandt I realized the `recover` is for index out of bound and other runtime panics when you said you never call panic, but can’t be sure about what’s causing the memory usage. – Mostafa Sep 24 '13 at 13:44
  • @Mostafa: It's actually dying right now; wish I knew how to inspect memory with gdb :-/ – Kyle Brandt Sep 24 '13 at 13:44
  • FYI: `newdefer()` seems to be only called by `panic` (http://golang.org/src/pkg/runtime/panic.c#L23). Not sure if that helps. – Intermernet Sep 24 '13 at 14:24
  • @KyleBrandt what version of go are you using this with? It's probably worth it to at least try the 1.2RC that just came out (or even tip): https://code.google.com/p/go/downloads/list – ClojureMostly Sep 24 '13 at 15:18
  • @Aralo: `go version` reports go1.1.1 darwin/amd64, and I'm cross compiling for Linux; will try that out – Kyle Brandt Sep 24 '13 at 15:22
  • @Aralo: Looks like you were right about the leak in the closured defer function (no crashes since rebuilding with the latest Go). If you answer I can give you your points :-) – Kyle Brandt Sep 27 '13 at 15:27
  • @KyleBrandt I'm too lazy. Answer it yourself ;) Glad it's resolved for you! Cheers – ClojureMostly Sep 27 '13 at 15:32

1 Answer

It turned out to be a leak involving the closured deferred function (a problem in Go itself, not my code):

    defer func() {
            if r := recover(); r != nil {
                    //log.Printf("Failed Parse due to panic: %v", raw)
                    return
            }
    }()

So rebuilding with the latest version of Go resolved the problem (reference: https://codereview.appspot.com/10784043/). However, the smarter thing to do is of course to design the parsing function (`logrow.ParseRawRecord`) so it never slices out of bounds and panics in the first place.
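
For example, something along these lines (just a sketch; the field layout and the `Time`/`Level`/`Message` fields are made up, not the real `logrow` format) reports failure instead of relying on `recover`:

    func ParseRawRecord(raw RawRecord) (Record, bool) {
            // Hypothetical layout: "<timestamp> <level> <message>".
            parts := strings.SplitN(string(raw), " ", 3)
            if len(parts) < 3 {
                    // Too few fields: report failure rather than slicing out
                    // of bounds and panicking.
                    return Record{}, false
            }
            return Record{Time: parts[0], Level: parts[1], Message: parts[2]}, true
    }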

Kyle Brandt