I use Ganglia to monitor Hadoop Flume Agents' performance. For almost 1 year now, it had been working very well. Last week gmetad started crashing with buffer overflow. Only thing that has changed in last few days is we started monitoring more instances of flume agent.
> gmond -V
gmond 3.7.2
> gmetad -V
gmetad 3.7.2
Below is the output I get when I run gmetad at command prompt with debug=100. Please suggest how to overcome the buffer overflow problem.
Writing Summary data for source atl-ganglia, metric flume.SINK.hdfsSink.StartTime
Writing Summary data for source atl-ganglia, metric flume.CHANNEL.fileChannel.ChannelFillPercentage
*** buffer overflow detected ***: /usr/sbin/gmetad terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f9899751597]
/lib64/libc.so.6(+0x100480)[0x7f989974f480]
/lib64/libc.so.6(+0xff8d9)[0x7f989974e8d9]
/lib64/libc.so.6(_IO_default_xsputn+0xc9)[0x7f98996c3639]
/lib64/libc.so.6(_IO_vfprintf+0x41c0)[0x7f9899697190]
/lib64/libc.so.6(__vsprintf_chk+0x9d)[0x7f989974e97d]
/lib64/libc.so.6(__sprintf_chk+0x7f)[0x7f989974e8bf]
/usr/sbin/gmetad[0x40a714]
/usr/sbin/gmetad[0x40841f]
/usr/lib64/libganglia.so.0(hash_foreach+0x59)[0x7f989af731f9]
/usr/sbin/gmetad[0x408151]
/lib64/libexpat.so.1(+0xa836)[0x7f9899e16836]
/lib64/libexpat.so.1(+0xbbce)[0x7f9899e17bce]
/lib64/libexpat.so.1(+0xd4fa)[0x7f9899e194fa]
/lib64/libexpat.so.1(+0xde3b)[0x7f9899e19e3b]
/lib64/libexpat.so.1(XML_ParseBuffer+0x6d)[0x7f9899e1288d]
/usr/sbin/gmetad[0x409ddc]
/usr/sbin/gmetad[0x405116]
/lib64/libpthread.so.0(+0x7a51)[0x7f98999eaa51]
/lib64/libc.so.6(clone+0x6d)[0x7f989973796d]
======= Memory map: ========
00400000-00417000 r-xp 00000000 08:02 1077406 /usr/sbin/gmetad
00617000-00619000 rw-p 00017000 08:02 1077406 /usr/sbin/gmetad
00619000-0061a000 rw-p 00000000 00:00 0
0157e000-0159f000 rw-p 00000000 00:00 0 [heap]
7f988c000000-7f988c116000 rw-p 00000000 00:00 0
7f988c116000-7f9890000000 ---p 00000000 00:00 0
7f9890f0f000-7f9890f25000 r-xp 00000000 08:02 1569794 /lib64/libgcc_s-4.4.7-20120601.so.1
7f9890f25000-7f9891124000 ---p 00016000 08:02 1569794 /lib64/libgcc_s-4.4.7-20120601.so.1
.
.
.
7f989817f000-7f989837e000 ---p 00003000 08:02 1569886 /lib64/libgmodule-2.0.so.0.2800.8
7f989837e000-7f989837f000 rw-p 00002000 08:02 1569886 /lib64/libgmodule-2.0.so.0.2800.8/bin/bash: line 1: 19584 Aborted /usr/sbin/gmetad
[FAILED]