
I'm trying to add a node to an existing Munin master (which I didn't set up, but which seems to be working fine, as it shows graphs for 8 existing nodes) and I'm having some trouble. Here are the steps I followed:

Master

Added the node to /etc/munin/munin.conf

[server.example.org]
   address private.server.example.org

The HTML directory of the master (which matches the Apache configuration) is:

htmldir /opt/munin

That directory contains the following files and folders:

ls -lh /opt/munin/
drwxr-xr-x 20 munin munin 4.0K 2011-11-07 16:15 example.org <= FOLDER NAMED AFTER OUR DOMAIN
-rw-r--r--  1 munin munin 2.5K 2010-08-03 14:11 definitions.html
-rw-r--r--  1 munin munin 3.0K 2010-08-03 14:11 favicon.ico
-rw-r--r--  1 munin munin  15K 2011-11-07 16:21 index.html  <= MAIN MUNIN PAGE
-rw-r--r--  1 munin munin 1.8K 2010-08-03 14:11 logo-h.png
-rw-r--r--  1 munin munin  473 2010-08-03 14:11 logo.png
-rw-r--r--  1 munin munin 5.6K 2010-11-03 14:07 style.css

The footer of index.html indicates that this file is generated dynamically by Munin, so I know I don't have to edit it:

This page was generated by <a href='http://munin-monitoring.org/'>Munin</a> version 1.4.4 at 2011-11-07 16:21:30+0000 (UTC)

The domain directory contains a folder for each node. I ended up creating one for the new node, hoping it would help, but it made no difference:

mkdir /opt/munin/example.org/server.example.org
chown munin:munin -R /opt/munin/example.org/server.example.org

I killed munin-cron and restarted it, but that made no difference either.

$ sudo -u munin /usr/bin/munin-cron
$ sudo ps aux | grep munin-cron
munin    26566  0.0  0.2   4092   584 ?        Ss   16:35   0:00 /bin/sh -c if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi
munin    26567  0.0  0.2   4092   576 ?        S    16:35   0:00 /bin/sh /usr/bin/munin-cron

Munin node

Installed munin-node package

apt-get install munin-node

Modified the /etc/munin/munin-node.conf file to allow access from the Munin master:

host *
# master IP address
allow ^A\.B\.C\.D$
port 4949
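The `allow` directive takes a regular expression that is matched against the connecting IP address, which is why the dots are escaped and the pattern is anchored. A minimal sketch for checking a candidate pattern against an address before restarting the node; it assumes the pattern only uses syntax that POSIX extended regexes (`grep -E`) share with Perl regexes, and `ip_allowed` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the IP address ($2) matches the
# munin-node "allow" pattern ($1). Uses grep -E, so only patterns whose
# syntax is common to ERE and Perl regexes are checked faithfully.
ip_allowed() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# An unescaped, unanchored pattern is looser than intended:
#   ip_allowed '10.0.0.5' 10.0.0.50     -> matches (probably not wanted)
#   ip_allowed '^10\.0\.0\.5$' 10.0.0.50 -> no match
```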

Restarted munin-node:

service munin-node restart

If I run tcpdump on the new node, I can see some data being exchanged with the master, so I believe the issue at this point is with the master's configuration.

Any idea what I'm missing, or how I can troubleshoot this further?

Additional troubleshooting

As advised, I checked the logs:

$ grep server.example.org /var/log/munin/munin-update.log

2011/11/08 08:40:03 [WARNING] Config node server.example.org listed no services for server.example.org.  Please see http://munin-monitoring.org/wiki/FAQ_no_graphs for further information.
2011/11/08 09:10:02 [INFO] Reaping Munin::Master::UpdateWorker<example.org;server.example.org>.  Exit value/signal: 0/0
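To find every node affected by the same warning, rather than grepping for one name at a time, the node names can be extracted from munin-update.log. A sketch that assumes the warning text matches the line above (the wording may differ between Munin versions); `nodes_without_services` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print each node that munin-update reported as
# listing no services. Reads the files given as arguments, or stdin.
# Assumes the 1.4.x format: "Config node NAME listed no services ..."
nodes_without_services() {
  sed -n 's/.*Config node \([^ ]*\) listed no services.*/\1/p' "$@" | sort -u
}
```

For example: `nodes_without_services /var/log/munin/munin-update.log`.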

The warning brought me to http://munin-monitoring.org/wiki/FAQ_no_graphs, and I followed the advice given there step by step. Although the symlinks seemed to be properly created, I ran munin-node-configure --shell | sh -x, which I believe fixed the issue. The page also recommended setting host_name, which I did (although I don't believe it helped, since the other working nodes don't have it configured).
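The symlink check from the FAQ can be scripted: munin-node runs the plugins linked into its plugin directory, so a directory with no links is one way a node ends up listing no services. A sketch, assuming the Debian default of /etc/munin/plugins (passed as an argument so other layouts work); `count_enabled_plugins` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: count plugin symlinks in a munin-node plugin
# directory. /etc/munin/plugins is assumed as the default (Debian layout).
count_enabled_plugins() {
  find "${1:-/etc/munin/plugins}" -maxdepth 1 -type l | wc -l
}

# If the count is 0, apply the suggested links (as root) and restart:
#   munin-node-configure --shell | sh -x
#   service munin-node restart
```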

By the time I got to it, the telnet test was successful:

$ telnet private.server.example.org 4949
Trying A.B.C.D...
Connected to private.server.example.org.
Escape character is '^]'.
# munin node at server.example.org

> nodes
server.example.org
.

> list server.example.org
cpu df df_inode entropy forks fw_conntrack fw_forwarded_local fw_packets if_err_eth0 if_err_eth1 if_eth0 if_eth1 interrupts iostat iostat_ios ip_A.B.C.D irqstats load memory open_files open_inodes postfix_mailqueue postfix_mailvolume proc_pri processes swap threads uptime users vmstat

> fetch df
_dev_sda1.value 23.1295909196156
_dev.value 1.2890625
_dev_shm.value 0
_var_run.value 0.00782368542525642
_var_lock.value 0
_lib_init_rw.value 0
Max

3 Answers

I can't see anything obviously wrong with your setup. I suggest two things:

  • Read the logs on the munin master. /var/log/munin/munin-update.log is the place to start. If it has entries confirming that updates succeed, and the RRD files exist in /var/lib/munin/, continue to munin-graph.log and munin-html.log.

  • Verify that the master can connect to the address of the munin node. Test with netcat or similar: nc private.server.example.org 4949. The expected output is: # munin node at hostname. Possible errors are packets being dropped by a firewall (in which case nc hangs at connect(), visible if you use strace), or the name failing to resolve (in which case netcat prints nc: getaddrinfo: Name or service not known).
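Both checks can be scripted from the master. A sketch assuming the Munin 1.4 defaults seen in this thread (dbdir /var/lib/munin, RRD files stored as DOMAIN/HOST-*.rrd, node listening on port 4949) and an `nc` that supports `-w`; all function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count RRD files the master holds for one node.
# Assumes the layout DBDIR/DOMAIN/HOST-<plugin>-<field>-<type>.rrd.
count_node_rrds() {
  dbdir=$1 domain=$2 host=$3
  find "$dbdir/$domain" -maxdepth 1 -name "$host-*.rrd" 2>/dev/null | wc -l
}

# Hypothetical helper: print the name a node announces in its greeting
# ("# munin node at NAME"), reading the greeting from stdin.
node_name_from_greeting() {
  sed -n 's/^# munin node at //p' | head -n 1
}

# Hypothetical helper: connect to a node, read its greeting, then quit.
# -w 5 gives up after 5 seconds instead of hanging on a dropped connection.
munin_greeting() {
  printf 'quit\n' | nc -w 5 "$1" "${2:-4949}"
}
```

For example, `munin_greeting private.server.example.org | node_name_from_greeting` prints the name the node announces, and `count_node_rrds /var/lib/munin example.org server.example.org` printing 0 suggests no update for that node has been stored yet.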

If you can't find anything after trying the above, please paste the complete munin.conf from the master (anonymize IP addresses with placeholder numbers, and hostnames with some bogus text, if you have to).

A not uncommon error: the cron job may have been invoked by root at some point, leaving files owned by root that the munin user can't update. The munin user usually needs write access to all files in /var/lib/munin and the HTML directory.
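Files left behind by a root-run cron job can be found with `find` and a negated `-user` test. A sketch, where the directories follow this thread (dbdir /var/lib/munin, htmldir /opt/munin) and `files_not_owned_by` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: list files under the given directories that are
# NOT owned by user $1 (a user name or numeric UID).
files_not_owned_by() {
  owner=$1
  shift
  find "$@" ! -user "$owner" -print
}

# Typical use on the master; fix ownership if anything is listed:
#   files_not_owned_by munin /var/lib/munin /opt/munin
#   chown -R munin:munin /var/lib/munin /opt/munin   # as root
```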

Kvisle
  • Will review the logs as suggested and report back. As for network troubleshooting, it's already done: tcpdump shows that everything's fine. – Max Nov 08 '11 at 07:25
  • Cool, added another suggestion too. – Kvisle Nov 08 '11 at 08:15
  • Always good to review logs. Issue fixed (see **Additional troubleshooting**). Thanks – Max Nov 08 '11 at 09:48

Hey, I had the same problem.

Check the /etc/hosts file on the node and double-check that the first hostname listed is the same one you specified in munin.conf on the server.

That totally wrecked our setup until we found it.

Our /etc/hosts was set to: 1.2.3.4 hostname hostname.domain

munin.conf was set to hostname.domain, but the server thought it was named hostname, not hostname.domain.
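The mismatch described in this answer can be checked by comparing the first name listed for the node's address in /etc/hosts with the section name in the master's munin.conf. A sketch; the hosts file path is a parameter so it can be tested, and `first_hosts_name` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the first hostname listed for IP $2 in the
# hosts file $1 (typically /etc/hosts).
first_hosts_name() {
  awk -v ip="$2" '$1 == ip { print $2; exit }' "$1"
}
```

With the line quoted above, `first_hosts_name /etc/hosts 1.2.3.4` prints `hostname`, which does not match a `[hostname.domain]` section in munin.conf.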

Petter

Sometimes it helps to override the hostname supplied by the node, with use_node_name:

[server.example.org]
   address private.server.example.org
   use_node_name yes
redburn