I'm trying to write my first Nagios plugin, which checks the status of the access points (APs) on a WLAN controller. The goal was to make a kind of "universal" plugin, but I'm getting an error:

.1.3.6.1.4.1.14179.2.2.1.1.3.0.: Unknown Object Identifier ()

/usr/lib/nagios/plugins/check_wlc_ap_state.sh: line 50: [: Such Instance currently exists at this OID: integer expression expected
/usr/lib/nagios/plugins/check_wlc_ap_state.sh: line 53: [: Such Instance currently exists at this OID: integer expression expected
/usr/lib/nagios/plugins/check_wlc_ap_state.sh: line 56: [: Such Instance currently exists at this OID: integer expression expected
UNKOWN-  = Such Instance currently exists at this OID

Here's my code:

#!/bin/bash

while getopts "H:C:O:N:I:w:c:h" option; do
        case "$option" in
                H )     host_address=$OPTARG;;
                C )     host_community=$OPTARG;;
                O )     ap_op_status_oid=$OPTARG;;
                N )     ap_hostname_oid=$OPTARG;;
                w )     warn_thresh=$OPTARG;;
                c )     crit_thresh=$OPTARG;;
                h )     show_help="yes";;

        esac
done

# Help Menu
help_menu="Plugin to check AP operational status.
Example:
Check AP Status on Cisco CS5508
./check_wlc_ap_state.sh -H [Controller IP] -C [Controller Community] -O .1.3.6.1.4.1.14179.2.2.1.1.6.0 -N .1.3.6.1.4.1.14179.2.2.1.1.3.0 -w 2 -c 3

Required Arguments:
-H      WLAN Controller Address
-C      WLAN Controller RO Community String
-O      OID to AP Operation Status
-N      OID to AP Hostname
-c      Critical Threshold
-w      Warning Threshold

Optional Arguments:
-h      Display help 
"

if [ "$show_help" = "yes" -o $# -lt 1 ]; then
  echo "$help_menu"
  exit 0
fi

# Change the .1. to iso. and get the length + 1 to get rid of the trailing .
ap_op_status_oid=${ap_op_status_oid:2}
ap_op_status_oid="iso$ap_op_status_oid"
ap_op_status_oid_length=${#ap_op_status_oid}
ap_op_status_oid_length="$ap_op_status_oid_length+1"

#Get info
while read -r oid_index equal integer ap_stat;
do
        ap_index="${oid_index:$ap_op_status_oid_length}"
        ap_hostname=$(snmpget -c $host_community $host_address -v 1 $ap_hostname_oid.$ap_index | awk -F '"' '{print$2}')
        if [ "$ap_stat" -lt "$warn_thresh" ]; then
                echo -n "OK- $ap_hostname = $ap_stat | "
                exit 0;
        elif [ "$ap_stat" -eq "$warn_thresh" ]; then
                echo -n "WARNING- $ap_hostname = $ap_stat | "
                exit 1;
        elif [ "$ap_stat" -ge "$crit_thresh" ]; then
                echo -n "CRITICAL- $ap_hostname = $ap_stat | "
                exit 2;
        else
                echo -n "UNKOWN- $ap_hostname = $ap_stat | "
                exit 3;
        fi

done < <(snmpwalk -c $host_community -v 2c $host_address $ap_op_status_oid)

And here's the input and the desired output. I'm not sure whether the output format is right for Nagios/Icinga2, though.

./check_wlc_ap_state.sh -H 10.77.208.12 -C r350urc31 -O .1.3.6.1.4.1.14179.2.2.1.1.6.0 -N .1.3.6.1.4.1.14179.2.2.1.1.3.0 -w 2 -c 3

OK- AP-1 = 1 | OK- AP-2 = 1 | OK- AP-3 = 1 | OK- AP-4 = 1 | OK- AP-5 = 1 | OK- AP-6 = 1 | OK- AP-7 = 1 | OK- AP-8 = 1 |

Edit: Here's the set -x output:

:40+ap_op_status_oid=.3.6.1.4.1.14179.2.2.1.1.6.0
:41+ap_op_status_oid=iso.3.6.1.4.1.14179.2.2.1.1.6.0
:42+ap_op_status_oid_length=31
:43+ap_op_status_oid_length=32
:46+read -r oid_index equal integer ap_stat
::69+snmpwalk -c public -v 2c 10.77.208.12 iso.3.6.1.4.1.14179.2.2.1.1.6.0
:48+oid_index=iso.3.6.1.4.1.14179.2.2.1.1.6.0.129.196.3.1.112
:49+equal==
:50+integer=INTEGER:
:51+ap_stat=1
:52+ap_index=129.196.3.1.112
::53+awk -F '"' '{print$2}'
::53+snmpget -c public 10.77.208.12 -v 1 .1.3.6.1.4.1.14179.2.2.1.1.3.0.129.196.3.1.112
:53+ap_hostname=AP-01
:55+'[' 1 -lt 2 ']'
:56+echo -n 'OK- AP-01 = 1 | '
OK- AP-01 = 1 | :57+exit 0
cflinspach
  • You have a big string you're passing to a numeric test. That doesn't work. – Charles Duffy Feb 13 '17 at 16:33
  • BTW, you've got a bunch of quoting bugs. Run your code through http://shellcheck.net/ and fix what it finds. – Charles Duffy Feb 13 '17 at 16:33
  • BTW, do you expect `ap_op_status_oid_length="$ap_op_status_oid_length+1"` to be a math operation? It isn't. Perhaps you want `ap_op_status_oid_length=$((ap_op_status_oid_length+1))`, or `(( ++ ap_op_status_oid_length ))` -- the latter being a bashism, the former being POSIX-compliant. – Charles Duffy Feb 13 '17 at 16:35
  • Also -- when writing up your code in a StackOverflow question, try to generate a [Minimal, Complete, Verifiable Example](http://stackoverflow.com/help/mcve) -- "minimal" meaning it doesn't contain extraneous components (like help), "complete" and "verifiable" meaning other people can run it to see your problem and/or determine whether their fixes work -- that means, for example, perhaps hardcoding `snmpget` or `snmpwalk` as functions with hardcoded output, or removing them outright if you can do so and still show the problem. – Charles Duffy Feb 13 '17 at 16:38
  • Also, `echo -n` is actually depending on implementation-defined behavior. [The POSIX spec](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html) advises using `printf` instead in new code: `printf '%s' "your string here"` is a more reliable and portable alternative to `echo -n "your string here"`. (See the APPLICATION USAGE and RATIONALE sections of the above link to understand why `echo`'s portability is limited except within a very limited domain of arguments). – Charles Duffy Feb 13 '17 at 16:40
  • Thanks for the advice! I didn't know about shellcheck.net; I like it. – cflinspach Feb 13 '17 at 16:50
  • As another aside, the `-o` argument to `test` is obsolescent; `[ "$show_help" = yes ] || [ $# -lt 1 ]` is the preferred mechanism to combine multiple tests; see the '[OB]' notation in [the standard](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html). Supporting multiple tests in one invocation requires grouping, but grouping gets ambiguous without the ancient `x$foo` hack; consider `[ ( = ) ]`: is it asking if `=` is non-empty, or asking if `(` and `)` are identical strings? GNU test assumes the latter, but the former is supportable. – Charles Duffy Feb 13 '17 at 17:08
  • Unless you are doing this for educational purposes, Nagios plugins in bash are generally a bad idea (bad performance and process spawning). – Bruno9779 Feb 13 '17 at 18:43
  • @Bruno9779, I agree to a point -- that said, the performance of native shell scripts has much to do with how they're written. Stick to builtins, avoid command substitution and subshells, and those caveats can be minimized. `[` is a builtin in modern shells -- we're not actually running `/usr/bin/[` for every test -- so it's the `snmpwalk`, `snmpget` and `awk` invocations that are expensive here; it's the latter two, invoked inside a loop, that are actually most expensive -- but if we passed multiple OIDs to just one snmpget invocation, that cost could be greatly decreased. – Charles Duffy Feb 13 '17 at 20:54
  • (...using `read` to do the work currently being performed by `awk` would likewise be a helpful optimization -- and if on a platform with real David Korn ksh93, using that heavily-optimized interpreter rather than bash doesn't hurt). – Charles Duffy Feb 13 '17 at 20:55
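
The arithmetic and `printf` points raised in the comments above can be sketched as follows. This is only an illustration: the variable names `len_string` and `len_number` are made up for the example, and the OID is the one shown in the question's `set -x` trace.

```shell
#!/bin/bash
# Sample OID taken from the question's set -x trace.
oid="iso.3.6.1.4.1.14179.2.2.1.1.6.0"

# Plain assignment only concatenates text: this stores the string "31+1".
# (len_string is a hypothetical name used for illustration.)
len_string="${#oid}+1"

# Arithmetic expansion evaluates the sum; this form is POSIX-compliant.
# (len_number is a hypothetical name used for illustration.)
len_number=$(( ${#oid} + 1 ))

# printf is used instead of `echo -n` because its behavior is portable.
printf 'string form:  %s\n' "$len_string"
printf 'numeric form: %s\n' "$len_number"
```

In bash, the offset in `${var:offset}` is itself evaluated as an arithmetic expression, so the string form can still happen to work in the substring expansion used by the plugin; it would break as soon as the value is passed to something that expects a plain integer, such as `[ ... -lt ... ]`.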

1 Answer

ap_stat contains non-numeric contents. Your code is passing it to a test operator that requires it to parse as numeric.

Use set -x to trace your script's execution; this will make this kind of situation easier to diagnose.
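
One way to guard against this, shown here only as a minimal sketch, is to check that the value is an integer before handing it to a numeric test. The helper name `classify_ap` is hypothetical, and the threshold logic is a simplified assumption (at or above critical is CRITICAL, at or above warning is WARNING), not the plugin's exact logic.

```shell
#!/bin/bash
# classify_ap is a hypothetical helper, not part of the original plugin.
# It prints a Nagios-style label and returns the matching exit code (0-3).
# Note: no `local` is used, so value/warn/crit are global variables.
classify_ap() {
    value=$1 warn=$2 crit=$3
    case $value in
        ''|*[!0-9]*)
            # Empty or non-numeric input, e.g. an SNMP error message.
            printf 'UNKNOWN\n'
            return 3 ;;
    esac
    # From here on, $value is known to contain only digits.
    if [ "$value" -ge "$crit" ]; then
        printf 'CRITICAL\n'
        return 2
    elif [ "$value" -ge "$warn" ]; then
        printf 'WARNING\n'
        return 1
    fi
    printf 'OK\n'
    return 0
}

# A numeric status, as seen in the question's trace.
classify_ap 1 2 3
# The text that ended up in ap_stat when the error occurred.
classify_ap "Such Instance currently exists at this OID" 2 3 || true
```

With this guard, an SNMP error message is reported as UNKNOWN rather than producing "integer expression expected" from `[`.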

Charles Duffy
  • ap_stat just returns a 1 – cflinspach Feb 13 '17 at 16:50
  • Clearly not in the situation where you get this error. Again, collect `set -x` logs (as by running `PS4=':$LINENO+' bash -x yourscript`). – Charles Duffy Feb 13 '17 at 17:01
  • @red_eagle, ...I'm presuming that `snmpwalk` is returning an error. You're reading the first word of that error into `oid_index`, the second word of it into `equal`, the third word into `integer`, and the rest of it ends up in `ap_stat`. – Charles Duffy Feb 13 '17 at 17:04
  • @red_eagle, ...to log the contents of variables populated by `read` when using `set -x`, I often add a line just inside my loop that looks like: `: oid_index="$oid_index" equal="$equal" integer="$integer" ap_stat="$ap_stat"` -- since `:` is a synonym for `true`, it does nothing when not in debugging mode, but when you *are* running with `set -x`, it shows you your relevant values. – Charles Duffy Feb 13 '17 at 17:05
  • I added the set -x output to the post – cflinspach Feb 13 '17 at 17:29
  • Notably, that `set -x` output doesn't show an instance where the error is actually being emitted. Rather essential, that. – Charles Duffy Feb 13 '17 at 17:31
  • ...note, if you only get your error when running in production, that `set -x` output is on stderr by default -- so with bash 3.x, you can `exec 2>>/tmp/some.log` to redirect it -- whereas with bash 4.x, you can control the file descriptor it's on by setting `BASH_XTRACEFD`, so you can run something like `exec 3>>/tmp/some.log; BASH_XTRACEFD=3; set -x` and the xtrace logs -- and *only* the xtrace logs -- will go to `/tmp/some.log`. If you're going to do it that way, make sure you've added the debugging line I suggested above, since you won't have stderr to work with. – Charles Duffy Feb 13 '17 at 17:34
  • Yeah, the plugin works when I run it standalone, but when Icinga2 runs it, I get the error. – cflinspach Feb 13 '17 at 17:34
  • So, as I said, instrument the copy that's run in production so you can get `set -x` logs *when the error is actually happening*. – Charles Duffy Feb 13 '17 at 17:34
  • ...I'll hazard that `integer=No` in the case where you're hitting your failure, so it's actually "No such instance" in the full message. – Charles Duffy Feb 13 '17 at 17:39
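
The field splitting described in these comments can be reproduced without a controller by feeding `read` hardcoded lines. This is a sketch under assumptions: `sample_output` and `parse_lines` are hypothetical helpers, the first sample line mirrors the question's `set -x` trace, and the second is a guess at the error line, reconstructed from the error message and the "No such instance" hypothesis above.

```shell
#!/bin/bash
# sample_output is a hypothetical stand-in for snmpwalk with hardcoded output.
# Line 1 mirrors the set -x trace; line 2 is an assumed error response.
sample_output() {
    printf '%s\n' \
        'iso.3.6.1.4.1.14179.2.2.1.1.6.0.129.196.3.1.112 = INTEGER: 1' \
        'iso.3.6.1.4.1.14179.2.2.1.1.6.0 = No Such Instance currently exists at this OID'
}

# parse_lines (also hypothetical) splits each line the same way the plugin does.
parse_lines() {
    while read -r oid_index equal integer ap_stat; do
        # No-op in normal runs; under set -x it logs the values just read.
        : oid_index="$oid_index" equal="$equal" integer="$integer" ap_stat="$ap_stat"
        printf 'integer=[%s] ap_stat=[%s]\n' "$integer" "$ap_stat"
    done
}

sample_output | parse_lines
```

For the assumed error line, the word "No" lands in `integer` and the remainder of the message lands in `ap_stat`, which matches the text seen in the "integer expression expected" errors.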