
I want to build a script that reads a list of IPs from a file, monitors them with the ping command, and reports when each one goes up or down.

I have found two excellent methods on Stack Overflow and I am trying to combine them, but whatever I do, it does not work. I am also reading the shell man pages to learn for the future, but I think I am missing something.

Script 1:

I can no longer find the original script on Stack Overflow, but the same one appears in the "Bash and Ping" section of: https://jmanteau.fr/posts/the-facets-of-ping/#check-if-many-hosts-are-alive

This script pings multiple hosts in parallel, very quickly:

#!/bin/bash

argc=$#
if [ $# -lt 1 ]
then
   echo "Usage: $0 <ip-list-file>"
   exit 1
fi

hosts=$1

function customping 
{
    DATE=$(date '+%d/%m/%Y %H:%M:%S')
    ping -c 1 -W 1 $1 >/dev/null 2>&1 && echo "$DATE Node $1 is UP" || echo -e "\033[1;31m $DATE Node $1 is DOWN \033[0m"
# ping -c 1 -W 1 $1 >/dev/null 2>&1 && echo "$DATE Node $1 is UP" || echo "$DATE Node $1 is DOWN"
# sleep 0.01s
}

T="$(date +%s%N)"

DEFAULT_NO_OF_PROC=8
noofproc=$DEFAULT_NO_OF_PROC

if [ -n "$2" ] #user-set no. of process instead of default
then
  noofproc=$2
  echo "Max processes: $noofproc"
fi

export -f customping && cat $hosts | xargs -n 1 -P $noofproc -I{} bash -c 'customping {}' \;

Script 2:

https://stackoverflow.com/a/4708831/19313640

This script loops through the IPs and reports whether each one is up or down (monitoring):

function check_health {

set 192.168.10.1 192.168.10.2 192.168.10.3 192.168.10.4 192.168.10.5 192.168.10.6 192.168.10.7 192.168.10.8 192.168.10.9 192.168.10.10 192.168.10.11 192.168.10.12 192.168.10.13

trap exit 2

for ipnumber in "$@"; do
  DATE=$(date '+%d/%m/%Y %H:%M:%S')
  ping -c 1 -t 1 $ipnumber > /dev/null
  [ $? -eq 0 ] && echo -e "|\033[1;36m $DATE \033[0m" "|\033[1;32m Node |"$ipnumber "| UP \033[0m" | column -t -s "|"
done

while true; do
  i=1
  for ipnumber in "$@"; do
    statusname=up$i
    laststatus=${!statusname:-0}
    ping -c 1 -t 1 $ipnumber > /dev/null
    ok=$?
    eval $statusname=$ok
    if [ ${!statusname} -ne $laststatus ]; then
      # echo $DATE Status changed for $ipnumber
      DATE=$(date '+%d/%m/%Y %H:%M:%S')
      if [ $ok -eq 0 ]; then
        echo -e "|\033[1;36m $DATE \033[0m" "|\033[1;32m Node |"$ipnumber "| UP \033[0m" | column -t -s "|"
      else
        echo -e "|\033[1;36m $DATE \033[0m" "|\033[1;31m Node |"$ipnumber "| DOWN \033[0m" | column -t -s "|"
      fi
    fi
    i=$(($i + 1))
  done
 # sleep 1
done

}

So my question is: how can I combine these two scripts so that the result runs fully in parallel like the first script, reads the hosts from a file instead of using "set" as the second script does, and keeps the monitoring capabilities of the second script?

Edit: If that is too complicated, how can I at least make the second script read a file given as an argument, the way the first script does?

I hope I was thorough and gave enough information about what I am trying to do.

Thank you.


Update:

Hello again, I have managed to get it partly working, although the code is messy.

#!/bin/bash

trap exit 2

argc=$#
if [ $# -lt 1 ]
then
   echo "Usage: $0 <ip-list-file>"
   exit 1
fi

hosts=$1


function check_live {

  trap exit 2

  DATE=$(date '+%d/%m/%Y %H:%M:%S')
    ping -c 1 -t 1 $1 > /dev/null
    [ $? -eq 0 ] && echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;32m Node |"$1 "| UP \033[0m" | column -t -s "|"
    # sleep 1
}

function check_health {

  trap exit 2

#  DATE=$(date '+%d/%m/%Y %H:%M:%S')
#    ping -c 1 -t 1 $1 > /dev/null
#    [ $? -eq 0 ] && echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;32m Node |"$1 "| UP \033[0m" | column -t -s "|"
#    sleep 3

  while true; do
  # while read line; do 
  # i="$i $line"
    i=1
    for ipnumber in "$@"; do
      statusname=up$i
      laststatus=${!statusname:-0}
      ping -c 1 -t 1 $ipnumber > /dev/null
      ok=$?
      eval $statusname=$ok
      if [ ${!statusname} -ne $laststatus ]; then
        # echo $DATE Status changed for $ipnumber
        DATE=$(date '+%d/%m/%Y %H:%M:%S')
        if [ $ok -eq 0 ]; then
          echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;32m Node |"$ipnumber "| UP \033[0m" | column -t -s "|"
        else
          echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;31m Node |"$ipnumber "| DOWN \033[0m" | column -t -s "|"
        fi
      fi
      i=$(($i + 1))
    done
   # sleep 1
  done
}

function duck_art {

textred=$(tput setaf 1)
textcyan=$(tput setaf 12)
textyellow=$(tput setaf 11)
textpurple=$(tput setaf 4)
textpink=$(tput setaf 5)
textwhite=$(tput setaf 7)
textgray=$(tput setaf 8)
textgreen=$(tput setaf 10)


echo -e ${textpink} ================================================================
cat <<EOM
${textyellow}
EOM
cat << "EOF"
                  __                         __
              ___( o)>       DuckLab       <(o )___
              \ <_. )        Monitor        ( ._> /
               `---'                         `---' 
EOF
echo -e ${textpink} ================================================================
echo -e ${textyellow} "                    Press <CTRL+C> to exit.                   "
echo -e ${textpink} ================================================================
echo -e "\033[1;36m  $internal_ip \033[0m" "     ${textpink}|     " "\033[1;36m $my_name \033[0m" "     ${textpink}|     " "\033[1;36m $external_ip \033[0m"
echo -e ${textpink} ================================================================
}


external_ip=$(curl -s ifconfig.me)

internal_ip=$(ifconfig | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1')

my_name=$(hostname)



function multi_process_live {
T="$(date +%s%N)"

DEFAULT_NO_OF_PROC=8
noofproc=$DEFAULT_NO_OF_PROC

if [ -n "$2" ] #user-set no. of process instead of default
  then
    noofproc=$2
    echo "Max processes: $noofproc"
  fi

export -f check_live && cat $hosts | xargs -n 1 -P $noofproc -I{} bash -c 'check_live  {}' \; 2>/dev/null 
}



function multi_process_health {
T="$(date +%s%N)"

DEFAULT_NO_OF_PROC=8
noofproc=$DEFAULT_NO_OF_PROC

if [ -n "$2" ] #user-set no. of process instead of default
  then
    noofproc=$2
    echo "Max processes: $noofproc"
  fi

export -f check_health && cat $hosts | xargs -n 1 -P $noofproc -I{} bash -c 'check_health  {}' \; 2>/dev/null 
}



# ================================ End of functions ================================

# ================================ Start of Script =================================

clear
duck_art
multi_process_live
multi_process_health

# ================================ End of Script ===================================

The first function, which checks for live hosts, works. The second function, which loops, shows the initial output but doesn't loop correctly through each line of the file to monitor which hosts are up or down and print the result. It does that only for the second line, so I assume it is not reading each line of the file correctly.

Any ideas, improvements, and suggestions that help me make this work and learn are highly valued.

Thank you.

Connor Low
Tom
  • What do you mean by _make it in parallel_? Do you want to make a client-server application, where one process (the client) is fetching the addresses, and the other one (the server) is doing the checking of the addresses? – user1934428 Jun 10 '22 at 12:02
  • By parallel I mean pinging multiple hosts at the same time. I am still trying to figure out exactly how the first script does that, but it works really fast with 254 IPs. I have also updated the thread with the relevant reference links. – Tom Jun 10 '22 at 13:57
  • "whatever I do it does not work" - start by showing us what you did try, and we can offer suggestions for that. – Paul Hodges Jun 10 '22 at 14:17
  • Hi Paul, I have edited the question and added the script. I am trying to eliminate the set command and read the hosts from an argument instead, and also to wrap the check_health function the way customping was wrapped, in order to produce the same result. What is the best way to read the file given as an argument and pass its contents to the check_health function? – Tom Jun 10 '22 at 14:47
  • I just realized that it might not be able to work, since the check_health function is an infinite loop... but nevertheless, why does reading from the file not work with the changes I made? – Tom Jun 10 '22 at 15:18
  • Minor point `export -f customping && `-> The `&&` does not make sense, because I don't see how the `export` could fail. I suggest replacing it by `;`. BTW, where is it that "reading from file" would not work? – user1934428 Jun 12 '22 at 07:45
  • _it does not read each line of the file correctly_ .... What is the content of the file? I guess you refer here to the file stored in `$hosts`? – user1934428 Jun 12 '22 at 07:47

2 Answers


Note that the magic of the original script lies in xargs -P $noofproc, which runs the checks in parallel.
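To see the effect of -P in isolation, here is a minimal sketch. It assumes bash and an xargs that supports -P (GNU and BSD xargs do). The slow_check function is a hypothetical stand-in for the ping call, and the addresses are placeholders from a documentation range, so nothing is actually pinged.

```shell
#!/bin/bash
# Hypothetical stand-in for a slow check such as ping: sleep one second, then report.
slow_check() {
    sleep 1
    echo "checked $1"
}
export -f slow_check   # make the function visible to the bash processes xargs starts

# Four placeholder addresses (documentation range, not real hosts).
hosts_list=$(printf '%s\n' 192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4)

start=$SECONDS
# -n 1: one address per invocation; -P 4: up to four invocations at once.
# The "_" fills $0 of the inner shell so the address arrives as $1.
results=$(printf '%s\n' "$hosts_list" | xargs -n 1 -P 4 bash -c 'slow_check "$1"' _)
elapsed=$((SECONDS - start))

echo "$results"
echo "elapsed: ${elapsed}s"
```

With -P 4 the four one-second checks overlap, so the whole run should take roughly one second instead of four. The order of the output lines is not guaranteed, because whichever process finishes first prints first.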

Without knowing what your list of hosts looks like, it is difficult to guess what problem you're seeing with the first line. My best guess is that for your first entry the ping exit code was non-zero and it just didn't output anything.

I see a few issues with your most recent check_health function.

  • The while true loop, as you noted
  • The for ipnumber in "$@" loop, which makes no sense since you're calling the function with one entry at a time anyway.
  • The problem that flows from that: each call to check_health starts with i=1, so even without the while true loop, every call would end up using the same statusname (up1).
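The last point has a wider consequence: xargs starts a new bash process for every call, so a variable such as up1 that is set in one call is gone by the next, and the previous status has to live somewhere outside the process. Below is a minimal sketch of one way to do that, assuming one small state file per host in a temporary directory. The names report_change and state_dir are hypothetical, and the status is passed in as an argument rather than taken from a real ping, so the example runs without a network.

```shell
#!/bin/bash
# Directory holding one small file per host with its last known status.
state_dir=$(mktemp -d)

# report_change HOST STATUS
# STATUS is the exit code of the probe (0 = up). Prints a line only when
# STATUS differs from the value stored by the previous call for HOST.
report_change() {
    local host=$1 status=$2
    local state_file="$state_dir/$host"
    local last=""
    [ -f "$state_file" ] && last=$(<"$state_file")
    if [ "$status" != "$last" ]; then
        if [ "$status" -eq 0 ]; then
            echo "Node $host UP"
        else
            echo "Node $host DOWN"
        fi
    fi
    echo "$status" > "$state_file"
}
export state_dir
export -f report_change

# Each call below runs in its own bash process, as it would under xargs.
first=$(bash -c 'report_change 192.0.2.1 1')   # no stored state yet: prints DOWN
second=$(bash -c 'report_change 192.0.2.1 1')  # unchanged: prints nothing
third=$(bash -c 'report_change 192.0.2.1 0')   # changed: prints UP
printf '%s\n' "$first" "$second" "$third"

rm -rf "$state_dir"
```

In a monitoring loop, the status argument would be the exit code of the ping command. Note that this sketch reports every host on the first round, because there is no stored state yet, and that it uses the host string as a file name, which is fine for IP addresses but would need escaping for anything containing a slash.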

Here is what I would suggest (I hope you don't mind that I removed the non-essential bits from your script):

#!/bin/bash

trap exit 2

argc=$#
if [ $# -lt 1 ]; then
    echo "Usage: $0 <ip-list-file>"
    exit 1
fi

hosts=$1

function check_live {
    trap exit 2

    DATE=$(date '+%d/%m/%Y %H:%M:%S')
    ping -c 1 -t 1 $1 > /dev/null
    [ $? -eq 0 ] && echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;32m Node |"$1 "| UP \033[0m" | column -t -s "|"
}

function check_health {
    trap exit 2

    i=$1
    ipnumber=$2
    statusname=up$1
    laststatus=${!statusname:-0}
    ping -c 1 -t 1 $ipnumber > /dev/null
    ok=$?
    eval $statusname=$ok
    if [ ${!statusname} -ne $laststatus ]; then
        DATE=$(date '+%d/%m/%Y %H:%M:%S')
        if [ $ok -eq 0 ]; then
            echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;32m Node |"$ipnumber "| UP \033[0m" | column -t -s "|"
        else
            echo -e "|\033[1;36m  $DATE \033[0m" "|\033[1;31m Node |"$ipnumber "| DOWN \033[0m" | column -t -s "|"
        fi
    fi
}

function multi_process_live {
    T="$(date +%s%N)"

    DEFAULT_NO_OF_PROC=8
    noofproc=$DEFAULT_NO_OF_PROC

    if [ -n "$2" ]; then
        #user-set no. of process instead of default
        noofproc=$2
        echo "Max processes: $noofproc"
    fi

    export -f check_live
    cat $hosts | xargs -n 1 -P $noofproc -I{} bash -c 'check_live  {}' \; 2>/dev/null
}

function multi_process_health {
    T="$(date +%s%N)"

    DEFAULT_NO_OF_PROC=8
    noofproc=$DEFAULT_NO_OF_PROC

    if [ -n "$2" ]; then
        #user-set no. of process instead of default
        noofproc=$2
        echo "Max processes: $noofproc"
    fi

    export -f check_health
    awk '{print FNR " " $1}' < $hosts | xargs -n2 -P $noofproc -I{} bash -c 'check_health  {}' \; 2>/dev/null
}

multi_process_live
while true; do
    multi_process_health
    sleep 1
done

That runs the initial live check once and then calls the multi_process_health function in a loop, once per second.

The key is two things. First the use of awk '{print FNR " " $1}' which turns a list of hosts:

127.0.0.1
10.0.0.1
192.168.0.1

Into this:

1 127.0.0.1
2 10.0.0.1
3 192.168.0.1

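Second, xargs passes each numbered line on to check_health, so each call is given the sequence number as the first parameter and the actual host as the second parameter. One caveat: xargs documents -n and -I as mutually exclusive, and GNU xargs lets the later option win, so in the command above it is -I{} that takes effect (the whole line replaces {} and the inner shell splits it into two words).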
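The numbering step can be tried in isolation. The sketch below uses a hypothetical show_args function in place of check_health and a temporary file holding the three example hosts. It uses -n 2 without -I{}, since xargs documents those two options as mutually exclusive, and passes the pair to the inner shell as $1 and $2.

```shell
#!/bin/bash
# Hypothetical stand-in for check_health: just shows what it receives.
show_args() {
    echo "index=$1 host=$2"
}
export -f show_args

# Temporary file with the same three example hosts as in the text.
host_file=$(mktemp)
printf '%s\n' 127.0.0.1 10.0.0.1 192.168.0.1 > "$host_file"

# awk prefixes each line with its line number (FNR); xargs -n 2 then hands
# the words over two at a time, so each call gets "<index> <host>".
pairs=$(awk '{print FNR " " $1}' "$host_file" |
        xargs -n 2 bash -c 'show_args "$1" "$2"' _)
echo "$pairs"

rm -f "$host_file"
```

Without -P the calls run one after another, so the lines come out in file order: index=1 host=127.0.0.1, then index=2 host=10.0.0.1, then index=3 host=192.168.0.1.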

Erwin
  • The `$hosts` file contains IPs such as `192.168.0.1`, one per line, as you guessed. I had no idea about the `awk '{print FNR " " $1}'` part, but it's a neat trick. The script works as intended with your corrections, except for the loop around check_health: it constantly echoes which hosts are down. If you run `Script 2` by itself (with IPs from your own network), it echoes all down hosts once and then echoes only when it sees a change on any IP (up or down). I will check it once I get home and update my response. Thank you for your time, guidance and input! – Tom Jun 15 '22 at 10:20
  • Quick question: I have looked up the `xargs -n` argument, but what exactly does `xargs -n2` mean? I mean the `n2` part. – Tom Jun 15 '22 at 10:31
  • https://www.man7.org/linux/man-pages/man1/xargs.1.html `-n max-args, --max-args=max-args Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.` It's really meant for sending "up to X" sections from the xargs input to the commands that xargs runs. I just make sure there is an even number of sections in the xargs input (the sequence and the IP address or hostname). – Erwin Jun 16 '22 at 18:04

If the list of hosts fits on your screen, try this:

healthcheck() {
    ipnumber=$1
    last="-1"
    while true; do
        date=$(date -Is)
        ping -c 1 -t 20 $ipnumber > /dev/null 2>/dev/null
        latest=$?
        if [ $last -ne $latest ] ; then
            # status changed
            if [ $latest -eq 0 ] ; then
                echo -e "|\033[1;36m  $date \033[0m" "|\033[1;32m Node |"$ipnumber "| UP \033[0m" | column -t -s "|"
            else
                echo -e "|\033[1;36m  $date \033[0m" "|\033[1;31m Node |"$ipnumber "| DOWN \033[0m" | column -t -s "|"
            fi
        fi
        last=$latest
        sleep 1
    done
}
export -f healthcheck

cat hostlist | parallel-20220622 -j0 --ll healthcheck

If you do not have parallel-20220622 this might be enough:

cat hostlist | parallel -j0 --lb healthcheck
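If GNU parallel is not installed at all, the same one-process-per-host shape can be approximated with plain bash background jobs. This is a different technique from the commands above, sketched under assumptions: watch_host is a hypothetical, bounded stand-in for healthcheck (two rounds instead of an endless loop, no real ping) so that the example terminates, and the host file is a temporary file with placeholder addresses.

```shell
#!/bin/bash
# Hypothetical stand-in for healthcheck: a bounded loop instead of the
# endless ping loop, so the sketch terminates. Replace the body with the real check.
watch_host() {
    local host=$1 round
    for round in 1 2; do
        echo "$host round $round"
        sleep 0.1
    done
}

# Temporary host list with placeholder addresses, and a file to collect output.
host_file=$(mktemp)
printf '%s\n' 192.0.2.1 192.0.2.2 192.0.2.3 > "$host_file"
out_file=$(mktemp)

# Start one background job per non-empty line of the file, then wait for all of them.
while IFS= read -r host; do
    [ -n "$host" ] && watch_host "$host" >> "$out_file" &
done < "$host_file"
wait

line_count=$(grep -c 'round' "$out_file")
cat "$out_file"
rm -f "$host_file" "$out_file"
```

Unlike parallel -j0 --lb, this gives no control over how many jobs run at once and no output grouping beyond what short line-sized writes provide, so it is only reasonable for small host lists.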
Ole Tange