As described in the comments above, I wrote you some sample code. It is based on Redis, and I suggest you run Redis on your cluster manager node, which is presumably close to the nodes of your cluster and always up - so a good candidate for hosting a statistics-gathering service.
The sample code is a dummy job written in Python and a monitoring routine written in bash, but the job could just as easily be written in C/C++ and the monitoring routine in Perl - there are Redis bindings for all sorts of languages, so don't get hung up on the language.
Even if you can't read Python, it is very easy to understand. Three threads run in parallel. One just updates a string in Redis with the total elapsed processing time. The other two update Redis lists with time-series data - a synthesised triangular wave - one updating at 5 Hz and the other at 1 Hz.
I used a Redis string for variables that don't need a history, and a Redis list where a history is needed. Other data structures (hashes, sets, sorted sets) are available too.
In the code below, the only three interesting lines are:
# Connect to Redis server by IP address/name
r = redis.Redis(host='localhost', port=6379, db=0)
# Set a Redis string called 'processTime' to the value of the variable processTime
r.set('processTime', processTime)
# Push a value to left end of Redis list
r.lpush(RedisKeyName, value)
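One thing to be aware of: lpush on its own lets a list grow without bound, so a long-running job should cap the history with Redis's LTRIM command. Here is a minimal sketch of the push-then-trim idiom - TinyRedis is just a toy in-memory stand-in I wrote so the snippet runs without a live server; with a real connection you would call the same two methods on a redis.Redis object:

```python
# Toy in-memory stand-in for the two Redis list commands used below,
# purely so this sketch runs without a live server. Real code would
# make the identical lpush/ltrim calls on a redis.Redis connection.
class TinyRedis:
    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        # Insert at the LEFT end, like Redis LPUSH
        self.lists.setdefault(key, []).insert(0, value)

    def ltrim(self, key, start, stop):
        # Keep only elements start..stop inclusive, like Redis LTRIM
        self.lists[key] = self.lists.get(key, [])[start:stop + 1]

r = TinyRedis()

# Push 100 samples but keep only the 30 most recent
for value in range(100):
    r.lpush('seriesA', value)
    r.ltrim('seriesA', 0, 29)

print(len(r.lists['seriesA']))   # 30 - the history is capped
print(r.lists['seriesA'][0])     # 99 - newest value at the left end
```

That way the monitor's `lrange seriesA 0 30` always sees the most recent window and the job never leaks memory in Redis.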
Here is the dummy job that is being monitored. Start reading where it says
######
# Main
######
Here is the code:
#!/usr/local/bin/python3

import redis
import _thread
import time
import os

################################################################################
# Separate thread periodically updating the 'processTime' in Redis
################################################################################
def processTimeThread():
    """Calculate time since we started and update every so often in Redis"""
    start = time.time()
    while True:
        processTime = int(time.time() - start)
        r.set('processTime', processTime)
        time.sleep(0.2)

################################################################################
# Separate thread generating a time series and storing it in Redis with the
# given name and update rate
################################################################################
def generateSeriesThread(RedisKeyName, interval):
    """Generate a triangular-wave time series and log it to Redis"""
    # Delete any values from previous runs
    r.delete(RedisKeyName)
    value = 0
    inc = 1
    while True:
        # Generate next value and store in Redis
        value = value + inc
        r.lpush(RedisKeyName, value)
        if value == 0:
            inc = 1
        if value == 10:
            inc = -1
        time.sleep(interval)

################################################################################
# Main
################################################################################
# Connect to Redis on localhost - but could just as easily be on another machine
r = redis.Redis(host='localhost', port=6379, db=0)

# Get start time of job in RFC 2822 format...
startTime = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime())
# ... and set Redis string "startTime"
r.set('startTime', startTime)

# Get process id (pid)...
pid = os.getpid()
# ... and set Redis string "pid"
r.set('pid', pid)

# Start some threads generating data
_thread.start_new_thread(processTimeThread, ())
_thread.start_new_thread(generateSeriesThread, ('seriesA', 0.2))
_thread.start_new_thread(generateSeriesThread, ('seriesB', 1))

# Hang around (with threads still running) till user presses a key
key = input("Press Return/Enter to stop.")
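As an aside, the RFC 2822 timestamp hand-built with time.strftime above can also come straight from the standard library. A small sketch using email.utils.formatdate - note that with usegmt=True the zone is rendered as "GMT" rather than the literal "+0000":

```python
import re
from email.utils import formatdate

# Current time as an RFC 2822 date string in GMT,
# e.g. "Tue, 10 Jun 2025 14:03:59 GMT"
startTime = formatdate(usegmt=True)
print(startTime)
```

Either form is fine for human-readable display; the strftime version just gives you full control over the zone suffix.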
I then wrote a monitoring script in bash
that connects to Redis, grabs the values and displays them on the terminal in a TUI (Text User Interface). You could equally use Python or Perl or PHP and equally write a graphical interface, or web-based interface.
#!/bin/bash

################################################################################
# drawGraph
################################################################################
drawGraph(){
   top=$1 ; shift
   data=( "$@" )
   for ((row=0;row<10;row++)) ; do
      ((y=10-row))
      ((screeny=top+row))
      line=""
      for ((col=0;col<30;col++)) ; do
         char=" "
         declare -i v
         v=${data[col]}
         [ $v -eq $y ] && char="X"
         line="${line}${char}"
      done
      printf "$(tput cup $screeny 0)%s" "${line}"
   done
}

# Save screen, clear it and make the cursor invisible
tput smcup
tput clear
tput civis

# Trap exit
trap 'exit 1' INT TERM
trap 'tput rmcup; tput clear' EXIT

while :; do
   # Get process id from Redis and display
   pid=$(redis-cli <<< "get pid")
   printf "$(tput cup 0 0)ProcessId: $pid"

   # Get process start time from Redis and display
   startTime=$(redis-cli <<< "get startTime")
   printf "$(tput cup 1 0)Start Time: $startTime"

   # Get process running time from Redis and display
   processTime=$(redis-cli <<< "get processTime")
   printf "$(tput cup 2 0)Running Time: $(tput el)$processTime"

   # Display seriesA last few values
   seriesA=( $(redis-cli <<< "lrange seriesA 0 30") )
   printf "$(tput cup 5 0)seriesA latest values: $(tput el)"
   printf "%d " "${seriesA[@]}"

   # Display seriesB last few values
   seriesB=( $(redis-cli <<< "lrange seriesB 0 30") )
   printf "$(tput cup 6 0)seriesB latest values: $(tput el)"
   printf "%d " "${seriesB[@]}"

   drawGraph  8 "${seriesA[@]}"
   drawGraph 19 "${seriesB[@]}"

   # Put cursor at bottom of screen and tell user how to quit
   printf "$(tput cup 30 0)Hit Ctrl-C to quit"
done
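The drawGraph function is the only slightly dense part: it walks the graph area row by row and prints an X wherever a sample's value matches that row's height. The same idea as a pure function, sketched in Python - renderGraph is my own name, and it returns the rows as strings instead of cursor-addressing the terminal:

```python
def renderGraph(data, height=10, width=30):
    """Render a series as ASCII rows: an 'X' wherever a sample equals the row's y value."""
    rows = []
    for row in range(height):
        y = height - row                       # top row is the highest y value
        line = ""
        for col in range(width):
            # Missing samples are treated as 0, like the bash version
            v = data[col] if col < len(data) else 0
            line += "X" if v == y else " "
        rows.append(line)
    return rows

# One cycle of the 0..10 triangular wave from the dummy job
wave = list(range(1, 11)) + list(range(9, -1, -1))
for line in renderGraph(wave):
    print(line)
```

Printing those rows gives the triangle shape you see in the TUI; the bash version just writes each row at a fixed screen position with tput so the graph updates in place.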
Hopefully you can see that you can grab data structures from Redis very easily. This gets the processTime
variable set within the job on the cluster node:
processTime=$(redis-cli <<< "get processTime")
The TUI shows the process id, start time and running time at the top, with the two scrolling graphs below, all updating live in the terminal.