
I'm running OpenFOAM simulations on a cluster and they take days to finish. I am looking for a way to monitor the process and get some meaningful insights. What I can do for the moment is to watch the tail of the log file using

watch tail -n 15 log.log

From here I have also found a nice GnuPlot-grep script:

set logscale y
set title "Residuals"
set ylabel 'Residual'
set xlabel 'Iteration'
plot "< cat log.log | grep 'Solving for Ux'    | cut -d' ' -f9 | tr -d ','" title 'Ux'                  with lines,\
     "< cat log.log | grep 'Solving for Uy'    | cut -d' ' -f9 | tr -d ','" title 'Uy'                  with lines,\
     "< cat log.log | grep 'Solving for Uz'    | cut -d' ' -f9 | tr -d ','" title 'Uz'                  with lines,\
     "< cat log.log | grep 'Solving for omega' | cut -d' ' -f9 | tr -d ','" title 'omega'               with lines,\
     "< cat log.log | grep 'Solving for k'     | cut -d' ' -f9 | tr -d ','" title 'k'                   with lines,\
     "< cat log.log | grep 'Solving for p'     | cut -d' ' -f9 | tr -d ','" title 'p'                   with lines,\
     "< cat log.log | grep 'Courant Number'    | cut -d' ' -f9 | tr -d ','" title 'Courant Number mean' with lines,\
     "< cat log.log | grep 'Courant Number'    | cut -d' ' -f6 | tr -d ','" title 'Courant Number max'  with lines
pause 1
reread

which extracts the information from the log.log file; if I add `set term dumb` somewhere at the top, it can plot in the terminal. However, the plot is very crowded and ugly, it takes forever to show, and it prints to the terminal sequentially instead of updating the previous frame.
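One reason each frame is slow is that the script greps the whole log once per variable; the same data can be collected in a single pass. A minimal Python sketch (assuming log lines of the form `Solving for Ux, Initial residual = 0.01, ...`, which is what the grep/cut fields above target):

```python
import re
from collections import defaultdict

# Assumed OpenFOAM log line shape (hedged: adjust to your solver's output):
#   "... Solving for Ux, Initial residual = 0.01, Final residual = ..."
PATTERN = re.compile(r"Solving for (\w+),\s*Initial residual = ([0-9.eE+-]+)")

def parse_residuals(lines):
    """Collect the residual history of every variable in one pass."""
    series = defaultdict(list)
    for line in lines:
        m = PATTERN.search(line)
        if m:
            series[m.group(1)].append(float(m.group(2)))
    return series

# Example with two synthetic log lines:
log = [
    "smoothSolver:  Solving for Ux, Initial residual = 0.01, Final residual = 1e-06, No Iterations 3",
    "smoothSolver:  Solving for Uy, Initial residual = 0.02, Final residual = 2e-06, No Iterations 3",
]
print(parse_residuals(log))
```

Each variable's series could then be handed to gnuplot (or any plotting code) without re-reading the file eight times.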

Searching the internet I see there are some nice Python libraries for creating TUIs, such as npyscreen/picotui, ncurses/blessed, Asciimatics, Urwid, Prompt Toolkit, etc. I was wondering if you could help me create a text-based interface to show basic information and a plot of selected values versus time. I want to have a couple of panels: one to select the variable I want to plot (for example, Courant Number mean), one showing a plot of that variable versus time step, and another showing the latest value of all variables in real time. What I have in mind should resemble Urwid's graph.py example:

(screenshot of Urwid's graph.py example)

P.S. Since I have posted this:

  • Here I was introduced to Termgraph, a very interesting Python library for graphing in the terminal.
  • I have posted this idea in the Urwid Google group; you may follow the discussion here.
  • I have found out about PyFoam's CaseBuilder, which also uses Urwid. Also, here I was informed about other attempts within the PyFoam project to get some nice TUI information from the solver.
Foad S. Farimani
  • Sticking with gnuplot, you could try to clear the screen before every refresh, for example by including a `print "\033c"` somewhere at the beginning of the script. The only reason I can think of why this script might be slow is that for every frame you process the entire log file 8 times. Most likely you can extract the required data in a single pass with some `awk` or `perl` magic. Can you post a snippet of the data file? – user8153 Aug 29 '18 at 20:16
  • @user8153 Thanks. The log file gets updated very fast; maybe there is a way to read only the updated parts? – Foad S. Farimani Aug 29 '18 at 21:15
  • One method I have used for such things is to have a piece of SYSV or POSIX shared memory that the calculating processes continuously update, and then any monitoring processes can simply attach and grab whatever statistics they want. Nowadays, I'd probably stuff the numbers I needed in Redis and let any monitoring processes help themselves. Redis has hashes, sets, sorted sets (ideal for time series), and queues, and you can also give data a TTL (time-to-live) after which they get auto-deleted so the set doesn't grow too large. Just a thought. – Mark Setchell Aug 29 '18 at 21:44
  • @MarkSetchell Thanks for the comment. I'm not familiar with Redis. Would you please elaborate? Maybe share some gists showing your code? – Foad S. Farimani Aug 29 '18 at 21:51
  • I don't have any gists to share, but here are some links: https://www.tutorialspoint.com/redis/redis_overview.htm and https://redis.io/topics/data-types-intro and https://www.infoq.com/articles/redis-time-series – Mark Setchell Aug 29 '18 at 22:02
  • @MarkSetchell So basically Redis is some kind of database we can use to store the information extracted from the log file, and then use it to update the plot data? Why not use Python's built-in structures, or maybe Pandas? – Foad S. Farimani Aug 29 '18 at 22:16
  • It's a very fast, in-memory "data structure server" that serves up lists, strings, hashes, and sets to clients calling from C/C++, PHP, Python, bash, or Java. I was suggesting your cluster software writes key values into Redis, and you then monitor progress by making reads from Redis (in bash or Python), maybe using matplotlib for plotting in real time. – Mark Setchell Aug 29 '18 at 22:22
  • @MarkSetchell Would you be so kind as to provide an example? [This page](https://www.cfdsupport.com/OpenFOAM-Training-by-CFD-Support/node230.html) also suggests extracting the data in a different process, maybe a bash script run on a different core. – Foad S. Farimani Aug 29 '18 at 22:24
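On the question of reading only the updated parts of the log: a `tail -f`-style follow loop is easy to sketch in Python (a sketch only, assuming the log is append-only):

```python
import time

def follow(path, poll=0.5):
    """Yield lines appended to `path`, like `tail -f`.
    Assumes the file is only ever appended to."""
    with open(path) as f:
        f.seek(0, 2)              # skip existing content, start at the end
        while True:
            line = f.readline()
            if line:
                yield line        # a complete new line arrived
            else:
                time.sleep(poll)  # nothing new yet; wait and retry
```

Each new line can then be parsed as it arrives, instead of re-reading the whole file on every refresh.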

1 Answer


As described in the comments above, I made you some sample code. It is based on Redis and I am suggesting you run Redis on your cluster manager node which is presumably close to the nodes of your cluster and always up - so a good candidate for a statistics gathering service.

The sample code is a dummy job, written in Python, and a monitoring routine written in bash, but the job could just as easily be written in C/C++ and the monitoring routine in Perl - there are all sorts of bindings for Redis - don't get hung up on a language.

Even if you can't read Python, it is very easy to understand. There are 3 threads that run in parallel. One just updates a string in Redis with the total elapsed processing time. The other two update Redis lists with time series data - a synthesised triangular wave - one running at 5 Hz and the other at 1 Hz.

I used a Redis string where variables don't need to record a history and a Redis list where a history is needed. Other data structures are available.

In the code below, the only 3 interesting lines are:

# Connect to Redis server by IP address/name
r = redis.Redis(host='localhost', port=6379, db=0)

# Set a Redis string called 'processTime' to the value of `processTime`
r.set('processTime', processTime)

# Push a value to left end of Redis list
r.lpush(RedisKeyName, value)

Here is the dummy job that is being monitored. Start reading where it says

######
# Main
######

Here is the code:

#!/usr/local/bin/python3

import redis
import _thread
import time
import os

################################################################################
# Separate thread periodically updating the 'processTime' in Redis
################################################################################
def processTimeThread():
   """Calculate time since we started and update every so often in Redis"""
   start = time.time()
   while True:
      processTime = int(time.time() - start)
      r.set('processTime', processTime)
      time.sleep(0.2)

################################################################################
# Separate thread generating a time series and storing it in Redis with the given
# name and update rate
################################################################################
def generateSeriesThread(RedisKeyName, interval):
   """Generate a saw-tooth time series and log to Redis"""
   # Delete any values from previous runs
   r.delete(RedisKeyName)
   value = 0
   inc = 1
   while True:
      # Generate next value and store in Redis
      value = value + inc
      r.lpush(RedisKeyName, value)
      if value == 0:
         inc = 1
      if value == 10:
         inc = -1
      time.sleep(interval)

################################################################################
# Main
################################################################################

# Connect to Redis on local host - but could just as easily be on another machine
r = redis.Redis(host='localhost', port=6379, db=0)

# Get start time of job in RFC2822 format
startTime=time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime())
# ... and set Redis string "startTime"
r.set('startTime',startTime)

# Get process id (pid)
pid=os.getpid()
# ... and set Redis string "pid"
r.set('pid',pid)

# Start some threads generating data
_thread.start_new_thread( processTimeThread, () )
_thread.start_new_thread( generateSeriesThread, ('seriesA', 0.2) )
_thread.start_new_thread( generateSeriesThread, ('seriesB', 1) )

# Hang around (with threads still running) till user presses a key
key = input("Press Return/Enter to stop.")

I then wrote a monitoring script in bash that connects to Redis, grabs the values and displays them on the terminal in a TUI (Text User Interface). You could equally use Python or Perl or PHP and equally write a graphical interface, or web-based interface.

#!/bin/bash

################################################################################
# drawGraph
################################################################################
drawGraph(){
   top=$1 ; shift
   data=( "$@" )
   for ((row=0;row<10;row++)) ; do
      ((y=10-row))
      ((screeny=top+row))
      line=""
      for ((col=0;col<30;col++)) ; do
         char=" "
         declare -i v
         v=${data[col]}
         [ $v -eq $y ] && char="X"
         line="${line}${char}"
      done
      printf "$(tput cup $screeny 0)%s" "${line}"
   done
}

# Save screen and clear and make cursor invisible
tput smcup
tput clear
tput civis

# Trap exit
trap 'exit 1' INT TERM
trap 'tput rmcup; tput clear' EXIT

while :; do
   # Get processid from Redis and display
   pid=$(redis-cli <<< "get pid")
   printf "$(tput cup 0 0)ProcessId: $pid"

   # Get process start time from Redis and display
   startTime=$(redis-cli <<< "get startTime")
   printf "$(tput cup 1 0)Start Time: $startTime"

   # Get process running time from Redis and display
   processTime=$(redis-cli <<< "get processTime")
   printf "$(tput cup 2 0)Running Time: $(tput el)$processTime"

   # Display seriesA last few values
   seriesA=( $(redis-cli <<< "lrange seriesA 0 30") )
   printf "$(tput cup 5 0)seriesA latest values: $(tput el)"
   printf "%d " "${seriesA[@]}"

   # Display seriesB last few values
   seriesB=( $(redis-cli <<< "lrange seriesB 0 30") )
   printf "$(tput cup 6 0)seriesB latest values: $(tput el)"
   printf "%d " "${seriesB[@]}"

   drawGraph 8  "${seriesA[@]}"
   drawGraph 19 "${seriesB[@]}"

   # Put cursor at bottom of screen and tell user how to quit
   printf "$(tput cup 30 0)Hit Ctrl-C to quit"
done

Hopefully you can see that you can grab data structures from Redis very easily. This gets the processTime variable set within the job on the cluster node:

processTime=$(redis-cli <<< "get processTime")
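The equivalent reads are one-liners from redis-py as well; a small helper (hypothetical name `snapshot`, using the same keys the dummy job sets) might look like this:

```python
def snapshot(r):
    """Fetch the job's current stats from a redis-py-style client `r`.
    Key names match those set by the dummy job above."""
    return {
        'pid':         r.get('pid'),
        'startTime':   r.get('startTime'),
        'processTime': r.get('processTime'),
        'seriesA':     r.lrange('seriesA', 0, 30),  # newest 31 values
    }

# Usage, assuming redis-py is installed and the server is reachable:
#   import redis
#   r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
#   print(snapshot(r))
```

From there the dictionary can be rendered however you like: curses, Urwid, or matplotlib.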

The TUI looks like this:

(screenshot of the running TUI)

Mark Setchell