
I am using IBM LSF and trying to get usage statistics during a certain period. I found that bhist does the job, but the short form bhist output does not show all of the fields I need.

What I want to know is:

  1. Is bhist's output field customizable? The fields I need are:

    • <jobid>
    • <user>
    • <queue>
    • <job_name>
    • <project_name>
    • <job_description>
    • <submission_time>
    • <pending_time>
    • <run_time>
  2. If 1 is not possible, the long form (bhist -l) output shows everything I need, but the format is hard to manipulate. I've pasted an example of the format below.

For example, the number of lines between records is not fixed, and the word wrap within each event may break a line in the middle of a word I'm trying to scan for. How do I parse this format with sed and awk?

JobId <1531>, User <user1>, Project <default>, Command <example200>
Fri Dec 27 13:04:14: Submitted from host <hostA> to Queue <priority>, CWD <$H
                     OME>, Specified Hosts <hostD>;
Fri Dec 27 13:04:19: Dispatched to <hostD>;
Fri Dec 27 13:04:19: Starting (Pid 8920);
Fri Dec 27 13:04:20: Running with execution home </home/user1>, Execution CWD 
                     </home/user1>, Execution Pid <8920>;
Fri Dec 27 13:05:49: Suspended by the user or administrator;
Fri Dec 27 13:05:56: Suspended: Waiting for re-scheduling after being resumed 
                     by user;
Fri Dec 27 13:05:57: Running;
Fri Dec 27 13:07:52: Done successfully. The CPU time used is 28.3 seconds.

Summary of time in seconds spent in various states by Sat Dec 27 13:07:52 1997
PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
5     0      205  7      1      0      218
------------------------------------------------------------
    .... repeat
    As is, it is a bit broad. Try to [edit] to show what you tried, what you want and what problems you are facing. – fedorqui Dec 16 '15 at 14:10
    The right solution will be awk (assuming no existing application-oriented tool). sed is for simple substitutions on individual lines, that is all - remember that, no matter what wacky combinations of characters people munge into a sed command line and throw at you. Maybe you didn't understand an important part of @fedorqui's comment - you must at a minimum show us the expected output given that input for us to stand a chance of understanding your requirements. – Ed Morton Dec 16 '15 at 16:32

2 Answers


Long form output is pretty hard to parse. I know bjobs has an option for unformatted output (-UF) in older LSF versions, which makes parsing a bit easier, and the most recent versions of LSF let you customize which columns get printed in short form output with -o.

Unfortunately, neither of these options is available with bhist. The only real possibilities for historical information are:

  1. Figure out some way to parse bhist -l -- impractical and maybe not even possible due to inconsistent formatting as you've discovered.
  2. Write a C program to do what you want using the LSF API, which exposes the functions that bhist itself uses to parse the lsb.events file. This is the file that stores all the historical information about the LSF cluster, and is what bhist reads to generate its output.
  3. If C is not an option for you, then you could try writing a script to parse the lsb.events file directly -- the format is documented in the configuration reference. This is hard, but not impossible. Here is the relevant document for LSF 9.1.3.
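For what it's worth, the word-wrap problem in option 1 can be tamed with a small awk filter that joins wrapped continuation lines (which always start with leading whitespace in the sample above) back into one logical line per event. This is only a sketch based on that sample, not something tested against every bhist -l variation:

```shell
# Join bhist -l continuation lines back into one logical line per event.
# Usage: bhist -l | unwrap
# Assumption: wrapped lines begin with leading whitespace, and LSF keeps
# any word-separating space at the end of the previous line (as in the
# "...resumed " / "by user" example), so we concatenate with no separator.
unwrap() {
  awk '
    /^[[:space:]]/ { sub(/^[[:space:]]+/, "")   # strip the wrap indent
                     buf = buf $0; next }       # glue onto previous line
                   { if (buf != "") print buf   # new logical line: flush
                     buf = $0 }
    END            { if (buf != "") print buf } # flush the last one
  '
}
```

Once each event sits on a single line, pulling out fields like the job id or the timestamps with a second awk or grep pass becomes much more reliable.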

My personal recommendation would be #2 -- the function you're looking for is lsb_geteventrec(). You'd basically read each line in lsb.events one at a time and pull out the information you need.


I'm adding a second answer because it might help you with your problem without actually having to write your own solution (depending on the usage statistics you're after).

LSF already has a utility called bacct that computes and prints out various usage statistics about historical LSF jobs filtered by various criteria.

For example, to get summary usage statistics about jobs that were dispatched/completed/submitted between time0 and time1, you can use (respectively):

bacct -D time0,time1
bacct -C time0,time1
bacct -S time0,time1

Statistics about jobs submitted by a particular user:

bacct -u <username>

Statistics about jobs submitted to a particular queue:

bacct -q <queuename>

These options can be combined as well, so for example if you wanted statistics about jobs that were submitted and completed within a particular time window for a particular project, you can use:

bacct -S time0,time1 -C time0,time1 -P <projectname>

The output provides some summary information about all jobs that match the provided criteria like so:

$ bacct -u bobbafett -q normal

Accounting information about jobs that are: 
  - submitted by users bobbafett, 
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to queues normal, 
  - accounted on all service classes.
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second ) 
 Total number of done jobs:       0      Total number of exited jobs:    32
 Total CPU time consumed:      46.8      Average CPU time consumed:     1.5
 Maximum CPU time of a job:     9.0      Minimum CPU time of a job:     0.0
 Total wait time in queues: 18680.0
 Average wait time in queue:  583.8
 Maximum wait time in queue: 5507.0      Minimum wait time in queue:    0.0
 Average turnaround time:     11568 (seconds/job)
 Maximum turnaround time:     43294      Minimum turnaround time:        40
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.02      Minimum hog factor of a job:  0.00
 Total Run time consumed:    351504      Average Run time consumed:   10984
 Maximum Run time of a job: 1844674      Minimum Run time of a job:       0
 Total throughput:             0.24 (jobs/hour)  during  160.32 hours
 Beginning time:       Nov 11 17:55      Ending time:          Nov 18 10:14

This command also has a long form output that provides bhist -l style information about each job, which might be a bit easier to parse (although still not all that easy):

$ bacct -l -u bobbafett -q normal

Accounting information about jobs that are: 
  - submitted by users bobbafett, 
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to queues normal, 
  - accounted on all service classes.
------------------------------------------------------------------------------

Job <101>, User <bobbafett>, Project <default>, Status <EXIT>, Queue <normal>, 
                     Command <sleep 100000000>
Wed Nov 11 17:37:45: Submitted from host <endor>, CWD <$HOME>;
Wed Nov 11 17:55:05: Completed <exit>; TERM_OWNER: job killed by owner.

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.00     1040           1040     exit         0.0000     0M      0M
------------------------------------------------------------------------------
...
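If you do go the bacct -l route, the per-job accounting block is regular enough that a short awk pass can flatten it into one line per job. A sketch, assuming the layout shown in the sample above (job id on the "Job <...>" line, statistics on the line following the CPU_T header):

```shell
# Reduce bacct -l output to "jobid cpu_t wait turnaround status", one job
# per line. Usage: bacct -l ... | summarize
# Assumption: field positions match the sample block above.
summarize() {
  awk '
    /^Job </   { match($0, /<[0-9]+>/)                  # capture the job id
                 id = substr($0, RSTART + 1, RLENGTH - 2) }
    /^ *CPU_T/ { hdr = 1; next }                        # stats follow header
    hdr        { print id, $1, $2, $3, $4; hdr = 0 }    # emit one summary row
  '
}
```

The result is whitespace-separated and sorts or imports into a spreadsheet cleanly, which is usually all that's needed for usage reporting.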