
I'm developing a small in-house alternative to Tripwire, so I've written a small script that hashes the files on a JBoss EAP server and stores each path and its hash in a MySQL database.

Every day the script compares the hashes in the filesystem with those saved in the DB, so any change is logged and finally reported using JasperServer.

The script runs at night from cron. To avoid a large number of scripts querying the DB at the same time, it calls time.sleep(RANDOM_NUMBER_OF_SECONDS) before doing the fun stuff. Sometimes, though, time.sleep seems to sleep forever and the script ends without any error; I checked the mail cron sends and no error is logged. Any help would be appreciated. I'm using jython-standalone-2.5.3, IBM's JDK, and RHEL 5.6 running inside VMware.

I just found http://bugs.jython.org/issue1974, where a code comment suggests that OS signals can cause this behavior, but I'm not sure whether that applies to my case.

If you want to see the code, check it out at http://code.google.com/p/pysnapshot/

Luis García Bustos.

Joshua

1 Answer


I don't see why you think time.sleep() would reduce the number of scripts querying the DB at the same time.

IMO it is better to use cron to call the program periodically. When it starts, it should check whether a "semaphore" file exists in the /tmp/ directory, for example /tmp/snapshot_working.txt. If there is no semaphore file, the program creates it and writes something like "snapshot started: 2012-12-05 22:00:00" to it. When the program finishes checking, it should remove the file. If the program finds a semaphore file at startup, it can either just stop or check whether the date and time saved in the file look "old". If they do, it removes the file and starts normally, writing to the log that an "old" file was found (so the administrator can find such long-running snapshots and terminate them).
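The semaphore-file idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `acquire_lock` and `release_lock` and the 6-hour staleness threshold are hypothetical, and the sketch uses the file's modification time instead of parsing the timestamp written inside the file. It is written for plain Python and has not been tested on Jython 2.5.3. The exists-then-create check is not atomic, so two scripts starting at the same instant could both proceed.

```python
import os
import time

LOCK_PATH = "/tmp/snapshot_working.txt"  # path from the example above
MAX_AGE_SECONDS = 6 * 60 * 60            # assumption: a lock older than 6 hours is "old"


def acquire_lock(path=LOCK_PATH, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if this run may proceed, False if another run looks active."""
    if now is None:
        now = time.time()
    if os.path.exists(path):
        # Use the file's modification time as a stand-in for the saved date.
        age = now - os.path.getmtime(path)
        if age < max_age:
            return False  # recent lock: another snapshot is probably running
        print("stale semaphore file found (%d s old), removing it" % age)
        os.remove(path)
    f = open(path, "w")
    try:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
        f.write("snapshot started: %s\n" % stamp)
    finally:
        f.close()
    return True


def release_lock(path=LOCK_PATH):
    """Remove the semaphore file once the snapshot has finished."""
    if os.path.exists(path):
        os.remove(path)
```

A caller would wrap the snapshot in `if acquire_lock(): try: ... finally: release_lock()` so the file is removed even when the run fails.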

The only reason to use time.sleep() in your case is if you want to run the script during normal working hours without effectively mounting a denial-of-service attack on your DB. For example, after every 100 DB queries you could sleep briefly to give the DB time to serve other users' queries. But I think the sooner the program finishes, the better.
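A minimal sketch of that "short sleep after every 100 queries" idea, assuming a hypothetical `execute` callable that runs one query. The name `run_throttled` and the 0.5-second pause are illustrative choices, not values from the question.

```python
import time

BATCH_SIZE = 100     # from the example above: pause after every 100 queries
PAUSE_SECONDS = 0.5  # assumption: length of each pause


def run_throttled(queries, execute, batch_size=BATCH_SIZE,
                  pause=PAUSE_SECONDS, sleep=time.sleep):
    """Call execute(q) for each query, sleeping after every full batch."""
    count = 0
    for q in queries:
        execute(q)
        count += 1
        if count % batch_size == 0:
            sleep(pause)  # give the DB time to serve other clients
    return count
```

Passing `sleep` as a parameter makes the pause easy to replace in tests or to swap for a different waiting mechanism.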

Michał Niklas
  • Hi! you're right, I use time.sleep because MySQL can't handle the load, running in 100+ servers, since I'm starting all the scripts at the same time (at the batch window) and using transactions - because a "partial snapshot" is useless for integrity checking purposes - MySQL simply can't handle it and transactions start to abort, so I use a kind of "exponential backoff" to recover. The main problem is when I put the script to wait (using time.sleep) before retrying an operation, it seems to never wake up. I'm not using threading at all. – user1877237 Dec 05 '12 at 14:19
  • How many scripts do you start? Do they use the same MySQL server? Do you check one filesystem? Why not start one script to check one directory after another (long command line or configuration file)? – Michał Niklas Dec 05 '12 at 20:39
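For reference, the "exponential backoff" retry mentioned in the first comment might look something like the sketch below. This is not the asker's code: `retry_with_backoff`, its parameters, and the random jitter are assumptions added for illustration, and `operation` stands for whatever callable runs the whole transaction. It is written for plain Python and has not been tested on Jython 2.5.3.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Run operation(); on failure wait, doubling the delay cap each time."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and let the caller log the failure
            # Delay cap grows as 1, 2, 4, ... times base_delay, up to max_delay.
            cap = min(max_delay, base_delay * (2 ** (attempt - 1)))
            # Random jitter (an addition here) spreads retries from many hosts.
            sleep(cap * rng())
```

Bounding the number of attempts means a script that cannot commit ends with a visible exception rather than waiting indefinitely.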