I have some shell scripts that extract and transform data stored in .tar.gz archives. The extraction/transformation takes a very large data set and turns it into individual files that are then used to populate a database. Because the data set is so large, I use GNU parallel to run the process across multiple machines. These shell scripts act as wrappers around my Java application.
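Roughly, the wrapper hands GNU parallel one job per archive and lets it spread those jobs over SSH hosts. Below is a minimal sketch of that kind of invocation, assembled from Java so it could be launched with `ProcessBuilder`. The host file `hosts.txt`, the script `./transform.sh` and the archive names are hypothetical placeholders, and the script is assumed to already exist on every remote host.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelCommand {
    /**
     * Builds a GNU parallel command line that runs a (hypothetical) transform
     * script once per archive, distributing the jobs over the hosts listed
     * in sshLoginFile.
     */
    static List<String> build(String sshLoginFile, String script, List<String> archives) {
        List<String> cmd = new ArrayList<>();
        cmd.add("parallel");
        // One SSH login per line in this file; parallel schedules jobs across them.
        cmd.add("--sshloginfile");
        cmd.add(sshLoginFile);
        // Copy each input archive to the remote host before its job runs...
        cmd.add("--transferfile");
        cmd.add("{}");
        // ...and delete the transferred copy once the job finishes.
        cmd.add("--cleanup");
        // Assumption: the script is already present on every remote host.
        cmd.add(script);
        cmd.add("{}");
        // Everything after ::: is treated as the list of inputs, one job each.
        cmd.add(":::");
        cmd.addAll(archives);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("hosts.txt", "./transform.sh", List.of("a.tar.gz", "b.tar.gz"));
        // prints "parallel --sshloginfile hosts.txt --transferfile {} --cleanup ./transform.sh {} ::: a.tar.gz b.tar.gz"
        System.out.println(String.join(" ", cmd));
        // Actually running it needs GNU parallel installed and SSH access to the hosts:
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

Passing the arguments as a list (rather than one shell string) avoids a local shell, so `{}` and `:::` reach GNU parallel unchanged.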
However, I would like to integrate this process with my Java application more directly, so that I can test it at a higher level. My first thought was to replace the shell scripts with Jython and call my Java methods and applications directly from Jython instead. The problem is that I can't find a tool similar to GNU parallel that would let me run the script simultaneously across multiple machines.
Any thoughts/ideas?