
I have some shell scripts that extract and transform data from .tar.gz archives. The extraction/transformation takes a large data set and turns it into individual files that can be used to populate a database. Because the data set is very large, I use GNU parallel to run the process on multiple machines. These scripts serve as wrappers around my Java application.

However, I was looking to integrate this process with my Java application in a more direct manner that allows for higher level testing. My first thought was to replace the shell scripts with Jython and call my Java methods and applications directly from Jython instead. However, I can't seem to find a program similar to GNU parallel that will allow me to run the script simultaneously across multiple machines.

Any thoughts/ideas?

1 Answer


Not sure if this is precisely what you need, but have a look at JSch: http://www.jcraft.com/jsch/examples/

Particularly the Exec.java and Shell.java examples:

http://www.jcraft.com/jsch/examples/Exec.java.html

http://www.jcraft.com/jsch/examples/Shell.java.html

  • Not exactly. I was thinking more along the lines of Parallel Python, but for Jython, i.e. a "Parallel Jython". JSch seems too verbose and needs too much manual configuration. To clarify: I need something that takes a list of machines, runs the extraction and transformation simultaneously on multiple computers over a large data set that has been manually divided into smaller sections, and sends the extracted data back to the master machine. GNU parallel does this job perfectly, but it's hard to integrate with Jython. – nikhil.narula Jul 17 '12 at 14:14
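
For what it's worth, the workflow described in that comment (a list of machines, a data set already split into sections, all sections processed at once) can be roughed out with nothing but the JDK's `ExecutorService`, which is also callable from Jython. This is only a sketch under assumptions, not a replacement for GNU parallel: the class name `FanOut`, the script name `./extract.sh`, and the use of a passwordless `ssh` client on the PATH are all hypothetical, and copying the extracted files back to the master (e.g. with scp or rsync) is left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical helper: fans pre-split data chunks out over a list of hosts.
class FanOut {

    // Builds the ssh command line for one chunk.
    // "./extract.sh" is a placeholder for whatever runs on the remote side.
    static List<String> sshCommand(String host, String chunk) {
        return Arrays.asList("ssh", host, "./extract.sh", chunk);
    }

    // Runs one chunk on one host through the local ssh client (assumed to be
    // installed and set up for key-based login) and returns the exit code.
    static int runOverSsh(String host, String chunk) {
        try {
            Process p = new ProcessBuilder(sshCommand(host, chunk)).inheritIO().start();
            return p.waitFor();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Assigns chunks to hosts round-robin and runs the jobs concurrently.
    // At most hosts.size() jobs run at once; results come back in chunk order.
    // Note: this does not guarantee that a host runs only one job at a time.
    static <R> List<R> runAll(List<String> hosts, List<String> chunks,
                              BiFunction<String, String, R> job) {
        ExecutorService pool = Executors.newFixedThreadPool(hosts.size());
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                final String host = hosts.get(i % hosts.size());
                final String chunk = chunks.get(i);
                Callable<R> task = () -> job.apply(host, chunk);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks until that chunk is done
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Something like `FanOut.runAll(hosts, chunks, FanOut::runOverSsh)` would then return one exit code per chunk. Because the per-chunk job is passed in as a function, the ssh call can be swapped for a JSch-based one, or for a fake in tests, without touching the scheduling code.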