

I have a cluster of 4 servers. A file consists of many logical documents. Each file is started as a workflow. So, in summary, a workflow runs on a server for each physical input file, and a file can contain as many as 300,000 logical documents. At any given time 80 workflows are running concurrently across the cluster. Is there a way to speed up the file processing? Is file splitting a good alternative? Any suggestions? Everything is Java based, running on a Tomcat servlet engine.
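To make the "file splitting" idea concrete, here is a minimal sketch of one interpretation: split a physical file into its logical documents and process them in parallel on one server with a fixed thread pool. The `LogicalDocument` type and the `splitIntoDocuments` / `process` methods are hypothetical placeholders for whatever the actual application does.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFileProcessor {

    public void processPhysicalFile(Path physicalFile) throws InterruptedException {
        // Hypothetical splitter: yields the logical documents packaged inside the physical file.
        List<LogicalDocument> documents = splitIntoDocuments(physicalFile);

        // Pool sized to the host's cores; tune against the other workflows sharing the box.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        for (final LogicalDocument doc : documents) {
            pool.submit(new Runnable() {
                public void run() {
                    process(doc); // each logical document is independent work
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private List<LogicalDocument> splitIntoDocuments(Path file) {
        throw new UnsupportedOperationException("application-specific parsing");
    }

    private void process(LogicalDocument doc) {
        // application-specific per-document work
    }

    private static final class LogicalDocument { /* placeholder */ }
}
```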

phoenix
    Maybe use something like [Hadoop](http://en.wikipedia.org/wiki/Hadoop) – Yet Another Geek May 18 '11 at 16:35
  • It's not clear whether 1 physical input file = 1 logical file or not. You've talked about a "file" having lots of documents, and a physical input file having lots of documents, but there's no clarity as to what the relationship between physical input files and logical files is. – Jon Skeet May 18 '11 at 16:36
  • A physical file (the file that actually resides on disk) can contain up to 200,000 logical documents. A logical document is what my application identifies as a single document; multiple documents are packaged together into a bigger file, which is the physical file. A workflow operates on the physical file as a whole, while my application processes one logical document at a time. – phoenix May 18 '11 at 17:26
  • Tuning this is going to depend on what your app actually does. Try to time the different parts of your process and identify which parts take the longest. Then you can break down the parts that take the longest to see if there is any way to make them more efficient. – Karthik Ramachandran May 18 '11 at 22:00

1 Answer


Try processing the files with Oracle Coherence, which gives you grid processing across the cluster. Coherence also provides data persistence.
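A minimal sketch of what that could look like, assuming the classic Coherence API (`CacheFactory`, `NamedCache`, an `EntryProcessor`): the cache name "documents", the `DocumentProcessor` class, and the per-document logic are hypothetical placeholders, not anything from your application.

```java
import java.io.Serializable;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.filter.AlwaysFilter;
import com.tangosol.util.processor.AbstractProcessor;

public class CoherenceDocumentGrid {

    // Runs on whichever cluster member owns the entry, so the work is spread
    // across all four servers instead of the one that read the physical file.
    public static class DocumentProcessor extends AbstractProcessor implements Serializable {
        @Override
        public Object process(InvocableMap.Entry entry) {
            String document = (String) entry.getValue();
            // application-specific per-document work goes here
            return null;
        }
    }

    public void processFile(Iterable<String> logicalDocuments) {
        NamedCache cache = CacheFactory.getCache("documents"); // hypothetical cache name

        // Load the logical documents into the distributed cache...
        long id = 0;
        for (String doc : logicalDocuments) {
            cache.put(id++, doc);
        }

        // ...then invoke the processor against every entry; Coherence executes it
        // in parallel on the nodes that own the data.
        cache.invokeAll(new AlwaysFilter(), new DocumentProcessor());
    }
}
```

Whether this helps depends on whether the per-document work is CPU bound and independent; moving 300,000 documents into the grid has its own serialization and memory cost.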

Sreedhar