I have written a Java program to scrape huge log files. To process the files in parallel, I am using a thread pool. The source code is below.

ExecutorService threadPool = Executors.newFixedThreadPool(8);
for (int i = 0; i < files.size(); i++)
{
    threadPool.execute(new ProcessInThreads(i + "`" + files.get(i), fr) {
        public void run()
        {
            long threadId = Thread.currentThread().getId();
            Initiate(fr, threadId);
        }
    });
}
threadPool.shutdown();

When files.size() is 300, the program completes within minutes, but as files.size() increases, the performance degrades. What could be the reason, and how can I overcome it?

Here files is a list of the filenames to be processed; populating it takes less than 10 seconds.

ProcessInThreads is a class that implements the Runnable interface. If the loop runs for 500 files, will 500 instances of ProcessInThreads be created? How can I kill or release each instance once its execution finishes?

Madhu Velayudhan
  • files.size() is the API call that counts the number of files in your file system/directory, so it obviously takes time. Try using a filter. – Shriram Mar 30 '17 at 05:59
  • I think you left a line incomplete here: "but when the files.size(), the performance degrades". How large is files.size() in that case? – Arindam Mar 30 '17 at 06:00
  • The code you provided will have 8 threads running all the time executing tasks given. How many files are given when you think the performance becomes bad? – Everyone Mar 30 '17 at 06:03
  • Seems to be a simple case of "more work to do" -> "takes more time". – Henry Mar 30 '17 at 06:05
  • When the number of files is more than 400. Will the 8 threads release their resources after every run, or keep accumulating them until the end of execution? – Madhu Velayudhan Mar 30 '17 at 06:07
  • Depends on what you're actually doing. Can you describe what you are observing more thoroughly? – muzzlator Mar 30 '17 at 06:09
  • If the resources are accumulating, is there a way to release them after every iteration? – Madhu Velayudhan Mar 30 '17 at 06:09
  • Your question is still vague but I'm going to guess and say: Whatever these resources are, release them after or in your call to `Initiate`. If `Initiate` is some sort of asynchronous function, use a callback to release the resource after you are done. – muzzlator Mar 30 '17 at 06:10
  • How do I use a callback to release the resource? – Madhu Velayudhan Mar 30 '17 at 06:13
  • Are you sure that's what you need? What's stopping you from releasing the resource after the call to `Initiate` in your runnable? You haven't really given enough details to know what you're actually doing – muzzlator Mar 30 '17 at 06:53
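To illustrate the suggestion in the comments about releasing resources inside the task itself, here is a minimal sketch. It assumes the resource in question is an open file reader, which the question does not confirm. `LogFileTask` and `countLines` are hypothetical names standing in for `ProcessInThreads` and `Initiate`, and line counting is only a placeholder for the real processing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for ProcessInThreads: one task per log file.
class LogFileTask implements Runnable {
    private final Path file;

    LogFileTask(Path file) {
        this.file = file;
    }

    // Counts lines as a placeholder for the real work done in Initiate.
    static long countLines(Path file) throws IOException {
        // try-with-resources closes the reader when the block exits,
        // whether it finishes normally or throws.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    @Override
    public void run() {
        try {
            System.out.println(file + ": " + countLines(file) + " lines");
        } catch (IOException e) {
            System.err.println("Failed to read " + file + ": " + e.getMessage());
        }
    }
}
```

With this shape no callback is needed: the reader is closed before `run()` returns, and once the pool has finished a task and dropped its reference to it, the task object is eligible for garbage collection as long as nothing else still refers to it.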

1 Answer

You are using a fixed-size thread pool, so only 8 threads run at a time and the remaining tasks wait in the pool's queue. If you add more input files, you should expect it to take longer to go through all of them.

That being said, don't expect much better performance from throwing more threads at the problem. It depends on many other factors, such as the nature of the file reads.
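The queueing behaviour can be checked with a small experiment. The sketch below is an illustration, not the asker's code: `PoolDemo` and `maxConcurrent` are hypothetical names, and the short sleep is a placeholder for processing one file. It submits more tasks than there are threads and records the highest number of tasks seen running at the same moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolDemo {
    // Submits taskCount short tasks to a pool of poolSize threads and
    // returns the highest number of tasks observed running at once.
    static int maxConcurrent(int poolSize, int taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // placeholder for processing one file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown(); // accept no new tasks; queued ones still run
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // give up on anything still pending
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrency: " + maxConcurrent(8, 50));
    }
}
```

The peak never exceeds the pool size, however many tasks are submitted, so total run time grows with the number of files. Note that `shutdown()` on its own does not wait for the work to finish; `awaitTermination` is what blocks until the queue has drained, which matters if you are timing the run.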

muzzlator