4

I have a directory where a lot of files are saved dynamically. Currently there is a task which lists the files from time to time and processes them sequentially (writing to a database). Due to the increasing number of files it is necessary to implement parallel processing of these files. Can you give me some ideas and a code example in java, please?

Rob Hruska
amandina

5 Answers

3

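Use an ExecutorService. Create one with Executors.newFixedThreadPool(n); you can wrap the processing of a single file in a Runnable (or Callable) task and pass it the File to work on:

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// 10 worker threads; tune this to the machine and the database
ExecutorService service = Executors.newFixedThreadPool(10);

// directory is the java.io.File of the folder being processed
for (final File file : directory.listFiles()) {
    service.submit(new Runnable() {
        public void run() {
            // do work here on the file object
        }
    });
}
service.shutdown(); // accept no new tasks; already submitted ones still run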
John Vint
1

Take a look at the Watch Service API in java.nio.file. Here's documentation and a tutorial: http://download.oracle.com/javase/tutorial/essential/io/notification.html

This service lets you register for file change notifications on a directory. For every notification you can do whatever processing you want. It is probably a lot easier than implementing your own polling.
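As a rough illustration of the Watch Service idea, here is a minimal sketch. The class name DirectoryWatchSketch and the helper awaitNewFiles are made up for this example; the java.nio.file calls themselves are the standard Java 7+ API. It only listens for newly created entries and prints them, and the real processing would go where the println is.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, not part of any library.
class DirectoryWatchSketch {

    // Waits up to timeoutSeconds for one batch of events and returns the
    // paths of entries created in dir (empty list if nothing happened).
    static List<Path> awaitNewFiles(WatchService watcher, Path dir, long timeoutSeconds)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return created; // timed out, no events
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            // OVERFLOW and other kinds are skipped by this check
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // context() is the file name relative to the watched directory
                created.add(dir.resolve((Path) event.context()));
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                for (Path file : awaitNewFiles(watcher, dir, 10)) {
                    System.out.println("New file: " + file); // process the file here
                }
            }
        }
    }
}
```

One thing to watch for: a create event can arrive before the writer has finished the file, so the processing step may need to wait or retry. Each detected path could also be handed to a thread pool instead of being processed inline.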

Simeon G
0

You could create a class, say Saver, that extends Thread and handle the file manipulation there (in its run() method).


http://download.oracle.com/javase/tutorial/essential/concurrency/

http://download.oracle.com/javase/7/docs/api/java/lang/Thread.html
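A minimal sketch of the Thread subclass idea might look like the following. The name Saver comes from the answer; the processed counter and the processAll helper are assumptions added only so the example is complete, and the body of run() is a placeholder for the real file handling.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// One thread per file; the per-file work goes into run().
class Saver extends Thread {
    // Illustrative counter of finished files, safe to update from many threads.
    static final AtomicInteger processed = new AtomicInteger();

    private final File file;

    Saver(File file) {
        this.file = file;
    }

    @Override
    public void run() {
        // placeholder: read this.file and write its contents to the database
        processed.incrementAndGet();
    }

    // Hypothetical helper: starts one Saver per file and waits for all of them.
    static void processAll(List<File> files) throws InterruptedException {
        List<Saver> threads = new ArrayList<>();
        for (File f : files) {
            Saver saver = new Saver(f);
            saver.start();
            threads.add(saver);
        }
        for (Saver saver : threads) {
            saver.join();
        }
    }
}
```

Starting one thread per file does not bound the number of threads, so with many files a fixed-size pool is usually the safer choice.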

n00b
0

It's not clear whether you're familiar with concurrency in Java, so I'd start by taking a look at the Java Concurrency Tutorial. It's a good place to start.

Then keep in mind that any object that needs to be accessed by multiple threads should be immutable or synchronized.

Following that you can have a thread pool using an ExecutorService and have a number of threads run simultaneously.
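To make the two points above concrete, here is a small sketch that combines an immutable result object with a fixed thread pool. The names PoolSketch, FileSummary and summarizeAll are invented for the example, and the "processing" is just reading the file length as a stand-in for real work.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class.
class PoolSketch {

    // Immutable: all fields final and set once, so it can be shared between threads.
    static final class FileSummary {
        final String name;
        final long bytes;

        FileSummary(String name, long bytes) {
            this.name = name;
            this.bytes = bytes;
        }
    }

    // Runs one task per file on a pool of the given size and collects the results
    // in the same order as the input list.
    static List<FileSummary> summarizeAll(List<File> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<FileSummary>> tasks = new ArrayList<>();
            for (File f : files) {
                // stand-in for real per-file work
                tasks.add(() -> new FileSummary(f.getName(), f.length()));
            }
            List<FileSummary> results = new ArrayList<>();
            for (Future<FileSummary> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task only reads its own File and returns a fresh immutable object, no explicit synchronization is needed in this sketch.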

It's not exactly the same problem, but assuming you know how to handle the files, you can take a look at the following questions about multithreading in a different context:

questions around synchronization in java; when/how/to what extent

Parallel-processing in Java; advice needed i.e. on Runnanble/Callable interfaces

Community
posdef
0

If I understand correctly, your single task handles everything from reading the file to loading it into the DB. You can break it into separate tasks based on their nature (DB centric, CPU centric or IO centric). For example, you could have the following tasks:

  1. Current task, which picks a file from the directory and passes it to the next task.

  2. IO centric - New task to read the file, store it in memory and then pass it to the next task.

  3. DB centric - New task to load the data from memory into the database and then free the memory.

  4. IO centric - Move the file to some other place.

To further improve performance, you can run tasks 2, 3 and 4 on thread pools. This allows many files to be processed in parallel. Depending on the complexity of the work, you can add or remove tasks from the list to suit your requirements.
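The staged approach above could be sketched roughly as follows, using CompletableFuture to chain the stages and two separate pools for IO work and DB work. Everything here is an assumption made for illustration: the class and method names, the pool sizes, and the thread-safe collection that stands in for the database.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class.
class PipelineSketch {

    // Stage 2 (IO): read the whole file into memory.
    static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stage 4 (IO): move the processed file out of the input directory.
    static void archive(Path file, Path archiveDir) {
        try {
            Files.move(file, archiveDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stage 1 is the caller that supplies the list of files.
    // 'database' stands in for the real DB and must be a thread-safe collection.
    static void run(List<Path> files, Path archiveDir, Collection<String> database) {
        ExecutorService ioPool = Executors.newFixedThreadPool(4); // sizes are guesses
        ExecutorService dbPool = Executors.newFixedThreadPool(2);
        try {
            List<CompletableFuture<Void>> all = new ArrayList<>();
            for (Path file : files) {
                all.add(CompletableFuture
                        .supplyAsync(() -> readLines(file), ioPool)          // stage 2
                        .thenAcceptAsync(database::addAll, dbPool)            // stage 3
                        .thenRunAsync(() -> archive(file, archiveDir), ioPool)); // stage 4
            }
            // Wait for every file; a failed stage surfaces here as CompletionException.
            CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        } finally {
            ioPool.shutdown();
            dbPool.shutdown();
        }
    }
}
```

Keeping the DB pool smaller than the IO pool is one way to limit the number of concurrent database writers; the right sizes depend on the database and the disk.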

Amit
  • pretty good suggestion except for step 2. reading an entire file into memory is (almost) _never_ a good idea. combine step 2/3 into "stream data from file to db". – jtahlborn Mar 09 '11 at 16:38
  • I agree on memory part but that is keeping in mind the location of the file. If a file is in a remote location then either make a local copy or keep it in memory (if it is sure that the file size will not cause any harm). And also the application can have some logic to keep the memory usage in limit. – Amit Mar 09 '11 at 16:42
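The "stream data from file to db" suggestion in the comments could look something like this sketch, which reads a file line by line and hands fixed-size batches to a sink, so that at most one batch is held in memory. The class name, the method name and the Consumer-based sink are assumptions for the example; the sink stands in for whatever actually writes to the database.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical example class.
class StreamToDbSketch {

    // Streams 'file' to 'sink' in batches of at most batchSize lines.
    // Returns the total number of lines read.
    static int streamInBatches(Path file, int batchSize, Consumer<List<String>> sink)
            throws IOException {
        int count = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                count++;
                if (batch.size() == batchSize) {
                    sink.accept(new ArrayList<>(batch)); // hand off a copy, then reuse the buffer
                    batch.clear();
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final, possibly smaller batch
        }
        return count;
    }
}
```

With JDBC, the sink would typically add each line's values to a PreparedStatement with addBatch() and call executeBatch() once per batch, but that part depends on the schema and is left out here.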