1

I have multiple threads to call one method in writing contents from an object to file, as below: When I use 1 thread to test this method, the output into my file is expected. However, for multiple threads, the output into the file is messy. How to make this thread safe?

void (Document doc, BufferedWriter writer){
       Map<Sentence, Set<Matrix>> matrix = doc.getMatrix();
       for(Sentence sentence : matrix.keySet()){
           Set<Matrix> set = doc.getMatrix(sentence);
           for(Matrix matrix : set){
               List<Result> results = ResultGenerator.getResult();
               writer.write(matrix, matrix.frequency());
               writer.write(results.toString());
               writer.write("\n");
           }
       }
}

Edit:

I added this line List<Result> results = ResultGenerator.getResult(). What I really want is to use multiple threads to process this method call, since this part is expensive and takes a lot of time. The writing part is very quick, I don't really need multiple threads.

Given this change, is there a way to make this method call safe in concurrent environment?

guido
  • 18,864
  • 6
  • 70
  • 95
user697911
  • 10,043
  • 25
  • 95
  • 169

6 Answers6

2

Essentially, you are limited by single file at the end. There are no global variables and it publishes nothing, so the method is thread safe.

But, if processing does take a lot of time, you can use parallelstreams and publish the results to concurrenthashmap or a blocking queue. You would however still have a single consumer to write to the file.

Ramandeep Nanda
  • 519
  • 3
  • 9
1

If you need the final file in a predetermined sequential order, do not multithread, or you will not get what you expect.

If you think that with multithreading your program will execute faster in regards to I/O output, you are likely mistaken; because of locking or overhead due to synchronisation, you will actually get degraded performance than a single thread.

If you trying to write a very big file, the ordering of Document instances is not relevant, and you think your writer method will hit a CPU bottleneck instead (but the only possible cause I can figure out from our code is the frequency() method call), what you can do is having each thread hold its own BufferedWriter that writes to a temporary file, and then add an additional thread that waits for all, then generates the final file using concatenation.

guido
  • 18,864
  • 6
  • 70
  • 95
1

I am not well versed in Java so I am going to provide a language-agnostic answer.

What you want to do is to transform matrices into results, then format them as string and finally write them all into the stream.

Currently you are writing into the stream as soon as you process each result, so when you add multi threads to your logic you end up with racing conditions in your stream.

You already figured out that only the calls for ResultGenerator.getResult() should be done in parallel whilst the stream still need to be accessed sequentially.

Now you only need to put this in practice. Do it in order:

  • Build a list where each item is what you need to generate a result
  • Process this list in parallel thus generating all results (this is a map operation). Your list of items will become a list of results.
  • Now you already have your results so you can iterate over them sequentially to format and write them into the stream.

I suspect the Java 8 provides some tools to make everything in a functional-way, but as said I am not a Java guy so I cannot provide code samples. I hope this explanation will suffice.

@edit

This sample code in F# explains what I meant.

open System

// This is a pretty long and nasty operation!
let getResult doc =
    Threading.Thread.Sleep(1000)
    doc * 10

// This is writing into stdout, but it could be a stream...
let formatAndPrint =
    printfn "Got result: %O"

[<EntryPoint>]
let main argv =
    printfn "Starting..."

    [| 1 .. 10 |] // A list with some docs to be processed
    |> Array.Parallel.map getResult // Now that's doing the trick
    |> Array.iter formatAndPrint

    0
Gabriel
  • 1,922
  • 2
  • 19
  • 37
0

If your code is using distinct doc and writer objects, then your method is already thread-safe as it does not access and use instance variables.

If you are writing passing the same writer object to the method, you could use one of these approaches, depending on your needs:

void (Document doc, BufferedWriter writer){
       Map<Sentence, Set<Matrix>> matrix = doc.getMatrix();
       for(Sentence sentence : matrix.keySet()){
           Set<Matrix> set = doc.getMatrix(sentence);
           for(Matrix matrix : set){
               List<Result> results = ResultGenerator.getResult();

               // ensure that no other thread interferes while the following
               // three .write() statements are executed.
               synchronized(writer) {
                   writer.write(matrix, matrix.frequency()); // from your example, but I doubt it compiles
                   writer.write(results.toString());
                   writer.write("\n");
               }
           }
       }
}

or lock-free with using a temporary StringBuilder object:

void (Document doc, BufferedWriter writer){
       Map<Sentence, Set<Matrix>> matrix = doc.getMatrix();
       StringBuilder sb = new StringBuilder();
       for(Sentence sentence : matrix.keySet()){
           Set<Matrix> set = doc.getMatrix(sentence);
           for(Matrix matrix : set){
               List<Result> results = ResultGenerator.getResult();
               sb.append(matrix).append(matrix.frequency());
               sb.append(results.toString());
               sb.append("n");
           }
       }
       // write everything at once
       writer.write(sb.toString();
}
Seb T
  • 795
  • 8
  • 16
-2

I'd make it synchronized. In that case, only one thread in your application is allowed to call this method at the same time => No messy output. If you have multiple applications running, you should consider something like file locking.

Example for a synchronized method:

public synchronized void myMethod() {
    // ...
}

This method is exclusive for each thread.

Lukas Fink
  • 627
  • 6
  • 18
-2

You could lock down a method and then unlock it when you are finished with it. By putting synchronized before a method, you make sure only one thread at a time can execute it. Synchronizing slows down Java, so it should only be used when necessary.

ReentrantLock lock = new ReentrantLock();

 /* synchronized */ 
public void run(){

    lock.lock();

    System.out.print("Hello!");

    lock.unlock();

 }

This locks down the method just like synchronized. You can use it instead of synchronized, that's why synchronized is commented out above.

Sinil
  • 87
  • 10