
I have to write a program that reads all of the words from a file and determines how many times each word is used. I'm tasked with using multi-threading to speed up the run time, but the single-threaded version runs faster than the multi-threaded one. I've tried researching a solution, but a lot of the explanations only confuse me more. I'm very new to using threads, and I was wondering if anyone could point me in the right direction of fixing my code so that the overhead of creating the threads won't make the program run slower than the single-threaded version.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Queue;
import java.util.Scanner;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

public class Main
{
    final static int THREADS = 4;
    static HashMap<String, Integer> map = new HashMap<>();
    static List<String> file = new ArrayList<String>();
    static String filename = "D:\\yes.txt";
    static int count;




    public static void main(String args[]) throws Exception
    {
        long startTime = System.nanoTime();
        Monitor m = new Monitor();
        final Queue<String> dataQueue = new ConcurrentLinkedQueue<>();

        try (Scanner in = new Scanner(new File(filename)))
        {
            while (in.hasNext())
            {
                dataQueue.add(in.next());
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }


        Thread T1 = new Thread(new WordCount(m, map, dataQueue));
        Thread T2 = new Thread(new WordCount(m, map, dataQueue));
        Thread T3 = new Thread(new WordCount(m, map, dataQueue));
        Thread T4 = new Thread(new WordCount(m, map, dataQueue));

        T1.start();
        T2.start();
        T3.start();
        T4.start();

        // wait for all worker threads to finish
        try {
            T1.join();
            T2.join();
            T3.join();
            T4.join();
        } catch (InterruptedException e) {
            System.out.println("Interrupted");
        }

        Set<String> keys = map.keySet();
        for (String key : keys)
        {
            System.out.println(key);
            System.out.println(map.get(key));
        }
        long endTime = System.nanoTime();
        // 1 ms = 1,000,000 ns (the original divided by 100,000, overstating the time tenfold)
        System.out.println("Threads took " + ((endTime - startTime) / 1_000_000) + " ms");


}
}
import java.util.HashMap;
import java.util.Queue;

public class WordCount implements Runnable
{
    private Monitor m;
    private Queue<String> dataQueue;
    private HashMap<String, Integer> map;

    public WordCount(Monitor m, HashMap<String, Integer> map, Queue<String> dataQueue)
    {
        this.m = m;
        this.dataQueue = dataQueue;
        this.map = map;
    }

    @Override public void run()
    {
        // Checking isEmpty() and then polling is a race between threads:
        // another worker may drain the queue in between, making poll() return null.
        // Polling directly and testing for null avoids that.
        String word;
        while ((word = dataQueue.poll()) != null)
        {
            m.keySet(map, word);
        }
    }
}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Monitor
{
    public synchronized void keySet(HashMap<String, Integer> map, String word)
    {
        String[] words = filterIllegalTokens(word);
        String[] lowerCaseWords = mapToLowerCase(words);
        for (String s : lowerCaseWords)
        {
            if (map.containsKey(s))
            {
                map.put(s, map.get(s) + 1);
            }
            else
            {
                map.put(s, 1);
            }
        }
    }

    public String[] filterIllegalTokens(String word)
    {
        List<String> filteredList = new ArrayList<>();
        if (word.matches("[a-zA-Z]+"))
        {
            filteredList.add(word);
        }
        return filteredList.toArray(new String[filteredList.size()]);
    }

    public String[] mapToLowerCase(String[] words)
    {
        String[] lowerCased = new String[words.length];
        for (int i = 0; i < words.length; i++)
        {
            lowerCased[i] = words[i].toLowerCase();
        }
        return lowerCased;
    }
}

These are the three classes I have. Any tips or advice?

Duck
  • Your `while (!dataQueue.isEmpty())` loop is running constantly, which is using a great deal of CPU and is very inefficient. Use a [LinkedBlockingQueue](https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/util/concurrent/LinkedBlockingQueue.html) instead of a ConcurrentLinkedQueue, so you can do `while (true) { String line = dataQueue.take(); … }`, which will not execute any code unless and until the queue has an element available. – VGR Feb 17 '20 at 15:34
  • How big is the file? Will there be tens of words? hundreds? Thousands? Tens of thousands? Concurrency will only start to prove itself when you process a large amount of data which takes a long time to process. – Gimby Feb 17 '20 at 15:36
  • https://stackoverflow.com/q/24688645/1835769 – displayName Feb 17 '20 at 15:47
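VGR's `LinkedBlockingQueue` suggestion could be sketched roughly like this. This is a hypothetical reworking of the worker loop above, not the asker's code; it uses a "poison pill" sentinel value so that `take()` (which blocks instead of spinning) knows when to stop:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingCount {
    // Sentinel that tells a worker to shut down; chosen so it can't collide with a real word.
    static final String POISON = "\u0000POISON\u0000";

    public static Map<String, Integer> count(List<String> words, int nThreads)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(words);
        Map<String, Integer> result = new HashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            queue.add(POISON); // one sentinel per worker; FIFO order puts them after all words
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String w = queue.take(); // blocks until an element is available
                        if (w.equals(POISON)) return;
                        synchronized (result) { // shared map still needs a lock here
                            result.merge(w.toLowerCase(), 1, Integer::sum);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = count(List.of("a", "B", "a"), 4);
        System.out.println(counts.get("a") + " " + counts.get("b")); // prints "2 1"
    }
}
```

This removes the busy-wait, though the shared result map is still a contention point, which the answer below addresses.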

1 Answer


A rule of thumb says that you need one CPU core for the operating system; the others can be used by the program. So with four worker threads you would want at least 5 CPU cores for optimal performance.
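Rather than hard-coding 4 threads, the pool size could be derived from the machine. A minimal sketch of that rule of thumb (the class name is illustrative):

```java
public class PoolSize {
    public static int workers() {
        // Ask the JVM how many cores are available, then leave one for the OS.
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, cores - 1); // never go below one worker
    }

    public static void main(String[] args) {
        System.out.println("Using " + workers() + " worker threads");
    }
}
```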

The overhead of creating these few threads does not really matter. That would become relevant only if you started dozens of threads within milliseconds.

The main issue in your code is that the threads access data in a shared memory area for roughly 90% of the total time. In this case we are talking about the ConcurrentLinkedQueue and the synchronized Monitor.keySet() method. While one thread accesses these objects, the other 3 threads must wait. If you run your program for a long time you may notice that only a fraction of your total CPU power is used.

To improve the performance I would recommend splitting the job queue into 4 packets before you start the threads, so each thread can then process its own packet without waiting for other threads. Each thread should also collect its results in its own container. Then finally (after the threads are finished), you can combine the four results.
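That split-then-merge approach might be sketched as follows (class and method names are hypothetical). Each worker writes to a private `HashMap` with no locking at all, and the merge happens only after every thread has been joined:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitCount {
    public static Map<String, Integer> count(List<String> words, int nThreads)
            throws InterruptedException {
        List<Map<String, Integer>> partials = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        int chunk = (words.size() + nThreads - 1) / nThreads; // ceiling division

        for (int i = 0; i < nThreads; i++) {
            // Each thread gets its own slice of the input...
            List<String> slice = words.subList(Math.min(i * chunk, words.size()),
                                               Math.min((i + 1) * chunk, words.size()));
            // ...and its own result map, so no synchronization is needed while counting.
            Map<String, Integer> local = new HashMap<>();
            partials.add(local);
            Thread t = new Thread(() -> {
                for (String w : slice) {
                    local.merge(w.toLowerCase(), 1, Integer::sum);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();

        // Combine the per-thread results; safe because all workers have finished.
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> total = count(Arrays.asList("a", "b", "A", "a"), 4);
        System.out.println(total.get("a") + " " + total.get("b")); // prints "3 1"
    }
}
```

The merge step is cheap compared with the counting, so the synchronized bottleneck disappears entirely.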

If your worker threads did more computation, the problem would be less severe. For example, if access to the shared containers took only 10% of the overall time (while some calculation took the other 90%), then the overhead of thread synchronization would be much smaller relative to the total execution time.

Stefan
  • Thanks for your input. I'm wondering, what do you mean by splitting the job queue into packets? I tried using executor service with the threads but this caused me to get different output every time. I'm not sure if that is what you mean by this. I also tried splitting the file up into 4 different strings and passing them into each thread but this also didn't improve speed. And I am testing with 40000 plus characters in the file. – Duck Feb 17 '20 at 21:31
  • @Duck But what speed are you talking about? 40000 characters can be chewed through in under a second by modern computers. Try processing a file which is a hundred megabytes in size and then compare the total processing time. you'll likely see very different results which are not skewed by such things as VM warmup time and overhead. – Gimby Feb 18 '20 at 10:40
  • Duck, if you split the input you did only half of the job. You also need to split the output so that each thread produces its own output, and you should replace the synchronized stuff with non-synchronized versions. However, with input that small the whole execution time will be dominated by disk I/O (for the input, output and class loading). You need much more data and much more complex algorithms in the threads to see a significant difference. – Stefan Feb 18 '20 at 17:09