
I'm solving the task of counting unique lines in a text file. Each line is one valid IPv4 address. The file can be of any size (literally, hundreds or even thousands of gigabytes are possible). I wrote a simple class that implements a bit array and I use it for the counting.

public class IntArrayBitCounter {
    public static final long MIN_BIT_CAPACITY = 1L;
    public static final long MAX_BIT_CAPACITY = 1L << 32;

    private final int intArraySize;
    private final int[] intArray;
    private long counter;

    public IntArrayBitCounter(long bitCapacity) {
        if (bitCapacity < MIN_BIT_CAPACITY || bitCapacity > MAX_BIT_CAPACITY) {
            throw new IllegalArgumentException("Capacity must be in range [1.." + MAX_BIT_CAPACITY + "].");
        }
        this.intArraySize = 1 + (int) ((bitCapacity - 1) >> 5);
        this.intArray = new int[intArraySize];
    }

    private void checkBounds(long bitIndex) {
        if (bitIndex < 0 || bitIndex >= ((long) intArraySize << 5)) {
            throw new IndexOutOfBoundsException("Bit index must be in range [0.." + (getBitCapacity() - 1) + "].");
        }
    }

    public void setBit(long bitIndex) {
        checkBounds(bitIndex);
        int index = (int) (bitIndex >> 5);
        int bit = 1 << (bitIndex & 31);
        if ((intArray[index] & bit) == 0) {
            counter++;
            intArray[index] |= bit;
        }
    }

    public boolean isBitSet(long bitIndex) {
        checkBounds(bitIndex);
        int index = (int) (bitIndex >> 5);
        int bit = 1 << (bitIndex & 31);
        return (intArray[index] & bit) != 0;
    }

    public int getIntArraySize() {
        return intArraySize;
    }

    public long getBitCapacity() {
        return (long) intArraySize << 5;
    }

    public long getCounter() {
        return counter;
    }
}
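
For illustration, this is how duplicates are counted only once (a minimal usage sketch, not part of the original class):

IntArrayBitCounter counter = new IntArrayBitCounter(1L << 32);
counter.setBit(0x0A000001L); // 10.0.0.1
counter.setBit(0x0A000001L); // duplicate: the bit is already set, counter stays at 1
counter.setBit(0xC0A80001L); // 192.168.0.1
System.out.println(counter.getCounter()); // prints 2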

My simple single-threaded approach works well enough. It almost completely saturates the read speed of my old HDD, which is approximately 130-135 MB/s; the Linux System Monitor shows my program reading from disk at about 100-110 MB/s.

import java.io.BufferedReader;
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class IpCounterApp {

    private static long toLongValue(String ipString) throws UnknownHostException {
        long result = 0;
        for (byte b : InetAddress.getByName(ipString).getAddress())
            result = (result << 8) | (b & 255);
        return result;
    }

    public static void main(String[] args) {
        long startTime = System.nanoTime();

        String fileName = "src/test/resources/test.txt";
        var counter = new IntArrayBitCounter(1L << 32);
        long linesProcessed = 0;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                counter.setBit(toLongValue(line));
                linesProcessed++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        System.out.printf("%d unique lines in %d processed\n", counter.getCounter(), linesProcessed);
        long elapsedTime = System.nanoTime() - startTime;
        System.out.println("duration: " + elapsedTime / 1000000 + " milliseconds");
    }
}

Then I tried to split reading from the disk and processing the lines into two different threads, hoping for a slight improvement. I created a blocking queue: the first thread reads the lines and puts them into the queue, and the second thread takes them from the queue and does the counting. However, on a test file of 10,000,000 addresses, 5,000,000 of them unique, execution time almost doubled. The read speed also fell by half, to 50-55 MB/s.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentIpCounterApp {

    public static void main(String[] args) {
        long startTime = System.nanoTime();

        String fileName = "src/test/resources/test.txt";
        var stringsQueue = new ArrayBlockingQueue<String>(1024);
        var reader = new BlockingQueueFileReader(stringsQueue, fileName);
        var counter = new BlockingQueueCounter(stringsQueue);

        ExecutorService executorService = Executors.newFixedThreadPool(2);
        Future<Long> linesProcessed = executorService.submit(reader);
        Future<Long> uniqueLines = executorService.submit(counter);

        try {
            System.out.printf("%d unique lines in %d processed\n", uniqueLines.get(), linesProcessed.get());
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        } finally {
            executorService.shutdown();
        }

        long elapsedTime = System.nanoTime() - startTime;
        System.out.println("duration: " + elapsedTime / 1000000 + " milliseconds");
    }
}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;

public class BlockingQueueCounter implements Callable<Long> {

    private final BlockingQueue<String> queue;
    private final IntArrayBitCounter counter;

    public BlockingQueueCounter(BlockingQueue<String> queue) {
        this.queue = queue;
        this.counter = new IntArrayBitCounter(1L << 32);
    }

    private static long toLongValue(String ipString) throws UnknownHostException {
        long result = 0;
        for (byte b : InetAddress.getByName(ipString).getAddress())
            result = (result << 8) | (b & 255);
        return result;
    }
    
    @Override
    public Long call() {
        String line;
        while (true) {
            try {
                line = queue.take();
                if ("EOF".equals(line)) {
                    break;
                }
                counter.setBit(toLongValue(line));
            } catch (InterruptedException | UnknownHostException e) {
                e.printStackTrace();
            }
        }
        return counter.getCounter();
    }
}
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;

public class BlockingQueueFileReader implements Callable<Long> {

    private final BlockingQueue<String> queue;
    private final String fileName;
    private long totalLines;

    public BlockingQueueFileReader(BlockingQueue<String> queue, String fileName) {
        this.queue = queue;
        this.fileName = fileName;
    }

    @Override
    public Long call() {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                queue.put(line);
                totalLines++;
            }
            queue.add("EOF");
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
        return totalLines;
    }
}

Please help me understand why this happens. I could not find the answer myself.

  • @user15358848 I'm not trying to read in two threads. I read in one thread and count in the second. – chptr-one Apr 18 '21 at 09:05
  • 2
    @chptr-one: but your code still spent more time in reading than in counting. And since the single thread has to wait for more data to arrive (i.e. the bottleneck is not CPU speed), spreading any work to multiple threads doesn't actually speed up stuff. In other words: if your HDD takes X time to completely read the file and one CPU can count the lines in less than X time, then splitting up the counting won't help (because you can count while the HDD continues reading). And since counting lines is very simple and HDDs are slow, that will always remain true. – Joachim Sauer Apr 18 '21 at 09:11
  • 2
    You can't be any faster than the 130-135 MB/s you're reading from the HDD, even when you do IO and CPU in two separate threads. Looking at your code I think your blocking queue and its brittle 1024 limit causes contention and actually increases your HDD read latency overall. Get yourself a good profiler and look where half of your time went (eg by analyzing blocked threads). – Thomas Jungblut Apr 18 '21 at 09:12
  • 1
    You can check what is real upper limit of reading by skipping processing. Remove all processing and test how fast actually you app can read. There is a chance that you already at max speed with single tread. When you use two thread you spend time in synchronization. You can reduce overhead by submitting data to queue in batches. – talex Apr 18 '21 at 09:18
  • 1
    What made you think that adding more context-switching and parallel processing would be faster? – user207421 Apr 18 '21 at 10:11
  • @user207421 I see that my application does not utilize all of the possible reading speed. The disk is loaded at about 80 percent. I checked the reading speed from Java and saw that I could achieve read speeds up to 135 MB/s. With a file size in the hundreds of gigabytes, this acceleration is very significant. I'm just starting to learn programming and do not really understand how to speed up my application to use the full speed of the disk. Also, considering the possibility of working with a fast SSD, I see that my hash function would become the bottleneck. – chptr-one Apr 20 '21 at 14:48
  • @Joachim Apparently the disk is not the bottleneck. I can simply read the lines from the text file (counting the number of lines, for example) and get a speed of about 135 MB/s. My hash function is the bottleneck (see the parser sketch below). – chptr-one Apr 20 '21 at 14:54
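
If the conversion really is the bottleneck, one option is to avoid InetAddress.getByName (which goes through the general host-name resolution machinery) and parse the dotted-quad literal by hand. A minimal sketch, assuming every line is already a valid IPv4 literal; the parseIpv4 helper is hypothetical, not part of the code above:

// Hypothetical replacement for toLongValue: parses "a.b.c.d" directly,
// with no validation, assuming the input is a well-formed IPv4 literal.
private static long parseIpv4(String ip) {
    long result = 0;
    int octet = 0;
    for (int i = 0; i < ip.length(); i++) {
        char c = ip.charAt(i);
        if (c == '.') {
            result = (result << 8) | octet; // shift in the completed octet
            octet = 0;
        } else {
            octet = octet * 10 + (c - '0'); // accumulate decimal digits
        }
    }
    return (result << 8) | octet; // append the final octet
}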

2 Answers


To answer the question of why the multithreaded attempt runs two times slower than the single-threaded one, try to measure:

  • elapsed time for the whole process (you are doing that already)
  • producer active time (reading from disk and formatting data for queue)
  • producer queue waiting time (the time to actually stuff the data into the queue which eventually blocks)

I think that's where you get your answer.
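
As a sketch, the producer's waiting time could be isolated inside BlockingQueueFileReader.call() like this (the waitNanos accumulator is a hypothetical addition, not part of the question's code):

// Sketch: measure how long put() blocks while the queue is full.
long waitNanos = 0;
String line;
while ((line = reader.readLine()) != null) {
    long before = System.nanoTime();
    queue.put(line); // blocks whenever the consumer lags behind
    waitNanos += System.nanoTime() - before;
    totalLines++;
}
System.out.printf("producer blocked for %d ms%n", waitNanos / 1_000_000);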

Queeg
  • As often happens, I measured not what actually should have been measured. Now I have used a profiler and JMH and found out what the actual bottleneck is. Thank you, now I can move on with my eyes open. – chptr-one Apr 21 '21 at 10:38

Is it possible that the blocking queue not only blocks the consumer but also the producer once it has filled up with enqueued data? In that case your reading thread has to pause, and initiating the next read operation may mean waiting until the hard drive has completed its next rotation.

What performance do you get if you increase the blocking queue's size?

So you'd have to ensure the reader is never paused. If the queue grows too big, increase the number of consuming threads.
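
Picking up the batching idea from the comments above: handing lines over in chunks replaces a thousand queue operations with one, which cuts the synchronization cost considerably. A rough sketch, with a hypothetical batchQueue of type BlockingQueue&lt;List&lt;String&gt;&gt;, an arbitrarily chosen batch size, and the toLongValue helper from the question:

// Producer side: fill a batch, then hand it over in one queue operation.
List<String> batch = new ArrayList<>(1000);
String line;
while ((line = reader.readLine()) != null) {
    batch.add(line);
    if (batch.size() == 1000) {
        batchQueue.put(batch);
        batch = new ArrayList<>(1000);
    }
}
if (!batch.isEmpty()) {
    batchQueue.put(batch); // flush the partial final batch
}
batchQueue.put(List.of()); // an empty batch marks end of input

// Consumer side: drain whole batches instead of single lines.
List<String> taken;
while (!(taken = batchQueue.take()).isEmpty()) {
    for (String s : taken) {
        counter.setBit(toLongValue(s));
    }
}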

Queeg
  • Performance practically does not change with the size of the queue. And the queue should not block the consumer as long as it is not empty; that would not be normal behavior. – chptr-one Apr 20 '21 at 15:05
  • From https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/concurrent/BlockingQueue.html: A Queue that additionally supports operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element. => so be aware there are waiting times per design. All I pointed out is to ensure they are not on the reader side. – Queeg Apr 20 '21 at 15:25
  • Yes, it's not on the consumer side, I checked it. My queue is almost always full. – chptr-one Apr 20 '21 at 15:38
  • 1
    Which is THE CONFIRMATION that the producer is made waiting. You want to consume at least at the speed of the producer, if not faster. So either increase queue size or add consumers. – Queeg Apr 20 '21 at 15:55