13

I am developing an application that gathers a list of all the files on the hard drive and afterwards writes files to the hard drive.

I want to ask: what is the optimal number of concurrent threads for this task?

I mean, how many threads can read the hard drive without slowing it down because too many threads are reading it concurrently?

Thank you!

alexandertr
  • 1
    any specific reason that this process has to be multithreaded? – Aravind Mar 16 '11 at 06:34
  • https://serverfault.com/questions/826163/does-having-multiple-partitions-on-one-disk-parallelize-writing-to-disk/826167#comment1140243_826167 – Pacerier Nov 20 '17 at 18:03

7 Answers

7

As a first answer, I'd say one!

It actually depends on whether the data being read needs complex computation to be processed. In that case, it can be worthwhile to use more than one thread to process different disk data, but only if the system has multiple CPUs.

Otherwise, more than one thread stresses the HDD more than necessary: concurrent reads from different threads issue seek operations to reach the file blocks(*), introducing an overhead that can slow down the system, depending on the number and size of the files read.

Read the files sequentially.

(*) The OS tries to store the blocks of a file sequentially in order to speed up read operations. Disk fragmentation still happens, so non-sequential fragments require a seek operation, which takes much more time than a read at the current position. Reading multiple files in parallel causes a bunch of seeks, because the blocks of a single file are usually contiguous while the blocks of different files may not be.
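As a minimal sketch of "read the files sequentially": the class below lists the files under a root folder and then reads them one after another on the calling thread. The names `SequentialReader`, `ListFiles` and `ReadAll` are made up for illustration, the 1 MiB buffer is an arbitrary choice, and `EnumerationOptions` assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SequentialReader
{
    // Lists every file under root on the calling thread, skipping
    // directories the process is not allowed to open.
    public static List<string> ListFiles(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            IgnoreInaccessible = true
        };
        return new List<string>(Directory.EnumerateFiles(root, "*", options));
    }

    // Reads the files one after another, so only one file is being read
    // at any moment. Returns the total number of bytes read.
    public static long ReadAll(IEnumerable<string> paths)
    {
        var buffer = new byte[1 << 20]; // 1 MiB chunks (arbitrary size)
        long total = 0;
        foreach (string path in paths)
        {
            // SequentialScan hints to the OS that the file is read front to back.
            using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, buffer.Length,
                                              FileOptions.SequentialScan);
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        return total;
    }
}
```

Something like `SequentialReader.ReadAll(SequentialReader.ListFiles(@"C:\data"))` would then walk the folder on a single thread; any CPU-heavy processing of the bytes could be handed to other threads without adding concurrent disk reads.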

Luca
  • 1
    wow thank all of you for your answers. it is my first question on stackoverflow and i am impressed. – alexandertr Mar 17 '11 at 12:35
  • 1
    While the most of answers say one operation per disk, I want to add that with current **SSD** you could use more than one operation at same time without impacting performance of IO read/write. – hdkrus Apr 08 '17 at 15:34
4

One thread. If you are reading AND writing at the same time AND your destination is a disk different from your source, then 2 threads. I'll add that if you are doing other operations on the files (for example decompressing them), that part can be done on a third thread.

To make some examples (I'm ignoring Junctions, Reparse Points...)

  • C: to C: 1 Thread TOTAL
  • C: to D: same physical disk, different partitions: 1 Thread TOTAL
  • C: to D: different physical disks: 2 Threads TOTAL
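The third case above (two physical disks, two threads) could look roughly like this: one thread only reads from the source, one thread only writes to the destination, and a bounded queue sits between them. This is a sketch, not a tuned implementation: `TwoDiskCopy` is a hypothetical name, the chunk size and queue capacity are arbitrary, and error handling is left out (an exception on either thread would go unhandled).

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public static class TwoDiskCopy
{
    public static void Copy(string sourcePath, string destinationPath)
    {
        // Bounded so the reader cannot run arbitrarily far ahead of the writer.
        using var chunks = new BlockingCollection<byte[]>(boundedCapacity: 8);

        // Thread 1: the only thread that touches the source disk.
        var reader = new Thread(() =>
        {
            try
            {
                using var input = File.OpenRead(sourcePath);
                while (true)
                {
                    var buffer = new byte[1 << 20]; // 1 MiB chunks (arbitrary size)
                    int read = input.Read(buffer, 0, buffer.Length);
                    if (read == 0) break;
                    if (read < buffer.Length) Array.Resize(ref buffer, read);
                    chunks.Add(buffer); // blocks while the queue is full
                }
            }
            finally
            {
                chunks.CompleteAdding(); // lets the writer loop finish
            }
        });

        // Thread 2: the only thread that touches the destination disk.
        var writer = new Thread(() =>
        {
            using var output = File.Create(destinationPath);
            foreach (byte[] chunk in chunks.GetConsumingEnumerable())
            {
                output.Write(chunk, 0, chunk.Length);
            }
        });

        reader.Start();
        writer.Start();
        reader.Join();
        writer.Join();
    }
}
```

For the first two cases (same physical disk), a plain single-threaded copy such as `File.Copy` stays within the one-thread rule.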

I'm working on the presumption that a disk can do ONE operation at a time, and each time it "multitasks" by switching between different reads/writes it loses speed. Mechanical disks have this problem (but technically NCQ COULD help). Solid state disks I don't know (but I know that USB sticks are VERY slow if you try to do 2 operations at a time).

I have searched how you do it... I haven't found any "specific" examples, but I have some links to Windows API where you could start:

xanatos
  • does this extrapolate? if I'm reading 10 files and writing 10 files at the same time, what should be the number of threads? – Sanjeevakumar Hiremath Mar 16 '11 at 06:57
  • @Sanjeevakumar Let's say you copy from C: to C: (without considering Junctions...), 1 thread TOTAL. You copy from C: to D:, but on the same disk (2 partitions), 1 thread. You copy from C: to D:, TWO physical disks: 2 threads. – xanatos Mar 16 '11 at 07:40
  • @xanatos I read this answer http://stackoverflow.com/questions/38973929/how-can-i-achieve-parallelism-in-a-program-that-is-writing-to-the-disk-in-c and it says that If I read from a buffer in main memory and write to disk(See question for details) I can do the writes in parallel, so I get better performance if I use multithreading(more that 2 threads on 4 core). But if I understood right you say the opposite. I'm on Windows, can u recommend a source that explains how writes to disk happened. Does the OS really use only one thread to write to disk :O ? Thanks – lads Aug 18 '16 at 11:43
  • 2
    @lads The response you quoted was about a very basic C++ question. In one of the comments he wrote *Also, spinning-disc hard-drives are getting rarer by the day, an SSD don't have the problems with positioning* that is exactly the exception I gave 5 years ago: *Solid state disks I don't know* – xanatos Aug 20 '16 at 09:59
3

Never run I/O-intensive operations concurrently. It's slower because the disk head wastes a lot of time switching between different threads/files.

What should you do if several threads need to perform I/O? Produce the operations concurrently, and execute them on a single thread. Say we have a container, like a ConcurrentQueue<T> (or a thread-safe queue you wrote yourself), and 10 threads that will read from the files 1.txt, 2.txt ... 10.txt. The threads put their "read requests" in the queue concurrently, and another thread handles all the requests (open 1.txt, get what you want, and continue with 2.txt). The disk head will not be busy switching between threads/files in this case.
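The queue idea described above might be sketched as follows. Any number of threads can enqueue read requests, while one dedicated worker thread performs the actual reads in order. The class name `SingleThreadedFileReader` and its `ReadAsync` method are invented for this example; `BlockingCollection<T>` is used here because it wraps a `ConcurrentQueue<T>` by default and adds blocking consumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleThreadedFileReader : IDisposable
{
    // Each request is a path plus the promise that will receive the file content.
    private readonly BlockingCollection<(string, TaskCompletionSource<byte[]>)> _requests =
        new BlockingCollection<(string, TaskCompletionSource<byte[]>)>();
    private readonly Thread _worker;

    public SingleThreadedFileReader()
    {
        _worker = new Thread(ProcessRequests) { IsBackground = true };
        _worker.Start();
    }

    // Safe to call from many threads at once; it only enqueues the request.
    public Task<byte[]> ReadAsync(string path)
    {
        var result = new TaskCompletionSource<byte[]>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _requests.Add((path, result));
        return result.Task;
    }

    // Runs on the single worker thread: one file is read at a time.
    private void ProcessRequests()
    {
        foreach (var (path, result) in _requests.GetConsumingEnumerable())
        {
            try
            {
                result.SetResult(File.ReadAllBytes(path));
            }
            catch (Exception ex)
            {
                result.SetException(ex); // report the failure to the requester
            }
        }
    }

    // Stops accepting requests and waits for the queued ones to finish.
    public void Dispose()
    {
        _requests.CompleteAdding();
        _worker.Join();
        _requests.Dispose();
    }
}
```

Each of the 10 threads would call something like `reader.ReadAsync("1.txt")` and wait on the returned task, so the requests are produced concurrently but the disk only ever sees one read at a time.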

Cheng Chen
2

I would say one thread is enough. The CPU might be able to run many threads, but the speed of the hard drive is many orders of magnitude below the CPU's. Even if running more threads made the requests for I/O faster (of which I'm not certain), it wouldn't make the hard drive actually read faster. It could probably even slow it down.

Kenji Kina
2

If it's coming off a single HDD, then you want to minimise seek times. So only use one thread for reading from and writing to disk.

OJ.
2

Many of the answers refer to the number of HDDs. Keep in mind that it also depends on the number of controllers. Sometimes two HDDs are managed by a single controller. Also: two partitions on the same HDD are not two HDDs!

Emond
2

As the "C#" tag implies, I am assuming you are writing a managed application to perform disk I/O.

In this case, I am guessing the number of user-level managed threads is irrelevant, as they are not the ones actually performing the disk I/O.

As far as I know, disk I/O requests from user-level managed threads are queued in the kernel-level APC queue, and Windows I/O threads handle them.

So, I would say the frequency at which disk I/O requests are queued in the APC queue is more relevant to your question.

I have not seen any .NET threading API that allows binding user tasks to Windows I/O threads. However, please note that my answer is based on relatively old information in the following link: Windows I/O threads vs. managed I/O threads.

If anyone knows more about the current Windows 7 thread pool model and how it differs from the information in the link, please kindly share it to educate me as well.

Also, you may find the following link useful for understanding Windows file I/O operations: Synchronous and Asynchronous I/O
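For reference, this is roughly what asynchronous file I/O looks like from managed code. It is only a sketch: `AsyncFileRead` and `CountBytesAsync` are hypothetical names, the 80 KiB buffer is arbitrary, and how the request is actually serviced below the `FileStream` API depends on the runtime and OS version.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileRead
{
    // Reads a file with asynchronous I/O and returns its size in bytes.
    public static async Task<long> CountBytesAsync(string path)
    {
        // useAsync: true asks for asynchronous (overlapped) I/O where the
        // platform supports it, so the calling thread is not blocked
        // while a read is in flight.
        using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                          FileShare.Read, bufferSize: 81920,
                                          useAsync: true);
        var buffer = new byte[81920];
        long total = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }
}
```

Even with this style, only one read per file is outstanding at a time here; issuing many such reads at once against the same HDD would still compete for the disk.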

Chansik Im