Well, this is by far not a case of a true-[PARALLEL]
process ( scheduling ), even if your professor or some wannabe-"nerds" like to call it that.
There is no way to move 100 cars, side by side, in [PARALLEL]
across a bridge
that has just one pure-[SERIAL]
lane over a river.
As declared above, fileIO is a "just"-[CONCURRENT]
process; there is no such device ( be it a spinning disk, or any form of NAND/FLASH-SSD disk-emulation device ) that could read and smoothly deliver data from 100 different file locations at the very same time.
The maximum one can expect is to hide some part of the non-CPU portion of the process-flow ( buffer & controller-cache re-ordered fileIO may mask some part of the principal ~ 10 [ms] seek-time ( still not more than ~ 125 seeks per second, even on RAID ), and the data-flow will never go above ~ 250 [MB/s/disk] on a classical spinning disk; network-transport latency + remote process-handling in the case of a web-request will always accrue ~ from units to small hundreds of [ms] just for the L3-TCP/IP-RTT-latency, plus whatever the remote processing takes ).
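For a quick sense of scale, here is a hedged back-of-the-envelope sketch in Python of the figures just quoted ( the item count, the average file size and the chosen RTT are illustrative assumptions, not measurements ) -- it shows why latency-masking helps the web-request case far more than the single-spinning-disk fileIO case:

```python
# Back-of-the-envelope budget: reading 100 files from one spinning disk vs.
# issuing 100 web-requests. Values marked ASSUMPTION are illustrative only;
# the ~125 seeks/s, ~250 [MB/s/disk] and RTT range come from the text above.

N_ITEMS      = 100       # the "100 cars" from the bridge analogy
FILE_SIZE_MB = 10        # ASSUMPTION: average file size
MAX_SEEKS_PS = 125       # ~ 125 seeks per second, even on RAID
DISK_MBPS    = 250       # ~ 250 [MB/s/disk] on a classical spinning disk
RTT_S        = 0.050     # ASSUMPTION: ~ 50 [ms], from the "units to small hundreds of [ms]" range

seek_s   = N_ITEMS / MAX_SEEKS_PS                  # seeks serialise on one disk head
stream_s = N_ITEMS * FILE_SIZE_MB / DISK_MBPS      # data-flow is capped by the platter

net_serial_s     = N_ITEMS * RTT_S                 # one request strictly after another
net_overlapped_s = RTT_S                           # idealised: all RTTs fully overlap

print(f"disk : ~{seek_s + stream_s:.2f} s ( {seek_s:.2f} s seeking + {stream_s:.2f} s streaming )")
print(f"web  : ~{net_serial_s:.2f} s serial vs ~{net_overlapped_s:.2f} s if latency is fully masked")
```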
If going into the domain of high performance, one will definitely have to gain a proper understanding of the hardware, because all high-level software constructors expect the user to understand their cons and pros ( and in most cases do not leave all the hardware-related decisions to the user ). So, in most cases, one ought to benchmark the settings against the respective hardware platform, to identify / validate whether such a software-constructor indeed delivers any beneficial effect on the process performance, or not -- losing way more than receiving is a very common surprise in this domain, once a blind belief or a naive implementation gets indeed benchmarked.
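As an illustration of that "losing way more than receiving" surprise, a minimal, hedged sketch follows -- the trivial per-item task and the default pool size are deliberate assumptions; the only point is that the benchmark, not the belief, decides:

```python
from concurrent.futures import ProcessPoolExecutor
from time import perf_counter

def tiny_task(x):
    # ASSUMPTION: a deliberately small piece of work; real workloads differ
    return x * x

if __name__ == "__main__":
    data = range(100_000)

    t0 = perf_counter()
    serial = [tiny_task(x) for x in data]
    t1 = perf_counter()

    # Naive "parallel" variant: process-spawning + per-item IPC (pickling) costs dominate
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(tiny_task, data))
    t2 = perf_counter()

    print(f"pure-[SERIAL] loop : {t1 - t0:.3f} s")
    print(f"naive ProcessPool  : {t2 - t1:.3f} s  <-- often slower; benchmark it yourself")
```

On a typical machine the pool variant pays for spawning worker processes plus pickling every item in both directions, so for such tiny work-items it commonly ends up far slower than the plain loop.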
Q: How can I predict the degree of parallelism to use?
A:
An analytical approach -- IDENTIFY the narrowest bridge in the game:
Go as deep into the real-system hardware infrastructure the code will be deployed on, so as to identify the weakest processing-chain element in the computing graph ( the very bridge with the least number of true-parallel lanes -- fileIO having ~ 1 lane, a 4-core CPU having ~ 4 lanes ( possibly more than 8 lanes, if having 2 ALUs per CPU-core and doing only some well-done, locality-preserving heavy number-crunching ), a 2-channel DRAM having ~ 2 lanes, etc. ).
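A minimal sketch of that "narrowest bridge" bookkeeping ( the lane counts below simply mirror the examples above and are assumptions, to be replaced by the real deployment platform's values ):

```python
# The weakest processing-chain element bounds the useful degree of parallelism.
# Lane counts mirror the examples in the text; replace with the real platform's
# values (an ASSUMPTION until verified on the actual hardware).

lanes = {
    "fileIO (single spinning disk)": 1,
    "CPU cores":                     4,
    "DRAM channels":                 2,
}

bottleneck, width = min(lanes.items(), key=lambda kv: kv[1])

print(f"narrowest bridge : {bottleneck}, ~{width} true-parallel lane(s)")
print(f"=> more than ~{width} workers hitting this stage will mostly queue up, not speed up")
```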
An experimental approach -- MEASURE performance of all the possible combinations:
If not willing to spend such effort, or if such information is not available in a sufficient level of detail for the analytical approach, one may prepare and run a set of blind brute-force black-box benchmarking experiments, measuring the in-vivo performance effects of controlled levels of concurrency / locally deployed fine-grain parallelism tricks. The experimental data may indicate directions that yield beneficial or adverse effects on the resulting End-to-End process performance.
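One possible skeleton of such a blind brute-force black-box sweep ( the job() body, the swept pool sizes and the use of a thread-pool are all assumptions, standing in for the real End-to-End process ):

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from time import perf_counter, sleep

def job(i):
    # ASSUMPTION: stand-in for one unit of the real work ( one fileIO / one web-request );
    # replace with the actual End-to-End workload before trusting any numbers
    sleep(0.01)

def run_once(n_workers, n_jobs=100):
    t0 = perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(job, range(n_jobs)))
    return perf_counter() - t0

if __name__ == "__main__":
    for n_workers in (1, 2, 4, 8, 16, 32):                   # ASSUMPTION: levels to sweep
        samples = [run_once(n_workers) for _ in range(5)]    # repeat; never trust one run
        print(f"{n_workers:3d} workers : median {median(samples):.3f} s over {len(samples)} runs")
```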
Known Limitations:
There is no such thing as a repeatable, controlled experiment once going outside of a localhost
( local-area / wide-area network background-traffic workload envelopes, remote firewalls, remote processing-node(s), spurious intermittent workloads on any of the mediating devices -- all this simply prevents an experiment from being repeatable per se, let alone from being anything more than just one sample in some remarkably large empirical performance-testing DataSET, if the results are to have any relevance for the final decision ( 10x, 100x, 1000x repetitions being no measure at all, if in serious need to cover how the various background workloads affect the performance assessment of each of the experimental setup combinations ) ). One may also need to check the remote website's Terms & Conditions, as many API providers implement daily-use limiting / rate-trimming policies, so as not to end up on their respective blacklist / permanent ban right due to violating these Terms & Conditions.
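If the experiments touch a remote API, a minimal client-side pacing sketch like the one below may help keep a benchmarking sweep inside such rate-trimming policies ( the permitted call rate used here is an assumption; the real limit must be taken from the respective Terms & Conditions ):

```python
from time import monotonic, sleep

class RatePacer:
    """Client-side pacing: never exceed max_calls_per_s.
    The default value is an ASSUMPTION; take the real limit from the provider's T&C."""
    def __init__(self, max_calls_per_s=2.0):
        self.min_gap = 1.0 / max_calls_per_s
        self._last = None

    def wait(self):
        now = monotonic()
        if self._last is not None:
            gap = now - self._last
            if gap < self.min_gap:
                sleep(self.min_gap - gap)
        self._last = monotonic()

pacer = RatePacer(max_calls_per_s=2.0)
for i in range(5):
    pacer.wait()
    print(f"request {i} sent")       # ASSUMPTION: stands in for the real remote call
```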
Epilogue for complete view & technology-purists:
Yes, there are indeed strategies for advanced, HPC-grade processing performance that allow circumventing this principal bottleneck, but it is not probable to have such a kind of HPC parallel filesystem implemented in common mortals' user lands, as supercomputing resources belong rather to well-financed federal- / EU- / government-sponsored R&D or mil/gov institutions that operate such HPC-friendly environments.