Say there is an algorithm X that takes two steps to produce its final output file:
- collect data
- sort data
Let us also say that the collected data is too large to fit in RAM, so step 1 writes it to a file before step 2 begins.
For example, take a 500 GB file of numbers, as output by step 1, with one number on each line. Step 2 must sort the lines in ascending order.
How can step 2 sort the numbers efficiently without loading the whole input file into memory?
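
To make the question concrete, here is a minimal sketch of one common way to approach this kind of problem, an external merge sort: read the input in chunks that fit in RAM, sort each chunk and write it out as a temporary "run" file, then merge all the runs in a single streaming pass. The function name `external_sort`, the `chunk_lines` parameter, and the assumption that every line holds an integer are illustrative choices, not part of the problem statement.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, chunk_lines=1_000_000):
    """Sort a file of one integer per line without holding it all in RAM.

    chunk_lines is a hypothetical tuning knob: how many lines are sorted
    in memory at once. Integer input is an assumption of this sketch.
    """
    run_paths = []
    try:
        # Phase 1: split the input into sorted runs on disk.
        with open(src_path) as src:
            while True:
                lines = list(islice(src, chunk_lines))
                if not lines:
                    break  # end of input
                chunk = sorted(int(line) for line in lines if line.strip())
                if not chunk:
                    continue  # chunk contained only blank lines
                fd, path = tempfile.mkstemp(suffix=".run")
                run_paths.append(path)
                with os.fdopen(fd, "w") as run:
                    run.writelines(f"{n}\n" for n in chunk)

        # Phase 2: k-way merge of the runs. heapq.merge is lazy, so only
        # one pending value per run is held in memory at any time.
        runs = [open(path) for path in run_paths]
        try:
            with open(dst_path, "w") as dst:
                merged = heapq.merge(*(map(int, run) for run in runs))
                dst.writelines(f"{n}\n" for n in merged)
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for path in run_paths:
            os.remove(path)
```

Calling `external_sort("collected.txt", "sorted.txt")` (both file names are placeholders) would write the sorted numbers to the second file. Note that this sketch opens every run at once during the merge, so for a 500 GB input the chunk size would have to be large enough to keep the number of runs below the operating system's open-file limit; otherwise the merge would need to be done in several passes.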