I don't know which external sort you (or the interviewer) had in mind, but
my suggestion is a k-way merge (10-way in your case); a code sketch follows the step list below:
- Split the file into chunks of MAX_MEM size (100 elements)
- Sort each chunk in memory and store as a separate file
  - this is O((n/max_mem) * max_mem * log(max_mem)) = O(n log(max_mem))
- Open all chunks as streams of elements
- Merge all streams by selecting the lowest element at each step.
  - this is O(n log(n/max_mem)) using a minHeap, or O(n^2/max_mem) trivially (which may be faster in practice)
- Delete the chunks
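
To make the steps concrete, here is a minimal Python sketch of both phases, assuming the input is a plain text file with one integer per line; the MAX_MEM value, chunk-file naming, and function names are made up for illustration:

```python
import heapq
import os

MAX_MEM = 100  # number of elements that fit in memory at once (illustrative)

def split_and_sort(input_path):
    """Phase 1: read MAX_MEM elements at a time, sort in memory, write one chunk file each."""
    chunk_paths = []
    with open(input_path) as f:
        while True:
            # take up to MAX_MEM lines; zip stops at whichever runs out first
            chunk = [int(line) for _, line in zip(range(MAX_MEM), f)]
            if not chunk:
                break
            chunk.sort()
            path = f"chunk_{len(chunk_paths)}.tmp"
            with open(path, "w") as out:
                out.write("\n".join(map(str, chunk)) + "\n")
            chunk_paths.append(path)
    return chunk_paths

def merge_chunks(chunk_paths, output_path):
    """Phase 2: k-way merge of the sorted chunk files using a min-heap."""
    files = [open(p) for p in chunk_paths]
    heap = []
    for i, f in enumerate(files):
        line = f.readline()
        if line:
            heapq.heappush(heap, (int(line), i))    # (value, index of the stream it came from)
    with open(output_path, "w") as out:
        while heap:
            value, i = heapq.heappop(heap)          # smallest head among all streams
            out.write(f"{value}\n")
            line = files[i].readline()              # refill the heap from the same stream
            if line:
                heapq.heappush(heap, (int(line), i))
    for f in files:
        f.close()
    for p in chunk_paths:                           # delete the chunks
        os.remove(p)

def external_sort(input_path, output_path):
    merge_chunks(split_and_sort(input_path), output_path)
```

With MAX_MEM = 100 and 1000 elements, split_and_sort produces 10 sorted chunk files and merge_chunks does the 10-way merge in a single pass.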
Concerning computation, this is O(n * (log(max_mem) + log(n/max_mem))) = O(n log(n)).
Concerning disk I/O, if all the merging is done in one pass, this is 2*n reads and 2*n writes only.
More generally, it's (1+[depth of the merge tree])*n
All writes are sequential.
The first read is fully sequential; the second is sequential within each chunk, but interleaved across the 10 chunk files.
If there were much more data, you'd need a repeated or recursive merge (sort 100 elements per chunk, then repeatedly merge N chunks at a time). At that point, it's worth replacing the split+sort step with Replacement Selection as described in @amit's answer, especially when the data is already almost sorted (you may skip the merging step completely).
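
A rough sketch of that repeated merge, assuming the merge_chunks helper from the sketch above (which deletes its input files) and a fan-in of N chunk files per pass; the generation-based temp-file naming is invented:

```python
import os

def multi_pass_merge(chunk_paths, output_path, fan_in=10):
    """Merge at most fan_in sorted chunk files at a time until one sorted file remains."""
    generation = 0
    while len(chunk_paths) > 1:
        next_paths = []
        for i in range(0, len(chunk_paths), fan_in):
            group = chunk_paths[i:i + fan_in]
            merged = f"merge_{generation}_{len(next_paths)}.tmp"
            merge_chunks(group, merged)   # k-way min-heap merge from the sketch above
            next_paths.append(merged)
        chunk_paths = next_paths
        generation += 1
    os.replace(chunk_paths[0], output_path)  # last remaining file is the fully sorted output
```

Each pass adds one level to the merge tree, which is exactly the (1 + depth) * n disk I/O mentioned above.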
Note that a higher merge fan-in N may increase computation (very slightly, if you use the right data structures), but reduces disk I/O significantly, up to a point: if you merge too many chunks at once, you may run out of memory for the read buffers, causing unnecessary extra reads. Disk I/O is expensive, CPU cycles are not.