
I tried answering this using external sort, but the interviewer replied that the complexity was too high: n*n*log(n), i.e. n squared * log n. Is there a better alternative?

To simplify the question: let's suppose we have 1000 elements to sort, with space allocated for 100 elements only. What is the best algorithm that will take less time than the external sort?

Anshu Kandhari

3 Answers

5

I don't know which external sort you (or the interviewer) meant, but my suggestion is a 10-way (in your case) merge:

  • Split the file into chunks of MAX_MEM size (100 elements)
    • this is O(1)
  • Sort each chunk in memory and store as a separate file
    • this is O((n/max_mem) * max_mem * log(max_mem)) = O(n log(max_mem))
  • Open all chunks as streams of elements
  • Merge all streams by selecting the lowest element at each step.
    • this is O(n log(n/max_mem)) using a minHeap, or O(n^2/max_mem) with a trivial linear scan (which may be faster in practice)
  • Delete the chunks

Concerning computation, this is O(n (log(max_mem) + log(n/max_mem))) = O(n log(n))

Concerning disk I/O, if all merging is done in one pass, this is 2*n reads and 2*n writes only. More generally, it's (1+[depth of the merge tree])*n

All writes are sequential. The first read is sequential; the second is also sequential, but interleaved from 10 files.
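
To make the steps concrete, here is a minimal sketch of the split/sort/merge described above (my illustration, not part of the original answer), assuming the input is a text file with one integer per line and that MAX_MEM counts elements rather than bytes:

```python
import heapq
import os
import tempfile

MAX_MEM = 100  # elements we may hold in memory at once (per the question; illustrative)

def external_sort(in_path, out_path):
    # 1) Split the input into chunks of MAX_MEM elements, sort each in memory,
    #    and write each sorted chunk to its own temporary file.
    chunk_paths = []
    with open(in_path) as f:
        while True:
            chunk = [int(line) for _, line in zip(range(MAX_MEM), f)]
            if not chunk:
                break
            chunk.sort()  # O(max_mem * log(max_mem)) per chunk
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(f"{x}\n" for x in chunk)
            chunk_paths.append(path)

    # 2) Open every chunk as a stream and merge them with a min-heap;
    #    heapq.merge keeps only one element per stream in memory.
    streams = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as out:
            merged = heapq.merge(*((int(line) for line in s) for s in streams))
            out.writelines(f"{x}\n" for x in merged)
    finally:
        # 3) Delete the chunks.
        for s in streams:
            s.close()
        for p in chunk_paths:
            os.remove(p)
```

In a real implementation the streams would read and write in block-sized buffers rather than line by line, but the structure (sorted runs plus a heap-driven merge) is the same.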

If there was much more data, you'd need a repeated or recursive merge (100 per chunk, then pick N chunks repeatedly). At that point, it's worth replacing the split+sort step with Replacement/Selection as described in @amit's answer, especially when the data is already almost sorted (you may avoid the merging step completely).

Note that a higher N may increase computation (very slightly, if you use the right structures), but reduces the amount of disk I/O significantly (up to a certain point; if you merge too many chunks at once, you may run out of memory for the read buffers, causing unnecessary reads). Disk I/O is expensive, CPU cycles are not.

John Dvorak
  • As specified in the question, I already answered this; it will take on the order of n*n*log(n) time, which is very high as per the interviewer – Anshu Kandhari Dec 08 '12 at 08:42
  • @AnshuKandhari it won't take this long. Why should it? – John Dvorak Dec 08 '12 at 08:42
  • The second argument is half-wrong. It is true that radix sort needs more memory, but the memory required depends on the number of bits you use in each pass (the number of buckets). Hence, the memory required may well be less than the requirements of mergesort; see for example http://stackoverflow.com/questions/3539265/why-quicksort-is-more-popular-than-radix-sort – Ali Imran Dec 08 '12 at 08:43
  • @Jan: see, you will first take 10 chunks of 100 elements and sort them. Time complexity = 10*100*log(100) – Anshu Kandhari Dec 08 '12 at 08:45
  • @AnshuKandhari this _is_ `O(n log(max_mem))`, not `O(n^2 log(max_mem))`. (technically, read theta instead of big-oh for the second statement). – John Dvorak Dec 08 '12 at 08:53
  • @Jan: see, you will first take 10 chunks of 100 elements and sort them. Time complexity = 10*100*log(100). Then 1) you will take 10 elements from each of those sorted chunks, 2) then find out the minimum of those 10 elements: 10*100*log(100), 3) then repeat, i.e. redo step (1) and for each step (1) do step (2) 10 times. Overall complexity 100*100*log(100) – Anshu Kandhari Dec 08 '12 at 08:55
  • @JanDvorak: There is a problem with big-O notation and external sort: it is simply not enough, since the constants are very high for random reads/writes of elements, for example. If we cared only about big-O notation, we could just do a standard quick-sort optimized for memory and let the OS do the rest for us with its paging mechanism. – amit Dec 08 '12 at 08:56
  • @AnshuKandhari finding the minimum of 10 elements does not take nearly as long as you say. The time complexity is for the entire loop. – John Dvorak Dec 08 '12 at 08:57
  • @amit the chunks themselves are read sequentially. Random read is not a problem. – John Dvorak Dec 08 '12 at 08:58
  • @JanDvorak: But the writes are not. If you fill your memory entirely with the data, there is no extra space for the merged array, which requires you to write an element at a time - while files are written in blocks, effectively increasing the number of writes by a factor of ~4K – amit Dec 08 '12 at 09:01
  • @amit what makes you think the writes are not sequential? It's written as a stream. Are you concerned that there is no more memory left for OS' write buffers? – John Dvorak Dec 08 '12 at 09:02
  • @amit Note the memory is enough to hold 10 elements per chunk in the read buffer. – John Dvorak Dec 08 '12 at 09:05
  • @amit note that I only use the asymptotic notation to counter the asker's claims about the complexity. – John Dvorak Dec 08 '12 at 09:09
  • @JanDvorak: 10 elements per chunk in the read buffer means that on each disk read you can keep 10 elements in memory and throw away the rest. Disk reads are done in blocks (~4KB usually; this is a HW issue) - thus with this approach each block will be read at least 4K/10 = 400 times, which is terrible. – amit Dec 08 '12 at 09:16
  • @amit don't forget the disk's own cache. – John Dvorak Dec 08 '12 at 09:17
  • @JanDvorak: I am not forgetting it, but if `#sub_arrays * 4K > disk_cache` - it won't help you much. – amit Dec 08 '12 at 09:19
  • @amit do you mean a disk cache smaller than 40K? Correct me if I'm wrong, but isn't the typical amount 20 times higher? – John Dvorak Dec 08 '12 at 09:21
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/20793/discussion-between-amit-and-jan-dvorak) – amit Dec 08 '12 at 09:22
  • What if the number of chunks is greater than MAX_MEM? In that case you will not have enough memory to merge the streams. Am I wrong? – ivan.mylyanyk Jan 12 '13 at 23:38
  • @cupidon4uk "If there was much more data, you'd need repeated or recursive merge (100 per chunk, then pick N chunks repeatedly)." – John Dvorak Jan 13 '13 at 06:15
3

The standard way of doing it is an External Sort.

In an external sort it is not only important to have O(n log n) complexity - it is also critical to minimize disk reads/writes as much as possible, and to make most reads and writes sequential (not random), since disk access is much more efficient when done sequentially.

The standard way of doing so is indeed a k-way merge sort, as suggested by @JanDvorak, but there are some faults in, and additions to, that suggestion I am aiming to correct:

  1. First, doing RS (Replacement Selection) on the input decreases the number of initial "runs" (increasing sequences) and thus usually decreases the total number of iterations needed by the later merge sort, especially when the data is already almost sorted (see the sketch after this list).
  2. We need memory for buffering (reading and writing input) - thus, for memory size M and file size M*10, we cannot do a 10-way merge - it would result in a LOT of disk reads (reading each element individually rather than in blocks).
    The standard formula for k - the "order" of the merge - is M/(2b), where M is the size of your memory and b is the size of each "buffer" (usually a disk block). For example, with M = 1 MiB of memory and b = 4 KiB blocks, k = 1 MiB / (2 * 4 KiB) = 128.
  3. Each merge-sort step is done by reading b entries from each "run" generated in the previous iteration, filling M/2 of the memory. The rest of the memory is used for "prediction" (which allows continuous work with minimal waiting for IO, by requesting more elements from a run ahead of time) and for the output buffer, in order to guarantee sequential writes in blocks.
  4. The total number of iterations with this approach is log_k(N/(2M)), where k is the merge order calculated above, M is the size of the memory, and N is the size of the file. Each iteration requires 1 sequential read and 1 sequential write of the entire file.
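
Here is a minimal sketch of the replacement-selection run generation mentioned in point 1 (my illustration, not code from this answer), assuming `memory_size` is counted in elements and the items are comparable and never None:

```python
import heapq

def replacement_selection_runs(items, memory_size):
    """Yield sorted runs; each run averages about 2 * memory_size elements."""
    it = iter(items)

    # Fill the working memory with the first memory_size elements.
    heap = []
    for x in it:
        heap.append(x)
        if len(heap) >= memory_size:
            break
    heapq.heapify(heap)

    frozen = []  # elements too small for the current run; they start the next one
    run = []
    while heap:
        smallest = heapq.heappop(heap)
        run.append(smallest)
        nxt = next(it, None)          # assumes the input never contains None
        if nxt is not None:
            if nxt >= smallest:
                heapq.heappush(heap, nxt)   # still fits in the current run
            else:
                frozen.append(nxt)          # defer to the next run
        if not heap:                        # current run is exhausted
            yield run
            run = []
            heap = frozen
            frozen = []
            heapq.heapify(heap)
    if run:
        yield run
```

With memory for 100 elements, the runs average roughly 200 elements on random input (and a single run on already-sorted input), so the later merge has fewer, longer runs to deal with.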

That said, the ratio file_size/memory_size is usually MUCH more than 10. If you are interested only in a ratio of 10, local optimizations might apply, but that is not the more common case where file_size/memory_size >> 10.

amit
3

Perhaps the interviewer expected you to ask: are those numbers the unique seven-digit telephone numbers mentioned by J. Bentley (Cracking the Oyster)?
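
For reference, the trick from that column is a bitmap sort: if the inputs are distinct seven-digit numbers, one bit per possible value fits in roughly 1.2 MB and no comparison sort is needed. A minimal sketch (my illustration), assuming the numbers arrive one per line on stdin:

```python
import sys

MAX_VALUE = 10_000_000              # seven-digit numbers: 0 .. 9,999,999
bits = bytearray(MAX_VALUE // 8 + 1)  # one bit per possible value (~1.2 MB)

for line in sys.stdin:
    n = int(line)
    bits[n // 8] |= 1 << (n % 8)    # mark the number as present

for n in range(MAX_VALUE):
    if bits[n // 8] & (1 << (n % 8)):
        print(n)                    # present numbers come out in sorted order
```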

Ekkehard.Horner