I was reading "Programming Pearls" and I am really confused by one of the solution explanations.

The question was: "A file containing at most n positive integers, each less than n, where n = 10^7. Each positive integer could appear at most ten times. How would you sort the file?"

The solution given in the book: "If each integer appears at most ten times, then we can count its occurrences in a four-bit half-byte. Using the solution to Problem 5 (below) we can sort the complete file in a single pass with 10,000,000/2 bytes, or in k passes with 10,000,000/(2k) bytes."

The solution to Problem 5 is: "A two-pass algorithm first sorts the integers 0 through 4,999,999 using 5,000,000/8 = 625,000 words of storage, then sorts 5,000,000 through 9,999,999 in a second pass. A k-pass algorithm sorts at most n non-repeated positive integers less than n in time kn and space n/k."

I don't understand how the author arrives at 10,000,000/(2k) bytes of space for the sort. Based on the solution to Problem 5, don't we first need 625K bytes to sort in the first pass, plus an additional half byte per integer to store the count?

Could someone please help me understand this?

Thanks a lot.

test123

2 Answers

Each positive integer could appear at most ten times.

You can use 4 bits per counter for this, not 2 bytes. If you group counters you can lower this even further. Each counter takes one of 11 values (0 through 10), so if you group 3 counters, that's 11*11*11 = 1331 combinations, which fit in 11 bits (2048 values) instead of the 12 bits that three separate 4-bit counters would use.
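To make the grouping idea concrete, here is a minimal sketch, assuming each counter has to represent the counts 0 through 10 (11 values) and that three counters are packed into one integer as base-11 digits. The names `BASE`, `get_count` and `increment` are made up for illustration.

```python
BASE = 11  # assumption: a counter holds 0..10, i.e. 11 distinct values

def get_count(group, slot):
    """Read the counter stored as base-11 digit `slot` (0, 1 or 2) of `group`."""
    return (group // BASE ** slot) % BASE

def increment(group, slot):
    """Return `group` with the counter in `slot` increased by one."""
    if get_count(group, slot) == BASE - 1:
        raise OverflowError("counter already at its maximum of 10")
    return group + BASE ** slot

group = 0
group = increment(group, 0)
group = increment(group, 2)
group = increment(group, 2)
print([get_count(group, s) for s in range(3)])  # counters in slots 0, 1, 2
print((BASE ** 3 - 1).bit_length())             # bits needed for one packed group
```

The saving is small (11 bits instead of 12 per three counters) and the divisions make every update slower, so plain 4-bit counters are probably the more practical choice.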

Karoly Horvath

10,000,000 - because there are 10,000,000 possible values.

2 - because each byte consists of two half-bytes.
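As a rough illustration of where the n/2 bytes come from, here is a small single-pass sketch that keeps two 4-bit counters in every byte. It assumes the values lie in the range 0 to n-1 and that no value occurs more than 15 times; the function name `nibble_counting_sort` is hypothetical, not from the book.

```python
def nibble_counting_sort(values, n):
    """Sort integers in [0, n), assuming each occurs at most 15 times."""
    counts = bytearray((n + 1) // 2)  # two 4-bit counters per byte
    for v in values:
        shift = 4 * (v & 1)            # even values use the low nibble, odd the high
        counts[v >> 1] += 1 << shift   # safe only while every count stays <= 15
    out = []
    for v in range(n):
        c = (counts[v >> 1] >> (4 * (v & 1))) & 0xF
        out.extend([v] * c)            # emit v as many times as it was counted
    return out

print(nibble_counting_sort([3, 1, 3, 0, 7, 1, 3], 8))
```

For n = 10,000,000 the `counts` array is 5,000,000 bytes, which matches the 10,000,000/2 figure.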

Peter O.
  • Thanks for the reply, Peter. I get that part. What I am confused about is: wouldn't we also require 10,000,000/8 bytes of memory to sort the array, in addition to the 10,000,000/2 bytes for storing the count of each int? – test123 Aug 28 '11 at 08:04
  • from the counters you can reconstruct the sorted elements, why would you need additional memory? – Karoly Horvath Aug 28 '11 at 08:28
  • Ah, got it. So we can just store the counts of two elements in one byte and then reconstruct the sorted output from them. Thanks for the explanation, yi_H. – test123 Aug 28 '11 at 09:55
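The same reconstruction extends to the k-pass variant: each pass counts only the values that fall in one slice of the range, so the counter array shrinks to roughly n/(2k) bytes. The sketch below is only an illustration under assumptions: `read_values` is a hypothetical callable that returns a fresh iterator over the input on every call (standing in for re-reading the file), and no value occurs more than 15 times.

```python
def k_pass_sort(read_values, n, k):
    """Yield the input in sorted order using k passes and about n/(2k) bytes."""
    chunk = -(-n // k)  # ceil(n / k): how many distinct values one pass covers
    for p in range(k):
        lo = p * chunk
        hi = min(lo + chunk, n)
        counts = bytearray((chunk + 1) // 2)  # two 4-bit counters per byte
        for v in read_values():               # one full read of the input per pass
            if lo <= v < hi:
                i = v - lo
                counts[i >> 1] += 1 << (4 * (i & 1))
        # No second array is needed: the counters alone determine the output.
        for i in range(hi - lo):
            c = (counts[i >> 1] >> (4 * (i & 1))) & 0xF
            for _ in range(c):
                yield lo + i

data = [9, 2, 2, 5, 0, 9, 7]
print(list(k_pass_sort(lambda: iter(data), 10, 3)))
```

The trade-off is the one the book states: memory drops by a factor of k, while the input has to be read k times.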