
Describe an O(n)-time algorithm that, given a set S of n distinct numbers and a positive integer k ≤ n, outputs the k numbers in S that are closest to the median of S (excluding the median). Hint: The target numbers may not be evenly placed around the median in the sorted version of the array. E.g., consider 1, 2, 3, 8, 10; the 2 numbers closest to the median 3 are 1 and 2 (excluding the median itself), but they are both less than the median. (Note: this is just an illustration; don't assume that the array is sorted.)

Here is the answer that I found:

Answer: Find the n/2 − k/2 largest element in linear time. Partition on that element. Then, find the k largest element in the bigger subarray formed from the partition. Then, the elements in the smaller subarray from partitioning on this element are the desired k numbers.

My illustration:

Suppose I have an array with 11 elements and the array is an unsorted array

index_number 1  2  3  4  5  6  7  8  9  10  11
arr_elements 2  5  3  10 4  7  1  12 6  13  8

As there are 11 elements, the median position should be 11/2 = 5.5, approximately 6, so arr_element 7 is the median.

The solution says "Find the n/2 − k/2 largest element in linear time." Suppose k = 4, so k/2 = 2; therefore I need to find the largest element from index 2 through index 6. The array elements from index 2 through 6 are {5, 3, 10, 4, 7}, so the largest element is 10.

The answer then says "Partition on that element." So there will be two subarrays after partitioning at arr_element 10: {2, 5, 3} and {4, 7, 1, 12, 6, 13, 8}.

Next the answer says "Then, find the k largest element in the bigger subarray formed from the partition." Since k = 4, the kth largest element means the 4th largest element. The 4th largest element in the bigger subarray is 8.

Finally, the algorithm says "Then, the elements in the smaller subarray from partitioning on this element are the desired k numbers." I did not understand this statement.

The problem comes from Cormen's Introduction to Algorithms, Chapter 9: Medians and Order Statistics.

Any hints would be appreciated.

Encipher
  • Are you confused by the problem statement, or how to approach solving it? As a hint, you’ll need to know how to select the kth smallest element of an array in O(n) time, for any input k. Supposing you knew the median, could you find the furthest element from the median that should still be output? Specifically, how could you describe that furthest element in relation to the median? – kcsquared Feb 12 '22 at 23:04
  • I made some changes to my question. Can you now please suggest a way to understand the problem? – Encipher Feb 19 '22 at 21:02
  • The link to unofficial 'CLRS solutions' you gave is... curious; the answer for this question, 9.3-7, appears vague and incorrect. The link you gave to the [identical earlier question from 2009](https://stackoverflow.com/q/1557678/16757174) is slightly more helpful. However, neither the top voted answers nor the accepted answer looks like a correct solution in `O(n)`. There are two answers there which are correct in linear time: [here](https://stackoverflow.com/a/17451804/16757174) and [here](https://stackoverflow.com/a/55986567/16757174) which you should read. – kcsquared Feb 19 '22 at 21:29

1 Answer

The problem is to find the median, then find the distance d such that exactly k or k+1 points are within that distance from the median, and then output those points.

Hint: Study quickselect.
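
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the names `quickselect` and `k_closest_to_median` are made up for this example, the median is taken to be the lower median, and `k` is assumed to be at most n − 1 because the median itself is excluded. The randomized quickselect shown here runs in expected O(n) time, not worst-case O(n).

```python
import random


def quickselect(items, i, key=lambda x: x):
    """Return the element whose key is the i-th smallest (0-based).

    Randomized quickselect: expected O(n) time. Works on a copy.
    """
    items = list(items)
    while True:
        pivot = key(random.choice(items))
        lows = [x for x in items if key(x) < pivot]
        pivots = [x for x in items if key(x) == pivot]
        if i < len(lows):
            items = lows
        elif i < len(lows) + len(pivots):
            return pivots[0]
        else:
            i -= len(lows) + len(pivots)
            items = [x for x in items if key(x) > pivot]


def k_closest_to_median(s, k):
    """Return k elements of s closest to its median, excluding the median."""
    s = list(s)
    n = len(s)
    if not 1 <= k <= n - 1:  # assumption: the median cannot be output
        raise ValueError("k must satisfy 1 <= k <= n - 1")

    # Step 1: find the (lower) median.
    median = quickselect(s, (n - 1) // 2)
    others = [x for x in s if x != median]

    # Step 2: find the distance d of the k-th closest element.
    kth = quickselect(others, k - 1, key=lambda x: abs(x - median))
    d = abs(kth - median)

    # Step 3: take everything strictly closer than d, then fill the
    # remaining slots with elements at distance exactly d.
    result = [x for x in others if abs(x - median) < d]
    for x in others:
        if len(result) == k:
            break
        if abs(x - median) == d:
            result.append(x)
    return result


print(sorted(k_closest_to_median([2, 5, 3, 10, 4, 7, 1, 12, 6, 13, 8], 4)))
# → [4, 5, 7, 8]
print(sorted(k_closest_to_median([1, 2, 3, 8, 10], 2)))
# → [1, 2]
```

Because the numbers are distinct, at most two elements can lie at distance exactly d from the median (one on each side), which is why the final scan only ever has to break a tie between two candidates.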

btilly
  • If the array is not sorted and the elements nearest to the median lie on both sides of it, what is the approach? If there are n elements, then about n/2 of them are on the right-hand side of the median and about n/2 are on the left-hand side. – Encipher Feb 16 '22 at 04:48
  • A similar approach is described here: https://stackoverflow.com/questions/1557678/how-to-find-k-nearest-neighbors-to-the-median-of-n-distinct-numbers-in-on-time But I am not sure whether the algorithm in that post works for both sorted and unsorted arrays. – Encipher Feb 16 '22 at 04:51
  • One can find a solution at https://walkccc.me/CLRS/Chap09/9.3/ as well. After seeing a lot of approaches, I am not sure which one is OK. – Encipher Feb 16 '22 at 04:55
  • @Encipher That solution is unnecessarily pessimistic. Do quickselect to find the median. Transform into an array of distances from median. Do quickselect to find the distance which you need to get the `k` closest AND how many at the maximum distance you need. Then scan the array and take elements within the right range until you have your `k`. Works for sorted and unsorted. All `O(n)`. – btilly Feb 16 '22 at 05:10
  • Which algorithm is pessimistic here? There are two links. Are you talking about the Stack Overflow link or the CLRS solution link? – Encipher Feb 16 '22 at 13:18
  • This is the link for quickselect: https://en.m.wikipedia.org/wiki/Quickselect#:~:text=In%20computer%20science%2C%20quickselect%20is,known%20as%20Hoare's%20selection%20algorithm. Do I need to go through the algorithm defined there? – Encipher Feb 16 '22 at 13:21
  • @Encipher The Stack Overflow answer was too pessimistic in claiming `O(kn)` is needed. The CLRS solution link is correct. Quickselect is great for good runtime and average case `O(n)`. Median of medians gives poor average runtime but worst case `O(n)`. If the question says "worst case", do that. But other tradeoffs exist. For example, if 9/10 of the passes use quickselect and 1/10 use median of medians, you get performance within 1% of quickselect while still having worst case `O(n)` behavior. – btilly Feb 16 '22 at 20:09
  • Can you please take a look at this: https://stackoverflow.com/questions/71178163/clrs-solution-seems-meaningless-as-one-line-make-me-skeptical – Encipher Feb 18 '22 at 23:51
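
For the worst-case `O(n)` variant mentioned in the comments, a median-of-medians selection might look like the sketch below. It is only an illustration under stated assumptions: the function name `median_of_medians_select` is hypothetical, groups of 5 are used as in the textbook version, and the list-based partitioning favors readability over speed. It could stand in for quickselect wherever a selection step is needed.

```python
def median_of_medians_select(items, i):
    """Return the i-th smallest element (0-based) in worst-case O(n) time."""
    items = list(items)  # work on a copy; the input is left untouched
    while len(items) > 5:
        # Split into groups of at most 5 and take each group's median.
        groups = [items[j:j + 5] for j in range(0, len(items), 5)]
        medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

        # Recursively pick the median of those medians as the pivot.
        pivot = median_of_medians_select(medians, (len(medians) - 1) // 2)

        # Partition around the pivot and keep only the side holding rank i.
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        equal = len(items) - len(lows) - len(highs)
        if i < len(lows):
            items = lows
        elif i < len(lows) + equal:
            return pivot
        else:
            i -= len(lows) + equal
            items = highs
    # Small base case: sorting at most 5 elements takes constant time.
    return sorted(items)[i]


data = [2, 5, 3, 10, 4, 7, 1, 12, 6, 13, 8]
print(median_of_medians_select(data, (len(data) - 1) // 2))
# → 6
```

The pivot chosen this way is guaranteed to discard a constant fraction of the elements on every pass, which is what gives the worst-case linear bound; the price is the extra work of computing the group medians.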