0

I have a simple quick select algorithm and want to understand why it doesn't work sometimes. The question is to find Top K Frequent Elements. I know there are other ways to do it such as using heap or bucket sorting. But I am curios why my algorithm doesn't work as I know there are other ways to do quick select which work. It sometimes doesn't work due to the random.choice function which means any of the following can happen:

  1. get the correct response
  2. get the incorrect response
  3. face maximum recursion depth exceeded error

Why? Isn't quick select based on random function, how come other quick select algorithms that use Python functions for randomness work but this one doesn't. what do I miss?

inputs are a list of numbers as nums and an integer k

 def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    histogram = Counter(nums) """ nums' frequency histogram/map"""
    def quickSelect(keys, k, ri = []):  
        r = random.choice(keys)  """ find the random key""" 
        pivot = histogram[r]  """ get the random frequency based on the random key"""
        left, right = [], [] """ put the keys in either right or left list based on their values"""
        for key in keys:
            if histogram[key] >= pivot:
                right.append(key)
            else:
                left.append(key) 
        if k == len(right) + len(ri): """ if the size of right plus previous right (ri) is equal to k we found it"""
            return right + ri
        elif k < len(right) + len(ri):
            quickSelect(right, k, ri)         
        else:
            quickSelect(left, k - len(right) - len(ri), right+ri)

    return quickSelect(list(histogram.keys()), k, [])

Update

it works per the accepted response suggestion. Added other way of solving it as comments as well for those who are interested.

    def topKFrequent(self, nums: List[int], k: int) -> List[int]:

    ####heap in O(nlogk)
    # histogram = Counter(nums)
    # return heapq.nlargest(k, histogram.keys(), key = histogram.get)

    ####bucket sort in O(n)
    # bucket = [[] for _ in range(len(nums)+1)]
    # histogram = Counter(nums)
    # for n, f in histogram.items():
    #     bucket[f].append(n)
    # res = []
    # for i in range(len(bucket)-1, -1, -1):
    #     if bucket[i]:
    #         for n in bucket[i]:
    #             res.append(n)
    #             if len(res) == k:
    #                 return res          
    # return None

       #### quick select 
       histogram = Counter(nums) 
       def quickSelect(keys, k):  
          pivot = histogram[random.choice(keys)]
          left, right = [], []
          for key in keys:
              if histogram[key] >= pivot:
                  right.append(key)
              else:
                  left.append(key) 
          if k == len(right):
              return right
          elif k < len(right):
              return quickSelect(right, k)         
          else:
              return quickSelect(left, k - len(right)) + right
       return quickSelect(list(histogram.keys()), k)
Alan
  • 417
  • 1
  • 7
  • 22
  • Consider following the [Style Guide for Python Code](https://peps.python.org/pep-0008/#documentation-strings) some more and document your code. – greybeard Aug 04 '23 at 05:22
  • 1
    Deep recursion prevention is mentioned as an implementation issue on [en.wikipedia](https://en.m.wikipedia.org/wiki/Quicksort#Optimizations). – greybeard Aug 05 '23 at 17:58
  • Thanks for the link, it suggests insertion sort after a certain threshold. Interesting approach for optimization in practice or building libraries. However, here I wanted to find (develop) a quickSelect algorithm which is more intuitive "to me" and easier to redo in future for me as apposed to other ones. From that perspective, I don't think mine is performing worse than others. Here I don't do unnecessary search on "left" as I check the length of right – Alan Aug 05 '23 at 18:28
  • `I don't do unnecessary search on "left" as I check the length of right` nor pointless partitions of right. But with *nd* the number of distinct values, if the pivot splits off exactly *one* value "from the bigger part" in each call, you still need a recursion depth of nd-2(?) worst case (max(nd-k, k) if "from the part bigger initially") - and with unique values nd = n (len(nums)). – greybeard Aug 06 '23 at 06:56
  • yes, the textbook quickSelect also in worst case is nd-2 (n^2). It all depends how good is the random function partitions the list (left or right). in practice and real world (on average) the time complexity of the algorithm above is O(n), as n the size of the `nums`. If somehow we were able to have a biased random number generator which considers `k` as input to find a random pivot that could result to the size of `right` be equal or close enough to k then that would probably optimize, neglecting the time complexity of the "new" random number generator – Alan Aug 07 '23 at 00:47
  • I was *not* arguing run time, but recursion depth. And the *reduced number of distinct values nd* has little to do with the square of the *number of values n*. – greybeard Aug 07 '23 at 05:24
  • do you have any idea how to reduce the recursion depth? the pivot plays an important role and how to select it, but how to optimize the selection of pivot, i am not sure. please share your thoughts – Alan Aug 07 '23 at 06:04
  • I put my ideas to limit call depth in writing in [my answer](/a/76833274/3789665). See the development of the non-recursive `select()` at the end of [en.wikipedia's paragraph on the algorithm for quickselect](https://en.m.wikipedia.org/wiki/Quickselect#Algorithm), too. – greybeard Aug 07 '23 at 06:18

1 Answers1

1

quickSelect(keys, k, ri = []) does not explicitly return a value when it recurses - the return value will be None.

You don't need ri: manipulate k.

The same approach preventing a decent quicksort implementation from ever needing O(n) levels of recursion would work for quickSelect:
Do not recurse on the larger partition.

But check if then you need to recurse at all.

greybeard
  • 2,249
  • 8
  • 30
  • 66
  • Not sure exactly got what you mention. As far as I know, the quickSelect algorithm recurse over one of the partitions, which i do it either on left or right list. As I said, sometimes I get correct response. What I don't get, is why? Because other quickSelects that i saw also use randomness and they work. What exactly is wrong? Can you be more specific? – Alan Aug 04 '23 at 06:05
  • Also I need to return a list that's why I need `ri` to keep track of the already selected right side of the pivot. – Alan Aug 04 '23 at 06:07
  • 1
    more specific than *sometimes I get correct response* or *calling `quickSelect(left, k - len(right) - len(ri), right+ri)` doesn't do*? Try `return quickSelect(left, k - len(right)) + right`. (There *are* languages returning "the last expression" implicitly. Python isn't one of them.) – greybeard Aug 04 '23 at 06:11
  • that works! thanks – Alan Aug 04 '23 at 06:20