Sum of medians (faster solution)

Question

We are given a number N, the length of the list that follows (1 <= N <= 10^5).

Then comes a list of N numbers (1 <= num <= 10^9).

The task is to find the median of every prefix of the list (on the i-th iteration, for i from 1 to N, we find the median of the sub-array lst[:i]) and then to output the sum of all N medians.

Examples

Input:

10

5 10 8 1 7 3 9 6 2 4

Output:

59 (5+5+8+5+7+5+7+6+6+5)

Input2:

5

5 3 1 2 4

Output2:

16 (5+3+3+2+3)
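
To pin down the convention these examples follow (for an even-length prefix, the lower of the two middle values counts as the median), here is a brute-force reference sketch. The function name `sum_of_medians_naive` is made up for illustration. Re-sorting every prefix is far too slow for N = 10^5, so this is only meant for checking faster solutions on small inputs.

```python
def sum_of_medians_naive(lst):
    """Sum the medians of lst[:1], lst[:2], ..., lst[:N] by re-sorting each prefix."""
    total = 0
    for i in range(1, len(lst) + 1):
        prefix = sorted(lst[:i])
        # Index (i - 1) // 2 is the middle element for odd i
        # and the lower of the two middle elements for even i.
        total += prefix[(i - 1) // 2]
    return total


print(sum_of_medians_naive([5, 10, 8, 1, 7, 3, 9, 6, 2, 4]))  # 59
print(sum_of_medians_naive([5, 3, 1, 2, 4]))  # 16
```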

In "Approach for better solution - Sum of medians" it was suggested to use a binary search tree, and I did that.

But it wasn't fast enough to pass the 2-second time limit with these constraints. Is there a faster solution?

class BinarySearchTree:
    def __init__(self, value):
        self.left = None
        self.right = None
        self.value = value

    def insert(self, value):
        if self.value:
            if value < self.value:
                if self.left is None:
                    self.left = BinarySearchTree(value)
                else:
                    self.left.insert(value)
            else:  # value >= self.value: duplicates go right so they are not dropped
                if self.right is None:
                    self.right = BinarySearchTree(value)
                else:
                    self.right.insert(value)
        else:
            self.value = value

    def output_subtree(self):
        if self.left:
            self.left.output_subtree()
        sub_tree.append(self.value)
        if self.right:
            self.right.output_subtree()


N = int(input())
vertices = list(map(int, input().split()))
medians = 0

tree = BinarySearchTree(vertices[0])
medians += vertices[0]

for i in range(1, N):
    sub_tree = []
    tree.insert(vertices[i])
    tree.output_subtree()
    if (i+1) % 2 == 0:
        medians += sub_tree[len(sub_tree)//2-1]
    else:
        medians += sub_tree[len(sub_tree)//2]

print(medians)

I don't quite understand your problem. Could you give some samples of inputs and expected outputs? — Moon Cheesez, Jun 05 '16 at 10:01
When N is even, are you taking the smaller number as the median? Why? — ayhan, Jun 05 '16 at 10:19
@ayhan, if the length N of the sub-array is even, we take the (N/2)-th element. Otherwise, the ((N+1)/2)-th element is the median. The answers are correct, I checked. It's all about the time limit — Legonaftik, Jun 05 '16 at 10:29
You are not keeping the tree balanced. Read the comments for the answer suggesting BST. — Quinchilion, Jun 05 '16 at 11:56

MBo · Accepted Answer · 2016-06-05T12:49:22.743

You can use the two-heaps approach.

Make two arrays of length about N/2 each (N/2 + 1 is always enough).

The first holds a min binary heap, the second a max binary heap. The min-heap stores the larger values and the max-heap the smaller ones.

At every iteration, add the next element of the given list to one of the heaps, keeping their sizes equal (the max-heap holds one extra element when the count is odd).

If the current element is larger than the current median:
add the current element to the min-heap
if the min-heap is now larger than the max-heap, move the top of the min-heap to the max-heap

If the current element is smaller than or equal to the current median (or it is the very first element):
if the max-heap is larger than the min-heap, move the top of the max-heap to the min-heap
insert the current element into the max-heap

After every step the top element of the max-heap is the median.

This algorithm is O(N log N), but a heap is faster than a search tree in practice because of its small hidden constant, and no memory reallocations are needed.

     min heap         max heap
5    -               (5)
10   10              (5)
8    10              (8) 5
1    8 10            (5) 1
7    8 10            (7) 5 1
3    7 8 10          (5) 3 1 
9    8 9 10          (7) 5 3 1 
6    7 8 9 10        (6) 5 3 1
...
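
The steps above could be sketched with Python's standard `heapq` module as follows. This is only an illustration under a few assumptions: the function name `sum_of_medians` and the variable names `low` and `high` are made up here, and because `heapq` offers only a min-heap, the max-heap is emulated by storing negated values. A new large element is pushed onto the min-heap before that heap's top is moved across, so the max-heap never receives a value bigger than the min-heap's top.

```python
import heapq


def sum_of_medians(lst):
    """Two-heap running median; the median is always the top of the max-heap."""
    low = []   # max-heap of the smaller half, stored as negated values
    high = []  # min-heap of the larger half
    total = 0
    for x in lst:
        if low and x > -low[0]:
            # x is above the current median: it belongs to the larger half.
            heapq.heappush(high, x)
            if len(high) > len(low):
                # Restore len(low) == len(high) or len(low) == len(high) + 1.
                heapq.heappush(low, -heapq.heappop(high))
        else:
            # x is at or below the current median (or is the first element).
            if len(low) > len(high):
                heapq.heappush(high, -heapq.heappop(low))
            heapq.heappush(low, -x)
        total += -low[0]
    return total


print(sum_of_medians([5, 10, 8, 1, 7, 3, 9, 6, 2, 4]))  # 59
print(sum_of_medians([5, 3, 1, 2, 4]))  # 16
```

Each element costs a constant number of heap pushes and pops, so the whole run stays within O(N log N).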

score 0 · Answer 2 · answered Jun 05 '16 at 21:48

Thanks to @MBo, I implemented a solution to this problem using a MinHeap and a MaxHeap.

The MinHeap has the minimal value at the top, and every child is at least as big as its parent. Conversely, the MaxHeap holds all the small elements, with the biggest of them at the root.

This structure lets us easily update the value of the median on each iteration.
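
As a small illustration of the two roots, here is a sketch that uses Python's standard `heapq` module rather than the classes that follow. `heapq` only provides a min-heap, so the max-heap is emulated by storing negated values; that trick and the sample values are assumptions made for this example only.

```python
import heapq

small_half = [1, 3, 5]    # values that would live in the max-heap
large_half = [8, 9, 10]   # values that would live in the min-heap

min_heap = list(large_half)
heapq.heapify(min_heap)   # the smallest value ends up at index 0

max_heap = [-v for v in small_half]
heapq.heapify(max_heap)   # the largest original value, negated, ends up at index 0

print(min_heap[0])    # 8, the smallest of the large values
print(-max_heap[0])   # 5, the largest of the small values
```

Both roots can be read in constant time, which is what makes the median cheap to update: it is always one of these two values.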

class MaxHeap:
    def __init__(self):
        # index 0 is a placeholder, so the children of i are 2*i and 2*i+1
        self.heapList = [0]
        self.currentSize = 0

    def percUp(self, i):
        # move the element at index i up while it is bigger than its parent
        while i // 2 > 0:
            if self.heapList[i] > self.heapList[i // 2]:
                tmp = self.heapList[i // 2]
                self.heapList[i // 2] = self.heapList[i]
                self.heapList[i] = tmp
            i = i // 2

    def insert(self, k):
        self.heapList.append(k)
        self.currentSize = self.currentSize + 1
        self.percUp(self.currentSize)

    def percDown(self, i):
        # move the element at index i down while it is smaller than its bigger child
        while (i * 2) <= self.currentSize:
            mc = self.maxChild(i)
            if self.heapList[i] < self.heapList[mc]:
                tmp = self.heapList[i]
                self.heapList[i] = self.heapList[mc]
                self.heapList[mc] = tmp
            i = mc

    def maxChild(self, i):
        # index of the bigger child of i (the only child if there is just one)
        if i * 2 + 1 > self.currentSize:
            return i * 2
        else:
            if self.heapList[i*2] > self.heapList[i*2+1]:
                return i * 2
            else:
                return i * 2 + 1

    def delMax(self):
        # remove and return the root (the maximum)
        retval = self.heapList[1]
        self.heapList[1] = self.heapList[self.currentSize]
        self.currentSize = self.currentSize - 1
        self.heapList.pop()
        self.percDown(1)
        return retval

    def buildHeap(self, alist):
        i = len(alist) // 2
        self.currentSize = len(alist)
        self.heapList = [0] + alist[:]
        while (i > 0):
            self.percDown(i)
            i = i - 1


class MinHeap:
    def __init__(self):
        # index 0 is a placeholder, so the children of i are 2*i and 2*i+1
        self.heapList = [0]
        self.currentSize = 0

    def percUp(self, i):
        # move the element at index i up while it is smaller than its parent
        while i // 2 > 0:
            if self.heapList[i] < self.heapList[i // 2]:
                tmp = self.heapList[i // 2]
                self.heapList[i // 2] = self.heapList[i]
                self.heapList[i] = tmp
            i = i // 2

    def insert(self, k):
        self.heapList.append(k)
        self.currentSize = self.currentSize + 1
        self.percUp(self.currentSize)

    def percDown(self, i):
        # move the element at index i down while it is bigger than its smaller child
        while (i * 2) <= self.currentSize:
            mc = self.minChild(i)
            if self.heapList[i] > self.heapList[mc]:
                tmp = self.heapList[i]
                self.heapList[i] = self.heapList[mc]
                self.heapList[mc] = tmp
            i = mc

    def minChild(self, i):
        # index of the smaller child of i (the only child if there is just one)
        if i * 2 + 1 > self.currentSize:
            return i * 2
        else:
            if self.heapList[i*2] < self.heapList[i*2+1]:
                return i * 2
            else:
                return i * 2 + 1

    def delMin(self):
        # remove and return the root (the minimum)
        retval = self.heapList[1]
        self.heapList[1] = self.heapList[self.currentSize]
        self.currentSize = self.currentSize - 1
        self.heapList.pop()
        self.percDown(1)
        return retval

    def buildHeap(self, alist):
        i = len(alist) // 2
        self.currentSize = len(alist)
        self.heapList = [0] + alist[:]
        while (i > 0):
            self.percDown(i)
            i = i - 1

N = int(input())
lst = list(map(int, input().split()))
medians = 0

# minimal value's at the top; any child is bigger than its parent
min_heap = MinHeap()
# conversely
max_heap = MaxHeap()

# seed each heap with one of the first two values;
# a single-element list (N == 1) is its own only median
if N == 1:
    medians += lst[0]
elif lst[0] > lst[1]:
    min_heap.insert(lst[0])
    max_heap.insert(lst[1])
    medians += lst[0]+lst[1]
else:
    min_heap.insert(lst[1])
    max_heap.insert(lst[0])
    medians += 2*lst[0]

# then the same procedure for the rest
for i in range(2, N):
    if lst[i] < max_heap.heapList[1]:
        max_heap.insert(lst[i])
    else:
        min_heap.insert(lst[i])
    # if the difference of size is bigger than one then balance
    # the trees moving root of the biggest tree in another one
    if min_heap.currentSize-max_heap.currentSize > 1:
        max_heap.insert(min_heap.delMin())
    elif max_heap.currentSize-min_heap.currentSize > 1:
        min_heap.insert(max_heap.delMax())
    # if the length is even we take len/2-th element; odd ==> (len+1)/2
    if max_heap.currentSize >= min_heap.currentSize:
        medians += max_heap.heapList[1]
    else:
        medians += min_heap.heapList[1]

print(medians)
