Why does the CFS scheduler use a red-black tree?

Question

The CFS scheduler picks the next process based on the minimum virtual runtime, and to get this value efficiently it uses a red-black tree (rbtree). With an rbtree, finding the minimum takes O(h), where h is the height of the tree. With a min-heap, however, the process with the minimum virtual runtime can be found in O(1). I would like to know why a min-heap was not considered for the CFS implementation, and whether there are any difficulties with using a min-heap at the kernel level.

I'm not an expert in algorithms, but it seems that after the O(1) peek you still have to heapify (which is O(log n)) to restore the heap property. Please give a reference to the implementation you are talking about if I am wrong. – Alex Hoppus, Oct 18 '15 at 07:21

Vishal Sahu · Accepted Answer · 2020-03-06T04:04:53.923

The reason is that heaps are array-based and therefore require contiguous memory in kernel space. This follows from the way heaps are implemented in Linux: see lib/prio_heap.c and include/linux/prio_heap.h, and you will note that the heap is kmalloc'd in heap_init. Once the degree of multiprogramming becomes large, maintaining thousands of struct sched_entity requires a lot of contiguous space (it spans several pages). From a time and performance point of view one would prefer a heap, since the heapify operation can run in the background once the entity with the minimum vruntime has been picked, but the space requirement is the bottleneck.

As the rbtree was already readily available, kernel developers did not consider implementing a pointer-based heap; in fact, one is not needed.

Thanks. This is the only explanation I've found that actually gives the advantage an rbtree has over a heap. Other explanations all say that they share the same time complexity. – ospider, Mar 20 '19 at 05:59

Tin Luu · Answer 2 · 2020-07-04T18:13:23.217

Another interesting point: when a task (process or thread) changes state from runnable to blocked (waiting for I/O or a network resource), it has to be removed from the runqueue, and the complexities are:

O(log(n)) for a red-black tree
O(n) for a heap, because the node has to be found first

Removal from a heap is slow, and that is why the red-black tree is better.

Also, getting the minimum vruntime from a heap is not actually O(1). It is O(1) only if you read the root node without removing it. But in CFS, we need to:

Remove it (which requires an O(log(n)) heapify)
Update its vruntime and insert it back into the runqueue, which is also O(log(n))
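
To make the two steps above concrete, here is a small Python sketch using the standard heapq module. It is only an illustration, not how the kernel does it: the task names, vruntime values and the helper names peek_min and run_next are made up for the example.

```python
import heapq


def peek_min(heap):
    """Read the entry with the smallest vruntime without removing it: O(1)."""
    return heap[0]


def run_next(heap, time_slice):
    """One scheduling round on a heap-based runqueue (illustrative only).

    The minimum must be removed (O(log n)), its vruntime updated,
    and the entry pushed back (O(log n)).
    """
    vruntime, task = heapq.heappop(heap)
    heapq.heappush(heap, (vruntime + time_slice, task))
    return task


# Hypothetical runqueue of (vruntime, task name) pairs.
runqueue = [(30, "C"), (10, "A"), (20, "B")]
heapq.heapify(runqueue)
```

So the O(1) figure applies only to peek_min; every actual scheduling round through run_next costs O(log n), the same order as the rbtree.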

edited Jul 04 '20 at 18:13

answered Jul 04 '20 at 18:07

True. This can be another possible reason! – Achint Sharma Dec 06 '21 at 04:48
Removing a heap node is O(log(n)) instead of O(n). Just swap the node to remove with the last element in the array, delete the element, and then heapify the swapped node. – h4x3rotab Jul 07 '22 at 11:04
The problem with removal is: how do you find the node? A heap has no search order, which means you need to traverse all the nodes to find it. – Tin Luu Jul 08 '22 at 07:00
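
Both comments can be reconciled with a sketch: finding an arbitrary node in a plain heap is an O(n) scan, but if each task's array index is tracked, the swap-with-last removal described above is O(log n). The Python class below is a hypothetical illustration (IndexedMinHeap and its methods are invented names, and nothing here is taken from the kernel); it assumes task names are unique.

```python
class IndexedMinHeap:
    """Binary min-heap of (vruntime, task) with a task -> index map."""

    def __init__(self):
        self.items = []  # heap array of (vruntime, task)
        self.pos = {}    # task -> current index in self.items

    def _swap(self, i, j):
        self.items[i], self.items[j] = self.items[j], self.items[i]
        self.pos[self.items[i][1]] = i
        self.pos[self.items[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.items[i] >= self.items[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.items[child] < self.items[smallest]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def push(self, vruntime, task):
        self.items.append((vruntime, task))
        self.pos[task] = len(self.items) - 1
        self._sift_up(len(self.items) - 1)

    def peek(self):
        return self.items[0]

    def remove(self, task):
        """Remove an arbitrary task in O(log n) using the index map."""
        i = self.pos.pop(task)   # O(1) lookup replaces the O(n) scan
        last = self.items.pop()  # take the last array element
        if i < len(self.items):  # the removed task was not in the last slot
            self.items[i] = last
            self.pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(self.pos[last[1]])
```

The price is the extra bookkeeping in pos on every swap; without it, remove would have to start with a linear search for the task.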

Hrishikesh · Answer 3 · 2022-10-01T11:02:23.990

One more point: the rbtree used by CFS caches its leftmost node, i.e. the entity with the smallest vruntime, and updates that cached pointer when entities are inserted or removed. Picking the next process is therefore O(1).

pick_next_entity() -> https://github.com/torvalds/linux/blob/9de1f9c8ca5100a02a2e271bdbde36202e251b4b/kernel/sched/fair.c#L4657

pick_next_entity() -> __pick_first_entity() https://github.com/torvalds/linux/blob/9de1f9c8ca5100a02a2e271bdbde36202e251b4b/kernel/sched/fair.c#L639

pick_next_entity() -> __pick_first_entity() -> (root)->rb_leftmost https://github.com/torvalds/linux/blob/3bc1bc0b59d04e997db25b84babf459ca1cd80b7/include/linux/rbtree.h#L106

The scheduler reads the cached leftmost node from the tree's root structure instead of walking the tree.
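
As an illustration of the cached-leftmost idea, here is a minimal Python sketch. It is not kernel code: the class and method names are hypothetical, the tree is a plain unbalanced BST (red-black rebalancing is omitted), and only removal of the minimum is implemented. It only shows how a cached pointer, comparable to rb_leftmost in the link above, can be kept up to date.

```python
class Node:
    def __init__(self, vruntime, name):
        self.key = vruntime
        self.name = name
        self.left = self.right = self.parent = None


class CachedTree:
    """Unbalanced BST with a cached leftmost pointer (illustrative sketch)."""

    def __init__(self):
        self.root = None
        self.leftmost = None

    def insert(self, node):
        parent, cur, is_leftmost = None, self.root, True
        while cur is not None:
            parent = cur
            if node.key < cur.key:
                cur = cur.left
            else:
                cur = cur.right
                is_leftmost = False  # went right once: cannot be the minimum
        node.parent = parent
        if parent is None:
            self.root = node
        elif node.key < parent.key:
            parent.left = node
        else:
            parent.right = node
        if is_leftmost:
            self.leftmost = node  # refresh the cache during the insert

    def pick_first(self):
        """O(1): read the cache, no tree walk."""
        return self.leftmost

    def pop_first(self):
        node = self.leftmost
        if node is None:
            return None
        child = node.right  # the leftmost node never has a left child
        if child is not None:
            nxt = child
            while nxt.left is not None:
                nxt = nxt.left
        else:
            nxt = node.parent
        # Splice the node out; it is either the root or a left child.
        if node.parent is None:
            self.root = child
        else:
            node.parent.left = child
        if child is not None:
            child.parent = node.parent
        node.left = node.right = node.parent = None
        self.leftmost = nxt  # the successor becomes the new cached minimum
        return node
```

The walk needed to find the new minimum happens inside insert and pop_first, so pick_first itself never touches more than one pointer.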
