I have been reading about the Linux kernel and its CFS scheduler. I came across vruntime (virtual runtime), which is the core concept behind CFS. I read “Linux Kernel Development” and several blogs on the internet, but could not understand the basic calculations behind vruntime. Does vruntime belong to a particular process, or to a group of processes with the same nice value? What is the weighting factor and how is it calculated? Also, what is the difference between vruntime and *min_vruntime*?

iammurtaza

2 Answers

vruntime is per-thread; it is a member nested within the task_struct (task_struct embeds a sched_entity, se, and vruntime lives there).

Essentially, vruntime is a measure of the "runtime" of the thread - the amount of time it has spent on the processor. The whole point of CFS is to be fair to all; hence, the algo kind of boils down to a simple thing: (among the tasks on a given runqueue) the task with the lowest vruntime is the task that most deserves to run, hence select it as 'next'. (The actual implementation is done using an rbtree for efficiency).
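As a rough illustration of "lowest vruntime wins", here is a minimal sketch in C. It scans a plain array rather than using the kernel's rbtree, and the toy_task struct and pick_next function are hypothetical names invented for this example, not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a schedulable entity; not the kernel's struct. */
struct toy_task {
    const char *name;
    uint64_t vruntime;   /* virtual runtime, in nanoseconds */
};

/*
 * Return the task with the smallest vruntime, or NULL if the queue is empty.
 * The kernel keeps tasks sorted in an rbtree, so this is simply the leftmost
 * node there; a linear scan is used here only to keep the sketch short.
 */
static const struct toy_task *pick_next(const struct toy_task *rq, size_t n)
{
    const struct toy_task *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (best == NULL || rq[i].vruntime < best->vruntime)
            best = &rq[i];
    }
    return best;
}
```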

Taking into account various factors - like priority, nice value, cgroups, etc - the calculation of vruntime is not as straight-forward as a simple increment. I'd suggest reading the relevant section in "Professional Linux Kernel Architecture", Mauerer, Wrox Press - it's explained in great detail.

Please see below a quick attempt at summarizing some of this.

Other resource: Documentation/scheduler/sched-design-CFS.txt

Quick summary - vruntime calculation: (based on the book)

  • Most of the work is done in kernel/sched_fair.c:__update_curr() (in newer kernels, kernel/sched/fair.c:update_curr())

  • Called on timer tick

  • Updates the physical and virtual time 'current' has just spent on the processor

  • For tasks that run at default priority, i.e., nice value 0, the physical and virtual time spent is identical

  • Not so for tasks at other priority (nice) levels; thus the calculation of vruntime is affected by the priority of current using a load weight factor

    delta_exec = (unsigned long)(now - curr->exec_start);
    // ...
    delta_exec_weighted = calc_delta_fair(delta_exec, curr);
    curr->vruntime += delta_exec_weighted;

Neglecting some rounding and overflow checking, what calc_delta_fair does is to compute the value given by the following formula:

delta_exec_weighted = delta_exec * (NICE_0_LOAD / curr->load.weight)

The thing is, more important tasks (those with a lower nice value) will have larger weights; thus, by the above equations, the vruntime accounted to them will be smaller (thus having them enqueued more to the left on the rbtree!).
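To make the weighting factor concrete, the sketch below mimics the formula above in user-space C. The weight table is reproduced from the kernel's nice-to-weight table (sched_prio_to_weight in recent kernels; check kernel/sched/core.c for the authoritative values), where each nice step changes the weight by roughly 1.25x. The names toy_weight_for_nice and toy_calc_delta_fair are made up for this example, and the real kernel code uses a precomputed inverse weight plus shifts rather than a plain division.

```c
#include <stdint.h>

#define TOY_NICE_0_LOAD 1024ULL   /* weight of a nice-0 task */

/* Weights for nice -20 .. 19, as in the kernel's nice-to-weight table. */
static const uint32_t toy_prio_to_weight[40] = {
    /* -20 */ 88761, 71755, 56483, 46273, 36291,
    /* -15 */ 29154, 23254, 18705, 14949, 11916,
    /* -10 */  9548,  7620,  6100,  4904,  3906,
    /*  -5 */  3121,  2501,  1991,  1586,  1277,
    /*   0 */  1024,   820,   655,   526,   423,
    /*   5 */   335,   272,   215,   172,   137,
    /*  10 */   110,    87,    70,    56,    45,
    /*  15 */    36,    29,    23,    18,    15,
};

/* Look up the load weight for a nice value, clamping to [-20, 19]. */
static uint32_t toy_weight_for_nice(int nice)
{
    if (nice < -20)
        nice = -20;
    if (nice > 19)
        nice = 19;
    return toy_prio_to_weight[nice + 20];
}

/* Scale real runtime into virtual runtime: delta * NICE_0_LOAD / weight. */
static uint64_t toy_calc_delta_fair(uint64_t delta_exec_ns, int nice)
{
    return delta_exec_ns * TOY_NICE_0_LOAD / toy_weight_for_nice(nice);
}
```

With this sketch, 1 ms of real CPU time is charged as 1 ms of vruntime at nice 0, about 0.33 ms at nice -5, and about 3.06 ms at nice 5, which is why the lower-nice task drifts right on the rbtree more slowly.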

kaiwan
  • What is the purpose of min_vruntime? – iammurtaza Oct 09 '13 at 11:20
  • @iammurtaza: the min vruntime (I'm taking this to be the tunable /proc/sys/kernel/sched_min_granularity_ns) is typically (on recent Ubuntu at least) ~2.25ms. If it weren't present, two tasks could "ping-pong" on and off the processor every few microseconds. A minimum guaranteed stay on the CPU is required to mitigate this thrashing behavior. – kaiwan Mar 09 '17 at 13:44
  • @kaiwan is confusing the min_vruntime with the minimum granularity. min_vruntime simply keeps track of the minimum value for any vruntime value currently found in a given red-black tree. – Keith Irwin Sep 17 '19 at 07:28
  • @KeithIrwin: I haven't even mentioned min_vruntime at all in my answer? – kaiwan Sep 17 '19 at 08:09
  • @kaiwan I wasn't talking about the answer. The answer is great. I upvoted it. I was talking about your comment just above, answering iammurtaza's question about the purpose of min_vruntime. – Keith Irwin Sep 20 '19 at 07:10

The vruntime is the virtual runtime of a process; it tracks how long the process has run. vruntime is a member of the sched_entity structure defined in include/linux/sched.h.

The min_vruntime represents the minimum vruntime of a CFS runqueue, i.e. the minimum of the vruntime values of all the processes scheduled on that runqueue. min_vruntime is a member of the cfs_rq structure, which in current kernels is defined in kernel/sched/sched.h (not in include/linux/sched.h).
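A simplified sketch of how such a per-runqueue minimum can be maintained, loosely modelled on the kernel's update_min_vruntime(). The struct and function names here are hypothetical. The two points to notice are that the stored value never moves backwards, and that vruntime values are compared through a signed difference so the comparison survives counter wraparound.

```c
#include <stdint.h>

/* Hypothetical, stripped-down runqueue; the real cfs_rq has many more fields. */
struct toy_cfs_rq {
    uint64_t min_vruntime;
};

/* Return the later of two vruntime values, tolerating u64 wraparound. */
static uint64_t toy_max_vruntime(uint64_t a, uint64_t b)
{
    return ((int64_t)(b - a) > 0) ? b : a;
}

/*
 * smallest_now is the smallest vruntime among the tasks currently on the
 * runqueue. min_vruntime only ever advances: if smallest_now is behind the
 * stored value, the stored value is kept.
 */
static void toy_update_min_vruntime(struct toy_cfs_rq *rq, uint64_t smallest_now)
{
    rq->min_vruntime = toy_max_vruntime(rq->min_vruntime, smallest_now);
}

/*
 * Assumed placement policy for this sketch: a task joining the runqueue
 * starts no earlier than min_vruntime, so it cannot monopolise the CPU
 * while it "catches up" with tasks that have been running all along.
 */
static uint64_t toy_place_task(const struct toy_cfs_rq *rq, uint64_t task_vruntime)
{
    return toy_max_vruntime(task_vruntime, rq->min_vruntime);
}
```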

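In order to be fair to all the processes, the CFS scheduler selects the process with the minimum vruntime (the leftmost node of the red-black tree) to execute next. min_vruntime is the runqueue's monotonically increasing record of that minimum; its purpose is to serve as a baseline, for example when a newly created or newly woken process is placed on the runqueue, so that it does not start with a vruntime far behind everyone else.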

The link to include/linux/sched.h is: https://elixir.bootlin.com/linux/latest/source/include/linux/sched.h

user3131593