
This question is related to a former one of mine, which contains the full background of my problem. In summary: I have two almost identical VMs, and one of them scales very badly under certain CPU-intensive workloads that, I think, involve at least some I/O. So what I'm doing now is comparing the output of sysctl -a on both VMs and researching the differences.

One of those differences concerns the task scheduler: the initial values after a cold boot on the problematic VM are much higher than on the other one. So I'm wondering where those values initially come from: e.g. whether they are calculated, which factors they depend on (maybe the VM host), whether they change automatically at runtime, etc. Estimates of whether these different values could have any reasonable impact on the overall scaling of a system at all are welcome as well.

The following are examples of the differing values. I only include one of the 8 vCPUs plus the overall scheduler settings; the relevant parts look very similar for each vCPU.

Good vs. Bad VM:

--- C:/Users/tschoening/Desktop/Good VM.txt Mi 18. Apr 19:24:47 2018
+++ C:/Users/tschoening/Desktop/Bad VM.txt  Mi 18. Apr 19:24:44 2018
@@ -8,3 +8,3 @@ kernel.sched_domain.cpu0.domain0.imbalance_pct = 1
-kernel.sched_domain.cpu0.domain0.max_interval = 4
-kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 75519
-kernel.sched_domain.cpu0.domain0.min_interval = 2
+kernel.sched_domain.cpu0.domain0.max_interval = 16
+kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 155384
+kernel.sched_domain.cpu0.domain0.min_interval = 8
@@ -15 +15 @@ kernel.sched_domain.cpu0.domain0.wake_idx = 0
-kernel.sched_latency_ns = 12000000
+kernel.sched_latency_ns = 24000000
@@ -17 +17 @@ kernel.sched_migration_cost_ns = 500000
-kernel.sched_min_granularity_ns = 1500000
+kernel.sched_min_granularity_ns = 3000000
@@ -25 +25 @@ kernel.sched_tunable_scaling = 1
-kernel.sched_wakeup_granularity_ns = 2000000
+kernel.sched_wakeup_granularity_ns = 4000000

Good VM:

kernel.sched_domain.cpu0.domain0.busy_factor = 32
kernel.sched_domain.cpu0.domain0.busy_idx = 2
kernel.sched_domain.cpu0.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu0.domain0.flags = 4143
kernel.sched_domain.cpu0.domain0.forkexec_idx = 0
kernel.sched_domain.cpu0.domain0.idle_idx = 1
kernel.sched_domain.cpu0.domain0.imbalance_pct = 125
kernel.sched_domain.cpu0.domain0.max_interval = 4
kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 75519
kernel.sched_domain.cpu0.domain0.min_interval = 2
kernel.sched_domain.cpu0.domain0.name = DIE
kernel.sched_domain.cpu0.domain0.newidle_idx = 0
kernel.sched_domain.cpu0.domain0.wake_idx = 0

kernel.sched_latency_ns = 12000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 1500000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 25
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 2000000

Bad VM:

kernel.sched_domain.cpu0.domain0.busy_factor = 32
kernel.sched_domain.cpu0.domain0.busy_idx = 2
kernel.sched_domain.cpu0.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu0.domain0.flags = 4143
kernel.sched_domain.cpu0.domain0.forkexec_idx = 0
kernel.sched_domain.cpu0.domain0.idle_idx = 1
kernel.sched_domain.cpu0.domain0.imbalance_pct = 125
kernel.sched_domain.cpu0.domain0.max_interval = 16
kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 155384
kernel.sched_domain.cpu0.domain0.min_interval = 8
kernel.sched_domain.cpu0.domain0.name = DIE
kernel.sched_domain.cpu0.domain0.newidle_idx = 0
kernel.sched_domain.cpu0.domain0.wake_idx = 0

kernel.sched_latency_ns = 24000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 3000000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 25
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 4000000

1 Answer


I got an answer elsewhere, which I would like to document here along with a link to that thread:

The initial values are based on your hardware and suggested default values. If the servers are different, i.e. different memory or processors, the default values can be different. None of what I see here is going to make much real-world difference. I know the numbers look a lot different, but in real-world terms they're not.

https://www.linuxquestions.org/questions/linux-software-2/where-do-initial-task-kernel-cpu-scheduler-values-after-cold-boot-come-from-4175627988/#post5845481

Besides that, I know for sure now that most of the differences simply come from the fact that one VM had 2 vCPUs and the other 8 at the moment I executed sysctl -a. There is most likely no misconfigured global setting, as I had assumed. I see exactly the same value differences for e.g. *_interval and the sched_*_ns settings on an Ubuntu 14.04 installation on my desktop in VMware Workstation with 2 and 8 vCPUs: completely different hardware and VMs, yet the same numbers.
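
For documentation purposes, here is a minimal C sketch of where those numbers plausibly come from, modeled on the logarithmic tunable scaling in the 4.x kernel's kernel/sched/fair.c (get_update_sysctl_factor() multiplies the normalized defaults by 1 + ilog2(number of online CPUs) when sched_tunable_scaling = 1, and the scheduler-domain setup derives min_interval/max_interval from the domain's CPU count). The base values and scaling rule below are taken from those kernel sources as I understand them; ilog2_u is just a stand-in for the kernel's ilog2 helper. The sketch reproduces exactly the numbers seen on both VMs:

#include <stdio.h>

/* stand-in for the kernel's ilog2(): floor(log2(n)) */
static unsigned int ilog2_u(unsigned int n)
{
    unsigned int log = 0;
    while (n >>= 1)
        log++;
    return log;
}

int main(void)
{
    /* normalized defaults from kernel/sched/fair.c (4.x kernels) */
    const unsigned long long base_latency_ns     = 6000000ULL; /* 6 ms    */
    const unsigned long long base_min_gran_ns    =  750000ULL; /* 0.75 ms */
    const unsigned long long base_wakeup_gran_ns = 1000000ULL; /* 1 ms    */

    for (unsigned int cpus = 2; cpus <= 8; cpus *= 4) {
        /* SCHED_TUNABLESCALING_LOG, i.e. sched_tunable_scaling = 1 */
        unsigned int factor = 1 + ilog2_u(cpus);
        printf("%u CPUs -> factor %u\n", cpus, factor);
        printf("  sched_latency_ns            = %llu\n", base_latency_ns * factor);
        printf("  sched_min_granularity_ns    = %llu\n", base_min_gran_ns * factor);
        printf("  sched_wakeup_granularity_ns = %llu\n", base_wakeup_gran_ns * factor);
        /* the per-domain balance intervals scale with the domain's CPU count */
        printf("  min_interval = %u, max_interval = %u\n", cpus, 2 * cpus);
    }
    return 0;
}

With 2 CPUs the factor is 2 (12000000 / 1500000 / 2000000 ns, intervals 2/4), and with 8 CPUs it is 4 (24000000 / 3000000 / 4000000 ns, intervals 8/16), matching the "good" and "bad" listings above. The remaining difference, max_newidle_lb_cost, is, as far as I can tell, a value the kernel measures at runtime (the cost of newly-idle load balancing), so it is expected to differ between systems anyway.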