
I am trying to measure node-level CFS scheduler throttling as a percentage. To do that, I read two values twice (ignoring timeslices) from /proc/schedstat, which has the following format:

$ cat /proc/schedstat
version 15
timestamp 4297299139
cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857
                 CpuTime       RunqTime  
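A minimal Go sketch of parsing one `cpuN` line, assuming the version-15 layout described in the kernel's scheduler statistics documentation (Documentation/scheduler/sched-stats), where fields 7, 8 and 9 are the running time, the waiting (run queue) time and the timeslice count. The type and function names are hypothetical; non-cpu lines such as `version`, `timestamp` and `domainN` are skipped rather than parsed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CPUSchedstat holds the fields of one "cpuN" line of /proc/schedstat
// that this question uses (assumed version-15 layout).
type CPUSchedstat struct {
	CPU        string // e.g. "cpu0"
	RunTimeNs  uint64 // field 7: time spent running by tasks on this CPU
	WaitTimeNs uint64 // field 8: time spent waiting to run (run queue)
	Timeslices uint64 // field 9: number of timeslices run on this CPU
}

// parseCPULine parses a single "cpuN ..." line. It returns ok=false for
// lines that are not cpu lines (version, timestamp, domainN).
func parseCPULine(line string) (CPUSchedstat, bool, error) {
	fields := strings.Fields(line)
	if len(fields) < 10 || !strings.HasPrefix(fields[0], "cpu") {
		return CPUSchedstat{}, false, nil
	}
	var nums [9]uint64
	for i := 0; i < 9; i++ {
		v, err := strconv.ParseUint(fields[i+1], 10, 64)
		if err != nil {
			return CPUSchedstat{}, false, fmt.Errorf("%s field %d: %w", fields[0], i+1, err)
		}
		nums[i] = v
	}
	return CPUSchedstat{
		CPU:        fields[0],
		RunTimeNs:  nums[6],
		WaitTimeNs: nums[7],
		Timeslices: nums[8],
	}, true, nil
}

func main() {
	s, ok, err := parseCPULine("cpu0 0 0 0 0 0 0 1145287047860 105917480368 8608857")
	if err != nil || !ok {
		panic(fmt.Sprint(ok, err))
	}
	fmt.Println(s.CPU, s.RunTimeNs, s.WaitTimeNs, s.Timeslices)
}
```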

So I read the file, sleep for some time, read it again, compute the elapsed time and the delta between the two readings, and then calculate the percentage with the following code:

// 1e7 = 1e9 ns per second / 100, so the result is a percentage of wall-clock time
cpuTime := float64(delta.CpuTime) / delta.TimeDelta / 1e7
runqTime := float64(delta.RunqTime) / delta.TimeDelta / 1e7
percent := runqTime

The problem is that the percentage can come out as something like 2000%.

I assumed that RunqTime is a monotonically increasing counter expressed in nanoseconds, so I divided it by 10^7 to bring it into the 0-100% range (10^9 ns per second, divided by 100 for a percentage), where TimeDelta is the time between the measurements in seconds. What is wrong with this, and how do I do it properly?

xakepp35
  • Is [this](https://en.wikipedia.org/wiki/Relative_change_and_difference#Percentage_change) what you're trying to compute? – Olivier Sep 06 '21 at 13:04
  • @Olivier Imagine a moving car. At t=0 s it is at x=1 km; at t=2 s it is at x=5 km. Then I can calculate its speed: (5-1)/2 = 2 km/s. The speed of light is about 300000 km/s and nothing can exceed it, so 2 km/s / 300000 km/s ≈ 0.0000067, i.e. about 0.00067%. That is the kind of percentage I want to calculate. Here the "speed of light", the maximum possible increase per second, is what I believe to be 10^9 nanoseconds – xakepp35 Sep 06 '21 at 13:11
  • @Olivier please do not add misleading tags – xakepp35 Sep 06 '21 at 13:14
  • @Olivier my question is about parsing data from the /proc/schedstat file, which comes from the Linux kernel... Please reread the question from the start – xakepp35 Sep 06 '21 at 13:17
  • I have no idea about the units used in /proc/schedstat's output; you would have to check your assumptions against the official documentation – LeGEC Sep 06 '21 at 13:25
  • @LeGEC the docs state that these values used to be in jiffies and are now expressed in nanoseconds; [here we have an answer](https://unix.stackexchange.com/questions/418773/measure-units-in-proc-pid-schedstat), but this does not work. I am reading at a 3-second interval (I also measure this interval to be precise), and the delta says that much more than 3 seconds were spent on the scheduler run queue; the delta could be something like 60 seconds, which is 2000% – xakepp35 Sep 06 '21 at 13:25
  • ok, then how are `delta.CpuTime` and `delta.TimeDelta` computed ? – LeGEC Sep 06 '21 at 13:32
  • @LeGEC I read from `/proc/schedstat`, wait 3 seconds, read from it again, and take the difference between the readings. In the question's example, the 7th number is the CPU time and the 8th is the run queue time. You read the two numbers twice and subtract the first reading from the second. TimeDelta is just the time elapsed between the readings, typically 3.0001 seconds – xakepp35 Sep 06 '21 at 15:21
  • @xakepp35 Same issue here; I'm trying to make sense of that delta. It's way larger than it should be. On an idle host, we expect very minimal run queue latency, not latency on the order of seconds, which is what the delta shows. – Michael Martinez Dec 09 '21 at 17:29

1 Answer


I, for one, do not know how to interpret the output of /proc/schedstat.

You do quote an answer to a unix.stackexchange question, which links to an LKML mail mentioning a possible patch to the documentation.

However, "schedstat" is a term that is suspiciously missing from my local `man proc` page, and from the copies of `man proc` I could find on the internet. In fact, when I search for schedstat on Google, the results either do not mention the word "schedstat" at all (for example, I get links to copies of the man page, which mentions "sched" and "stat"), or are non-authoritative comments (fun fact: some of them cite that answer on Stack Exchange as a reference...).

So, at the moment: if I really had to understand what's in the output, I think I would try to read the code for my version of the kernel.


As for "how do you compute delta?": I understand what you intend to do; I had in mind something more like "what code have you written to do it?".

By running `cat /proc/schedstat; sleep 1` in a loop on my machine, I see that the "timestamp" entry is incremented by ~250 units on each iteration (so I honestly can't say what the underlying unit for that field is...).
To compute `delta.TimeDelta`: do you use that field, or do you take two instances of `time.Now()`?

The other deltas are less ambiguous; I imagine you took the difference between the counters you see :)
Do note that, on my mostly idle machine, I sometimes see increments higher than 10^9 over one second on these counters. So again: I do not know how to interpret these numbers.

LeGEC
  • To compute the time delta I take two instances of `time.Now()`; to compute the runq delta I take `cat /proc/schedstat; sleep 1; cat /proc/schedstat`. If you run a Kubernetes cluster on your machine, with a lot of pods that are being throttled and a high load average, you may notice that the difference between `/proc/schedstat` runq time readings can be more than 10^9 nanoseconds – xakepp35 Sep 07 '21 at 06:43
  • On an idle machine, of course, you will have a small delta between subsequent readings. I assumed that the delta could not be greater than the actual time passed, so that I could map it to a 0-100% range, but that seems not to be true, and I am asking how to interpret it then. – xakepp35 Sep 07 '21 at 06:46
  • well, I think we got to the same point :) – LeGEC Sep 07 '21 at 06:47
  • That, or: search for some other way to measure scheduler activity. – LeGEC Sep 07 '21 at 06:48