4

Bear with me; this might be a little difficult to explain clearly. I am trying to understand how to write a program that uses only as much CPU as it actually needs. Rather than explain it in the abstract, I will use a real example.

I made a Tetris game with an infinite main game loop. I have restricted it to 40 fps, but the loop itself still executes thousands or even millions of times per second; it only renders a frame when enough time has passed to keep it at 40 fps.

Since I have a 4-core CPU, everything is fine when I run the game and it runs well. But CPU usage for the game process is held at 25% (one full core). This is expected, since it is an infinite loop that runs continuously.

I then read online that I should add a 1 ms delay to the main loop. This immediately reduced the usage to around 1% or less. This is good, but now I am deliberately waiting 1 ms on every iteration. It works because my main loop takes far less time than that to execute, so the 1 ms delay does not affect the game.

But what if I make larger games, with longer and more processor-intensive loops? What if I need that 1 ms slice to run the game smoothly? If I remove the delay, the processor jumps to 25% again; if I keep it, the game will be slow and may lag.

What is the ideal solution in this case? How are real games/applications coded to avoid this problem?

Talha Sayed
  • Real applications use multi-threading. I'm not really sure what would be the ideal setup for a tetris game, so I'll refrain from answering. But it would probably be something like: one thread which cyclically reads user input, one main thread that performs all calculations, one or several graphics threads that handle the screen and possible animations, and then maybe one timer and/or thread for the falling block pace. – Lundin Jun 17 '14 at 10:40
  • @Lundin This seems complicated. But yes, I agree that CPU utilization could be better managed with multi-threading. Although, is this the norm and common design for applications/games? – Talha Sayed Jun 17 '14 at 10:51
  • See if this helps: http://stackoverflow.com/questions/10972533/is-there-anyway-to-know-when-the-screen-has-been-updated-refreshed-opengl-or-di – slim Jun 17 '14 at 10:52
  • *How are real games/applications coded to prevent this problem?* If VSYNC is enabled they wait for the monitor. Otherwise they render as fast as they can. – ta.speot.is Jun 17 '14 at 11:03
  • @slim Thanks, I checked the link. That question deals with calculating time between render and first input though. – Talha Sayed Jun 17 '14 at 11:05

2 Answers

5

Since you list three different languages in the tags, I'll keep this general and not provide code samples.

In general, to avoid burning CPU, never write a loop that fails to do at least one of the following on every iteration:

  • do useful work
  • skip based on the loop counter
  • or invoke some blocking call, either to blocking I/O or a thread wait().

sleep() is one example of a blocking call, but as you've observed, it's a bit of a bodge in many cases.

So:

while(true) {
    if(some_condition) {
        foo();
    }
}

... is bad. (A friend of mine once brought a shared mainframe to its knees with code like this.)

You need to find a call to your display API which blocks until a vertical sync. I believe in DirectX, device.Present() is one such call, if the device is set up appropriately.

In a single-threaded game, the logic might go:

  while(game is active)
      read user input
      calculate next frame
      blocking call to display API

So, the CPU gets a rest, waiting for the vertical sync each time.
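For illustration, here is a runnable Python sketch of that single-threaded shape. The `present()` call and the vsync thread are stand-ins invented for this demo; in a real program the blocking call would come from your display API (e.g. `device.Present()` with VSYNC on):

```python
import threading, time

vsync = threading.Event()

def fake_vsync_driver(hz, stop):
    # Stands in for the display hardware: "fires" a vertical sync at `hz` Hz.
    while not stop.is_set():
        time.sleep(1.0 / hz)
        vsync.set()

def present():
    # Stands in for a blocking display-API call: sleeps until the next vsync.
    vsync.wait()
    vsync.clear()

stop = threading.Event()
threading.Thread(target=fake_vsync_driver, args=(60, stop), daemon=True).start()

frames = 0
deadline = time.perf_counter() + 0.5
while time.perf_counter() < deadline:   # "game is active"
    # ... read user input, calculate next frame ...
    present()                           # blocking call to display API
    frames += 1
stop.set()
```

The main loop runs at roughly the vsync rate, and between frames the thread is blocked rather than spinning, so CPU usage stays low.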

It's more conventional to have at least two threads, one handling the rendering loop, another handling game state. In that case the rendering loop needs to wait for vertical syncs, as before. The game state loop needs to block until the rendering loop is ready.

Rendering thread loop:

  while(game is active)
      notify()
      prepare_frame(game_state)
      blocking call to display API

Game state thread loop:

  while(game is active)
      read user input
      update game_state
      wait(display_loop_thread)

Be sure to understand thread wait/notify/join in order to make sense of this.
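To make the handshake concrete, here is a minimal Python sketch using a condition variable. `FRAMES`, the 5 ms sleep standing in for the blocking display call, and the wait timeout are all demo artifacts, not part of a real engine:

```python
import threading, time

FRAMES = 20
frame_ready = threading.Condition()
rendered = 0
updated = 0

def render_loop():
    global rendered
    for _ in range(FRAMES):                 # while game is active
        with frame_ready:
            frame_ready.notify()            # let the game-state thread proceed
        # ... prepare_frame(game_state) ...
        time.sleep(0.005)                   # stand-in for blocking display call
        rendered += 1

def game_state_loop():
    global updated
    for _ in range(FRAMES):                 # while game is active
        # ... read user input, update game_state ...
        updated += 1
        with frame_ready:
            frame_ready.wait(timeout=0.1)   # block until the renderer is ready

r = threading.Thread(target=render_loop)
g = threading.Thread(target=game_state_loop)
g.start(); r.start()
r.join(); g.join()
```

A production engine would pair the wait with a shared "frame consumed" flag to avoid lost wakeups; the timeout here papers over that for the demo.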

This model allows you to have other threads that also affect the game state. For example another thread might control an AI enemy.

An alternative to this is to make the calculations event-driven, and trigger them after a vsync:

  while(game is active)
      calculate next frame
      blocking call to display API
      gameLogic.onFrame()

If onFrame() takes longer than a frame to complete, then the game's framerate will suffer. Whether this matters or not depends on the game; the solutions are beyond the scope of this answer -- if it matters to you it's probably time to buy a book on video game architecture.

slim
  • Thanks. I really like your answer. Also the advice "never have an infinite loop without a blocking call". Although I don't have any experience with multi-threaded programming, it seems I can't live without it in the game programming world. Looks like it's essential to good game design. – Talha Sayed Jun 17 '14 at 12:34
  • Be warned that API calls that effectively wait for a vblank may be busywaiting themselves. I know OpenGL's `SwapBuffers` used to do this on many implementations; I don't know whether that's still the case, or whether D3D's `device.Present` does the same. – mrec Jun 17 '14 at 16:37
3

Instead of sleeping for 1 ms, you could sleep for X ms, where X is calculated with the formula max(NextDrawingTime - CurrentTime, 0).
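A quick sketch of that formula in Python (illustrative; names like `next_draw` are mine, and `time.sleep` resolution varies by platform, which is why clamping to zero matters when a frame overruns):

```python
import time

FPS = 40
frame_interval = 1.0 / FPS

def run(frames):
    """Render `frames` frames, sleeping only for the time left in each frame."""
    stamps = []
    next_draw = time.perf_counter()
    for _ in range(frames):
        # ... update and render the frame here ...
        next_draw += frame_interval
        delay = max(next_draw - time.perf_counter(), 0)  # the formula above
        time.sleep(delay)
        stamps.append(time.perf_counter())
    return stamps

stamps = run(10)
elapsed = stamps[-1] - stamps[0]   # ~9 frame intervals at 40 fps
```

Unlike a fixed 1 ms sleep, this yields the CPU for exactly the slack left in the frame, and sleeps not at all when the frame took its full budget.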

Paolo Brandoli
  • I had a similar thought for some "smart sleep". But 1 ms is a lot of time in the CPU world and I am afraid it doesn't offer the granularity needed for something like "sleep for 0.05 ms". Correct me if I am wrong, but as far as I know common languages only allow a 1 ms sleep. – Talha Sayed Jun 17 '14 at 10:45
  • @TalhaSayed I think that to render 40 times per second, a millisecond offers enough granularity. And anyway, the system will probably not be able to guarantee sleep time with a precision greater than a few milliseconds. – Paolo Brandoli Jun 17 '14 at 10:49
  • @TalhaSayed Many big games simply run as fast as they can, and for simpler games you'd use a sleep such as this (just remember to skip the sleep call if the remaining time is less than what the sleep resolution can guarantee). Depending on the API you're using, you might also enable vsync, which will cap the rendering operation to the refresh rate of the monitor; for many games you can do everything you need to do between the frames (and so avoid multithreading to do other work). – nos Jun 17 '14 at 10:58
  • Instead of `sleep` try `select` which has microseconds granularity so you can indeed sleep for 0.05ms. You can even sleep for 0.005ms if you want. It's the most cross platform function to do sub-millisecond timers. Just call it with empty read and write sets. Indeed, most scripting languages and event oriented/asynchronous libraries use select as their underlying mechanism. – slebetman Jun 18 '14 at 03:07
  • @slebetman I don't understand what you mean by select. Do you mean select as used in an RDBMS ? – Talha Sayed Jun 20 '14 at 18:59
  • select is part of the standard C library available to all POSIX compliant OSes. And Windows has been POSIX compliant since 1998. Of course, MacOS and Linux, being unix and unix-like, are also both POSIX compliant. If you're on a unixen then type `man select` otherwise see: http://linux.die.net/man/2/select – slebetman Jun 20 '14 at 23:28
  • I'm honestly surprised to find a C programmer who's not aware of `select`. Most programmers don't realize that select can be used to implement timeouts since its main function is multiplexing but they're usually aware of select. – slebetman Jun 20 '14 at 23:31