
I apologize up front for this long post, but as you can probably see I have been thinking about this for quite some time, and I feel I need some input from other people before my head explodes :-)

I have been experimenting for some time now with various ways of building a game engine that satisfies all of the following criteria:

  • Complete separation of object updating and object rendering
  • Full determinism
  • Updating and rendering at individual speeds
  • No blocking on shared resources

Complete separation of object updating and object rendering

Separation of object updating and object rendering seems vital to ensure optimal use of resources while sending data to the graphics API and swapping buffers. Even if you want full parallelism across multiple CPU cores, this separation still has to be managed.

Full determinism

Many game types, and especially multiplayer ones, must ensure full determinism. Otherwise players will experience different states of the same game, effectively breaking the game logic. Determinism is also required for game replays, and it is useful in any situation where each run of a simulation must produce the same result given the same starting conditions and inputs.

Updating and rendering at individual speeds

This is really a prerequisite for full determinism, as you cannot have the simulation depend on rendering speed (i.e. the various monitor refresh rates, graphics adapter speeds, etc.). Under optimal conditions the update rate should be fixed at a certain interval (e.g. 25 updates per second, maybe less depending on the type of update), and the rendering rate should be whatever the client's monitor refresh rate and graphics adapter allow.

This implies that a rendering speed higher than the update speed should be allowed. And while that sounds like a waste, there are well-known tricks to ensure that the extra rendering cycles are not wasted (interpolation / extrapolation), which means that faster monitors and adapters are rewarded with a more visually pleasing experience, as they should be.

Rendering speeds lower than the update speed must also be allowed, even if this does in fact result in wasted update cycles - at least the extra update cycles are simply never presented to the user. This is, however, necessary to ensure a smooth multiplayer experience even if the rendering on one of the clients slows to a sudden crawl for one reason or another.
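
For what it's worth, here is a minimal single-threaded sketch of what I mean by decoupled rates, in the spirit of the well-known fixed-timestep loop: the simulation always advances in fixed 25 Hz steps, while rendering happens as often as it can and blends between the two most recent states. The World, update and render names are just placeholders I made up for this post, not code from any of the linked articles.

```cpp
#include <chrono>
#include <cstdio>

struct World { double x = 0.0, vx = 1.0; };   // toy state: one value moving at constant speed

void update(World& w, double dt) { w.x += w.vx * dt; }   // deterministic fixed step

void render(const World& prev, const World& curr, double alpha) {
    // blend the two most recent states for display
    std::printf("render x = %f\n", prev.x + (curr.x - prev.x) * alpha);
}

int main() {
    using clock = std::chrono::steady_clock;
    const double dt = 1.0 / 25.0;              // 25 updates per second
    double accumulator = 0.0;
    World previous, current;
    auto last = clock::now();

    for (int frame = 0; frame < 500; ++frame) {            // bounded loop for the sketch
        auto now = clock::now();
        accumulator += std::chrono::duration<double>(now - last).count();
        last = now;

        while (accumulator >= dt) {            // catch up in fixed, deterministic steps
            previous = current;
            update(current, dt);
            accumulator -= dt;
        }
        render(previous, current, accumulator / dt);       // alpha in [0,1)
    }
}
```

The update rate stays fixed no matter how fast or slow render() is, which is the property I'm after - the real question is how to do this across two threads.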

No blocking on shared resources

If the other criteria mentioned above are to be met, it also follows that we cannot allow rendering to wait for updating or vice versa. Of course it is painfully obvious that when 2 different threads share access to resources and one thread is updating some of them, it is impossible to guarantee that blocking will never take place. It is, however, possible to keep this blocking to an absolute minimum - for example by only blocking while switching pointer references between a queue of updated objects and a queue of previously rendered objects.
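
To make concrete what I mean by "an absolute minimum" of blocking, here is a rough sketch of my own (not from any of the linked articles) where the lock is only ever held for the pointer swap itself. It assumes three pre-allocated buffers: one owned by the updater, one by the renderer, and one parked in a shared slot; the Buffer and SharedSlot names are made up for the example.

```cpp
#include <mutex>
#include <utility>

struct Buffer { /* object states for one update tick */ };

// Three pre-allocated buffers: one owned by the updater, one by the renderer,
// and one parked here. The lock is only held for the duration of the swap,
// so neither thread can be blocked for longer than that.
class SharedSlot {
public:
    explicit SharedSlot(Buffer* parked) : ready_(parked) {}

    // Updater: hand over a finished buffer, get the parked one back for reuse.
    Buffer* publish(Buffer* finished) {
        std::lock_guard<std::mutex> lock(m_);
        std::swap(ready_, finished);
        fresh_ = true;
        return finished;                 // now points at the previously parked buffer
    }

    // Renderer: take the newest buffer if one arrived, otherwise keep the old one.
    Buffer* acquire(Buffer* front) {
        std::lock_guard<std::mutex> lock(m_);
        if (fresh_) { std::swap(ready_, front); fresh_ = false; }
        return front;
    }

private:
    std::mutex m_;
    Buffer* ready_;
    bool fresh_ = false;
};
```

The updater would call publish() once per tick with the buffer it just filled, and the renderer would call acquire() once per frame before drawing; neither ever touches the other's working buffer.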

So...

My question to all you skilled people in here is: Am I asking for too much?

I have been reading about ideas on these various topics on many sites, but it always seems that one part or another is left out of the suggestions I've seen. And maybe the reason is that you cannot have it all without compromise.

I started this seemingly common quest a long time ago when I was putting my thoughts about it in this thread: Thoughts about rendering loop strategies

Back then my first naive assumption was that it shouldn't matter if updating and reading happened simultaneously, since the variation in object state would be so small that you wouldn't notice if one object was occasionally a step ahead of the others.

Now I am somewhat wiser, but still confused at times.

The most promising and detailed description of a method that would allow all my wishes to come true was this: http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part1/ It describes a three-state model that ensures the renderer can always choose a new queue for rendering without any wait (except perhaps a microsecond while switching pointer references). At the same time the updater always has access to the 2 queues required for building the next state tree (1 queue for creating/updating the next state, and 1 queue for reading the previous state - which can be done even while the renderer reads it as well).
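
My understanding of that wait-free handoff is roughly the following sketch - not the code from the linked article, just one common way to rotate three states so the renderer never waits: an atomic value holds the index of the "parked" state plus a bit saying whether it is fresh, and each side swaps its private index against it. Since committed states are only ever read, I believe the updater can also keep a read-only reference to the state it last committed, which is what gives it the "previous" state for building the next one. The State/TripleBuffer names are mine.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

struct State { /* all object states for one update tick */ };

// Single-producer / single-consumer triple buffer. The updater writes its
// private "back" slot and commits it; the renderer grabs the newest committed
// slot without ever blocking.
class TripleBuffer {
public:
    State&       back()           { return slots_[backIdx_]; }   // updater: writable slot
    const State& previous() const { return slots_[prevIdx_]; }   // updater: last committed slot (read-only)

    void commit() {   // updater: publish the back slot, recycle the previously parked one
        prevIdx_ = backIdx_;
        uint8_t old = middle_.exchange(backIdx_ | kFresh, std::memory_order_acq_rel);
        backIdx_ = old & kIndexMask;
    }

    const State& front() {   // renderer: newest committed slot, possibly unchanged
        if (middle_.load(std::memory_order_acquire) & kFresh) {
            uint8_t old = middle_.exchange(frontIdx_, std::memory_order_acq_rel);
            frontIdx_ = old & kIndexMask;
        }
        return slots_[frontIdx_];
    }

private:
    static constexpr uint8_t kFresh = 0x4, kIndexMask = 0x3;
    std::array<State, 3> slots_{};
    std::atomic<uint8_t> middle_{0};   // parked slot index (+ fresh bit)
    uint8_t backIdx_ = 1, frontIdx_ = 2;
    uint8_t prevIdx_ = 0;              // before the first commit this is just the default state
};
```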

I recently found time to make a sample implementation of this, and it works very well, except for two issues.

  • One is a minor issue: having to deal with multiple references to all involved objects
  • The other is more serious (unless I'm just being too needy), and that is the fact that extrapolation - as opposed to interpolation - is used to maintain a visually pleasing representation of the states given a fast screen refresh rate. While both methods do the job of showing states deviating from the solidly calculated object states, extrapolation seems to me to produce much more visible artifacts when the predictions fail to match reality (see the sketch just below). My position seems to be supported by this: http://gafferongames.com/networked-physics/snapshots-and-interpolation/ And as far as I can tell it is not possible to implement interpolation in the three-state design, since it requires the renderer to have read access to 2 queues at all times in order to calculate the intermediate state between two known states.
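
To illustrate the difference I am worried about, here is a tiny sketch assuming simple linear motion; Pos and the function names are made up for the example:

```cpp
struct Pos { double x, y; };

// Extrapolation: predict forward from the newest state and its velocity.
// If the object turned or stopped during the tick, the prediction overshoots
// and has to be corrected at the next update, which shows up as a visible snap.
Pos extrapolate(Pos newest, Pos velocity, double dt, double alpha) {
    return { newest.x + velocity.x * dt * alpha,
             newest.y + velocity.y * dt * alpha };
}

// Interpolation: blend between the two newest known states. The result is
// always something that actually happened, at the cost of one tick of latency
// and of keeping the older state around - which is the crux of my problem.
Pos interpolate(Pos older, Pos newer, double alpha) {
    return { older.x + (newer.x - older.x) * alpha,
             older.y + (newer.y - older.y) * alpha };
}
```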

So I was toying with extending the three-state model suggested on the slapware blog to use interpolation instead of extrapolation - and at the same time trying to simplify the multi-reference structure. While it seems possible to me, I am wondering if the price is too high. In order to meet all my goals I would need to have:

  • 2 queues (or states) held exclusively by the renderer (they could be used by another thread for read-only purposes, but never updated or switched during rendering)
  • 1 queue (or state) with the newest updated state, ready to be switched over to the renderer when it is done rendering the current scene
  • 1 queue (or state) with the next frame being built/updated by the updater
  • 1 queue (or state) containing a copy of the frame last built/updated. This is the same state as the one last sent to the renderer, so this queue/state should be accessible by both the updater (for reading the previous state) and the renderer (for rendering the state).

So that would mean I would have to keep 4 copies of render state around at all times to keep this design running smoothly, locklessly and deterministically.
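
Just to make the ownership explicit, this is how I picture the roles of those 4 copies. It is only bookkeeping, not a synchronization scheme, and the field names are mine:

```cpp
struct Buffer { /* all object states for one update tick */ };

// The four copies and who may touch them at any given moment.
struct RenderStateSet {
    Buffer* renderOlder;   // renderer only: the older of the two states being interpolated
    Buffer* renderNewer;   // renderer (read) + updater (read): the state last handed over,
                           //   doubling as the "previous" state the updater reads from
    Buffer* ready;         // finished state waiting for the renderer to switch to it
    Buffer* building;      // updater only: the state currently being written
};
```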

I fear that I'm overthinking this. So if any of you have advice to pull me back down to earth, suggestions for what could be improved, critique of the design, or perhaps references to good resources explaining how these goals can be achieved (or why this is or isn't a good idea) - please hit me with them :-)

  • Why don't you try RCU? Steps: 1. Every updater-frame, the updater creates an RCU-protected copy of all relevant state for the renderer, and passes a pointer to it. 2. Every renderer-frame, the renderer picks up the pointer in an RCU critical section, makes a copy of the relevant state, exits the critical section and does whatever a renderer does. – EOF Oct 03 '15 at 13:10
  • So this would require making 2 copies of all states? That may be too expensive in a system that is write-once, read-a-few-times. Anyway, this fire-and-forget functionality would definitely mean that I wouldn't have to maintain a network of references to objects, and that is certainly one of my goals. However, the RCU pattern seems to be better suited for systems where objects are read more frequently. – uhre Oct 04 '15 at 07:06
  • Yes, RCU is *probably* overkill, unless you are rendering in multiple threads. If you only have a single thread each for updates and rendering, you could instead make a single copy and pass an atomically refcounted pointer. – EOF Oct 04 '15 at 08:24
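
If I understand EOF's last comment correctly, the single-copy variant could look something like this sketch (assuming C++20's std::atomic<std::shared_ptr>; the Snapshot and function names are my own, not from the comment):

```cpp
#include <atomic>
#include <memory>

struct Snapshot { /* copy of all state the renderer needs for one tick */ };

// Latest published snapshot; starts out empty.
std::atomic<std::shared_ptr<const Snapshot>> latest;

void updater_publish() {
    auto snap = std::make_shared<const Snapshot>(/* built this tick */);
    latest.store(snap, std::memory_order_release);   // old snapshot is freed once no one holds it
}

void renderer_draw() {
    std::shared_ptr<const Snapshot> snap = latest.load(std::memory_order_acquire);
    if (snap) {
        // ... render from *snap; it stays valid even if the updater publishes again ...
    }
}
```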
