Spinning in a non-blocking app without gobbling up CPU time

Question

I have a UDP network application that reads packets sent to it and then processes them (same thread). The reads are non-blocking so I'm not using poll or select.

Packets received are grouped by sessions.

Work is governed by whether there is a session in progress. If there is no work to be done, i.e. there are no sessions or there are no packets to process, then I need to spin.

I've been looking at the hybrid algorithm found here: http://www.1024cores.net/home/lock-free-algorithms/tricks/spinning

I've been playing with it, but I'm told it's more for busy waits. What methods do you use to prevent unnecessary processing and needlessly high CPU usage?

EDIT:

Thanks for all the answers and comments. I'm now doing the following. When it comes to reading from the network, I check whether there is other work to be done. If there is, I call poll with a timeout of zero. I then read as many packets as I can and place them into an in-memory queue for processing. If there is no other work, I poll indefinitely (i.e. with a timeout of -1). It seems to work well: CPU usage is only high when things are busy; otherwise it drops to zero.
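A minimal sketch of the loop described in this edit, written in Python for brevity (the thread does not say which language the application uses). The names `drain`, `wait_and_read` and `have_other_work` are made up for illustration, and the socket is assumed to be non-blocking and already registered with a `select.poll()` object for `POLLIN`.

```python
import select
import socket
from collections import deque


def drain(sock, queue, max_size=65535):
    """Read every datagram currently available on a non-blocking socket.

    Returns the number of datagrams appended to `queue`.
    """
    count = 0
    while True:
        try:
            data, addr = sock.recvfrom(max_size)
        except BlockingIOError:
            # Nothing (more) to read right now. This also covers the case
            # where poll reported data but the datagram was later discarded.
            return count
        queue.append((addr, data))
        count += 1


def wait_and_read(sock, poller, queue, have_other_work):
    """Poll without waiting if work is pending, otherwise block until data."""
    # 0 returns immediately; a negative timeout blocks indefinitely.
    timeout_ms = 0 if have_other_work else -1
    if poller.poll(timeout_ms):
        return drain(sock, queue)
    return 0
```

With this shape the process sleeps inside `poll` whenever there are no sessions and no packets, so it uses no CPU until the kernel wakes it.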

Why are you polling for network packets? Why not use `select` or `poll`? — Gabe, Apr 13 '11 at 05:43
Sorry, I should say it's just reading a UDP packet if one is there. No need to call select or poll, just recvfrom on a non-blocking socket. — hookenz, Apr 13 '11 at 05:45
Actually, it's best to use non-blocking sockets, since I could get a false positive, i.e. poll indicates there is data but recvfrom blocks because the checksum failed. — hookenz, Apr 13 '11 at 11:21

score 3 · Answer 1 · answered Apr 13 '11 at 06:07

If you have nothing to do, you should be blocking - if not on the socket itself (i.e. if it's an event loop that processes more than one network socket or event type), then on a gate that gets signaled when something happens (the design depends on how your OS does async I/O).

Spinning is something you should only be doing when you're waiting for a very short period of time (usually only in kernel mode).

score 1 · Answer 2 · answered Apr 13 '11 at 05:53

Since you have to read from a socket, you can just do a blocking read. Without a packet, you have no reason to be running, right?

If there is more than one socket, then the blocking read won't work, so you need pselect() to monitor multiple descriptors.
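As a rough illustration of waiting on several descriptors at once, here is a sketch in Python. Python's standard library exposes `select()` rather than `pselect()`, so this swaps in `select.select`; the helper name `wait_for_readable` is hypothetical.

```python
import select
import socket


def wait_for_readable(socks, timeout=None):
    """Block until at least one socket in `socks` is readable.

    `timeout` is in seconds; None means wait indefinitely.
    Returns the list of sockets that are ready (empty on timeout).
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable
```

The caller then reads only from the sockets that were returned, and the process sleeps in the kernel while none of them has data.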

Am I missing something obvious?

It occurs to me that you may have some long-term processing after you do receive a datagram. If the reason you are going with non-blocking I/O is to avoid ignoring incoming traffic while working on a session, then in that case the obvious thing to do is to fork() the sessions. (Hmm, so I still think I must be missing something...)

score 1 · Accepted Answer · answered Apr 13 '11 at 07:37

How many packets per second are you processing? How long does it take to process those packets? If you use blocking threads, what is the average CPU usage you get?

Unless the blocking wait is close to 100% usage, where shaving a little overhead off the blocking itself can help, spinning will not improve but rather worsen performance. By spinning, you lock one core that will not be available to run other code (possibly including the code that feeds you work, e.g. the kernel code that reads from the network and passes the packets up to your app), and you burn resources without performing any work at all.

Note that when the article says that it is harder to write blocking code than non-blocking spin waits, the author is not talking about operations for which the blocking version is implemented in the system, but rather about situations where one thread must wait on a condition triggered by other threads (a shared variable value goes above/below a limit, a flag is changed...).
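For the thread-to-thread case mentioned here, a blocking wait can be sketched with a condition variable. This is only an illustration in Python; the `Gate` class and its method names are invented for the example and do not come from the linked article.

```python
import threading


class Gate:
    """A shared counter that other threads can block on until it reaches a limit."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def add(self, n=1):
        # Change the shared value and wake every waiter so it can re-check.
        with self._cond:
            self._value += n
            self._cond.notify_all()

    def wait_at_least(self, limit, timeout=None):
        # Sleep until the value reaches `limit`; no CPU is used while waiting.
        # Returns True if the condition was met, False if the timeout expired.
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= limit, timeout)
```

The waiter only runs when `add` signals a change, whereas a spin wait would re-check the value continuously on its own core.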

Also, if the cost of checking the condition is high, then spinning will incur that cost on each and every iteration of the loop, and that might well exceed the cost of checking once and performing an expensive wait.

Remember that spinning is an active wait. It does not make sense to ask how to actively wait while not consuming processor, as the active wait approach implies consuming processor time. What can you do to avoid needless CPU usage? Use a blocking call to get the next packet. In the particular case of reading a UDP packet, I suspect that two calls to the non-blocking read cost more processing time than a single call to the blocking read operation.
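The blocking alternative is short. A sketch in Python, assuming a single blocking UDP socket; the function `serve`, the `handle` callback and the `max_packets` parameter are hypothetical names used only for this example.

```python
import socket


def serve(sock, handle, max_packets=None, max_size=65535):
    """Process datagrams one at a time using a blocking receive.

    `handle(data, addr)` is called for each datagram. The loop runs
    forever unless `max_packets` is given. Returns the number handled.
    """
    handled = 0
    while max_packets is None or handled < max_packets:
        # On a blocking socket this call sleeps in the kernel until a
        # datagram arrives, so an idle loop consumes no CPU time.
        data, addr = sock.recvfrom(max_size)
        handle(data, addr)
        handled += 1
    return handled
```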

Again, think about the questions at the beginning, which can be summed up as: Is blocking proven to be the bottleneck? *Is this a scenario where active waits can actually help?*

Thanks for the very thorough answer. Your comments actually gave me much more to ponder even though other answers were upvoted and also correct. — hookenz, Apr 13 '11 at 08:42
