Given the stated requirements, where going through the thread scheduler and a context switch is too expensive, the best bet is typically to keep burning cycles as you do now, so that you meet the tightest latency demands.
An atomic CAS spin loop (a busy loop) is essentially a polling mechanism, and like polling in general, it tends to hog the CPU. Yet it never pays for a trip through the scheduler, which is exactly the cost you want to avoid with a tight 16 ms deadline to do everything you need and deliver a frame. That is why it's usually the safest way to meet this kind of deadline.
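To make the trade-off concrete, here is a minimal sketch of a CAS spin-wait. It assumes a `std::atomic<bool>` flag that a producer sets when a frame's work is ready; `publish` and `spin_until_claimed` are hypothetical names for illustration, not part of any library or of your code.

```cpp
#include <atomic>
#include <cassert>

// Producer side (hypothetical): announce that work is ready.
// The release store makes the producer's earlier writes visible to the
// consumer that later claims the flag with an acquire CAS.
inline void publish(std::atomic<bool>& flag) {
    flag.store(true, std::memory_order_release);
}

// Consumer side: busy-wait (poll) until the flag can be atomically flipped
// from true to false, claiming the work. The thread never blocks, so no
// scheduler or context switch is involved, but the core stays fully busy.
inline void spin_until_claimed(std::atomic<bool>& flag) {
    bool expected = true;
    while (!flag.compare_exchange_weak(expected, false,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
        // A failed CAS writes the observed value into `expected`; reset it.
        expected = true;
    }
}
```

The CAS (rather than a plain load) matters when several consumers might race for the same item: only one of them can flip the flag from `true` to `false`.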
In games with a lively world and constantly animated content, there typically aren't lengthy idle periods where nothing is happening. As a result, it's generally considered acceptable for a game to use the CPU constantly.
Given your requirements, a more productive question is probably not how to reduce CPU utilization, but how to make sure that utilization goes toward a useful purpose. A concurrent queue typically offers a lock-free, non-blocking query to check whether it is empty, which you already seem to have, given this line:
```cpp
while (a_queue.empty());  // busy-wait: poll until the queue has an item
```
You might be able to sneak some useful computation in here while the queue is empty. That way you're not burning cycles merely busy-waiting for the queue to become non-empty.
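A minimal sketch of that idea, assuming a single-producer/single-consumer setup. `SpscQueue` stands in for `a_queue` (whose actual type isn't shown), and `background_step` stands in for whatever deferred work your game has, such as asset streaming or precomputation; both names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for `a_queue`: a lock-free single-producer/
// single-consumer ring buffer. It holds at most N - 1 items.
template <typename T, std::size_t N>
class SpscQueue {
public:
    // Lock-free, non-blocking emptiness check (consumer side).
    bool empty() const {
        return head_.load(std::memory_order_acquire) ==
               tail_.load(std::memory_order_acquire);
    }
    // Call from the producer thread only. Returns false if full.
    bool try_push(const T& value) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }
    // Call from the consumer thread only. Returns nullopt if empty.
    std::optional<T> try_pop() {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T value = buf_[head];
        head_.store((head + 1) % N, std::memory_order_release);  // free slot
        return value;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Instead of `while (a_queue.empty());`, spend each empty poll on a small,
// bounded slice of other work. Still a spin loop: it never sleeps.
template <typename Queue, typename Step>
void wait_for_work(Queue& queue, Step background_step) {
    while (queue.empty()) {
        background_step();
    }
}
```

Keep each `background_step` call short: the longer a single step runs, the later the consumer notices new work, which eats into the same latency budget the spin loop is meant to protect.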
Ordinarily, the ideal answer to your question would involve a condition variable (which does depend on context switches and on threads being woken up). It is typically the standard way to put a thread to sleep (reducing CPU utilization) and have it woken up when there is work to do. Given the tight latency requirements, though, it may be best to forget about CPU utilization and focus instead on making sure it goes toward a useful purpose.
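For contrast, here is a minimal sketch of the condition-variable approach, assuming a simple mutex-protected queue. `BlockingQueue` is a hypothetical name, not your queue type. The consumer sleeps inside `wait()` and uses essentially no CPU while idle, but every wake-up goes through the OS scheduler, which is the latency cost discussed above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();  // wake a sleeping consumer, if any
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleeps until an item is available; the predicate also guards
        // against spurious wake-ups.
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};
```

Notifying after releasing the lock avoids waking the consumer only to have it immediately block on the mutex the producer still holds.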