definitely not a good idea to hack your way around with core audio.. while it may be a quick fix, it will definitely hurt you in ambiguous ways in the long run.
your problem isn't the same as the link you posted, their problem was assigning the callback on the wrong thread.. in your case, your callback is in the right thread, it's just that the audio buffers you are feeding it initially are either empty, too small or contains data not fit for audio playback
keep in mind that the purpose of the callback is to fire after each audio buffer supplied to the audio queue has been played (ie consumed).. the fact that after you start the queue the callback isn't being fired.. it means that there is nothing in the audio buffers for it to consume.. or too little meaningful information for it to consume..
when you do it manually you see a lag b/c the audio queue is trying to process the empty/erroneous buffers you supplied it.. then you resupply the same buffers with valid data that the queue eventually plays and then fires the callback
solution: compare the data you put in the buffers before starting the queue with the data you are supplying manually.. i'm sure there is a difference.. if that doesn't work please show your code for further analysis