I posted a comment, but I might as well write this up as an answer since it's long and I need formatting.
Basically, your two options are:
Lock-free queues, the most popular of which is this:
https://github.com/cameron314/concurrentqueue
They do have try_pop (in moodycamel's queue it is called try_dequeue), which returns false when it could not take an element. This is because they are built on atomic operations, and atomic methods (e.g. std::atomic_compare_exchange_weak) can and will "fail" and return false at times. So if you need to wait for an element, you are forced to spin over try_pop yourself.
You may find queues that wrap this inside a "pop" that just calls "try_pop" until it succeeds, but that is the same overhead in the background.
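To make the "fail and return false" part concrete, here is a small sketch of the compare-and-swap retry loop that lock-free code is typically built from. It is not moodycamel's internals, only the general pattern, and it uses nothing but std::atomic from the standard library. cas_add and run_concurrent_adds are hypothetical helper names; for plain addition you would normally just call fetch_add.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: adds `delta` to `value` with a compare-and-swap retry loop.
// compare_exchange_weak returns false if another thread changed `value` in the
// meantime (or even spuriously); on failure it reloads the current value into
// `expected`, so we simply try again.
int cas_add(std::atomic<int>& value, int delta) {
    int expected = value.load();
    while (!value.compare_exchange_weak(expected, expected + delta)) {
        // `expected` now holds the latest value; retry with it.
    }
    return expected + delta;  // the value this call stored
}

// Hypothetical helper: runs `num_threads` threads that each call
// cas_add(counter, 1) `per_thread` times, then returns the final counter.
int run_concurrent_adds(int num_threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) cas_add(counter, 1);
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

Under contention many of those compare_exchange_weak calls return false, yet no update is lost because each failure just retries with the fresh value.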
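Here is a minimal sketch of such a wrapper, assuming a queue type with a `bool try_pop(T&)` member. SpscRing is a hypothetical single-producer/single-consumer ring buffer written only to have something self-contained to spin on (it is not moodycamel's queue), and spin_pop is likewise a made-up name. Note that spin_pop does no less work than calling try_pop in a loop yourself.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical single-producer/single-consumer lock-free ring buffer.
// try_push/try_pop never block: they return false when full/empty.
template <typename T, std::size_t Capacity>
class SpscRing {
public:
    bool try_push(const T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % (Capacity + 1);
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buffer_[tail] = item;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buffer_[head];
        head_.store((head + 1) % (Capacity + 1), std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity + 1> buffer_{};  // one slot kept free to tell full from empty
    std::atomic<std::size_t> head_{0};      // next slot to read (written by consumer)
    std::atomic<std::size_t> tail_{0};      // next slot to write (written by producer)
};

// Hypothetical "blocking" pop built on top of try_pop: it spins until an
// item arrives, yielding the CPU between attempts.
template <typename Queue, typename T>
void spin_pop(Queue& q, T& out) {
    while (!q.try_pop(out)) {
        std::this_thread::yield();
    }
}
```

Yielding between attempts keeps the waiting thread from monopolizing a core, but it still wakes up over and over while the queue is empty.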
Lock-based queues:
These are easier to write on your own, without a third-party library: just wrap every method you need in a lock. If you want to 'peek' very often, look into std::shared_mutex with std::shared_lock; otherwise a std::mutex with std::lock_guard should be enough to guard every wrapper method. However, this is what you may call a 'blocking' queue, since during any access, whether it is a read or a write, the whole queue is locked.
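A minimal sketch of such a wrapper, assuming C++17 (for std::shared_mutex and std::optional). LockedQueue is a hypothetical name: the writing methods (push, try_pop) take an exclusive std::lock_guard, while peek and size take a std::shared_lock so that several readers can proceed at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <utility>

// Hypothetical lock-based queue wrapper around std::deque.
template <typename T>
class LockedQueue {
public:
    void push(T item) {
        std::lock_guard<std::shared_mutex> lock(mutex_);  // exclusive
        items_.push_back(std::move(item));
    }

    // Returns false if the queue is empty instead of waiting.
    bool try_pop(T& out) {
        std::lock_guard<std::shared_mutex> lock(mutex_);  // exclusive
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Copies the front element without removing it.
    std::optional<T> peek() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared with other readers
        if (items_.empty()) return std::nullopt;
        return items_.front();
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared with other readers
        return items_.size();
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
    std::deque<T> items_;
};
```

Keep in mind that a peek followed by a try_pop is two separate critical sections, so another thread may pop that element in between.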
There are no other thread-safe alternatives to these two approaches. If you need a really large queue (e.g. hundreds of GBs worth of objects) under heavy use, you could consider writing a custom hybrid data structure, but for most use cases moodycamel's queue will be more than sufficient.