As you have already discovered, these are not simple user-space functions: it will be very tricky (impossible?) to implement a semaphore or mutex yourself without using the functions provided by the kernel.
For example, on Linux you have:
You have the concept correct, but the two operations (the check and the increment/decrement) need to be carried out atomically. Simplistically, this means they happen as one operation that cannot be split, so no other thread can run between the check and the update (read up on linearizability).
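To make the "check and decrement as one step" idea concrete, here is a minimal sketch using C11 atomics. The toy_sem type and its functions are hypothetical names invented for illustration, not a real library API. It only covers the atomic step: a caller that fails to acquire still has to either spin or ask the kernel to put it to sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy counter, not a real library type. */
typedef struct {
    atomic_int count;
} toy_sem;

static void toy_sem_init(toy_sem *s, int initial)
{
    assert(initial >= 0);
    atomic_init(&s->count, initial);
}

/* Try to take one unit without waiting. The check (count > 0) and the
 * decrement are fused by compare-exchange: the new value is stored only
 * if count still holds the value we checked. */
static bool toy_sem_try_acquire(toy_sem *s)
{
    int seen = atomic_load(&s->count);
    while (seen > 0) {
        /* On failure, `seen` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&s->count, &seen, seen - 1))
            return true;
    }
    return false;   /* nothing available right now */
}

static void toy_sem_release(toy_sem *s)
{
    atomic_fetch_add(&s->count, 1);
}
```

If another thread changes the count between the load and the compare-exchange, the exchange fails and the loop re-checks against the fresh value, so the count can never be driven below zero.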
Additionally, you have implemented a busy loop, which is a bad idea when running under an operating system: you deprive other tasks and processes of CPU time and raise power usage while doing no actual work. The functions mentioned above will "block" with 0% CPU usage, while yours will "block" with 100% CPU usage if given the chance.
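For contrast with a busy loop, here is a rough sketch of blocking on a kernel-supported primitive. It assumes POSIX semaphores (sem_init, sem_wait, sem_post) and pthreads are available; run_workers and MAX_WORKERS are hypothetical names, and error checking is left out to keep it short.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define MAX_WORKERS 16      /* arbitrary limit for this sketch */

static sem_t lock;          /* used as a binary semaphore around `counter` */
static long counter;
static int iterations;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < iterations; i++) {
        sem_wait(&lock);    /* sleeps in the kernel while the value is 0 */
        counter++;          /* critical section */
        sem_post(&lock);    /* increments the value and wakes a waiter */
    }
    return NULL;
}

/* Hypothetical helper: start `nthreads` workers, wait for them, and
 * return the final counter value. Error checking omitted for brevity. */
long run_workers(int nthreads, int iters)
{
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    pthread_t tid[MAX_WORKERS];

    counter = 0;
    iterations = iters;
    sem_init(&lock, 0, 1);  /* 0 = shared between threads, initial value 1 */

    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    sem_destroy(&lock);
    return counter;
}
```

With this in place, run_workers(4, 10000) should return 40000 however the threads are interleaved, and a thread that has to wait is descheduled rather than spinning. On older glibc versions you may need to link with -pthread.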
You would have more luck playing with such concepts when running on a single core (you can restrict your application's execution to a single core; look at sched_setaffinity()).
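A sketch of how that pinning might look on Linux with glibc. The helper names first_allowed_cpu and pin_to_cpu are made up for this example; the CPU_* macros and sched_setaffinity() need _GNU_SOURCE defined before any header is included.

```c
#define _GNU_SOURCE          /* exposes cpu_set_t and the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: lowest-numbered CPU the calling thread may
 * currently run on, or -1 if the mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by the call). */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    return sched_setaffinity(0, sizeof(one), &one);  /* pid 0 = caller */
}
```

Calling pin_to_cpu(first_allowed_cpu()) at the top of main(), before creating any threads, should be enough, since threads created afterwards inherit the affinity mask.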
However, even if you get that going, you have very little control over whether your process is scheduled out at a bad time, which would cause your example application to break in exactly the same way. It might be possible to further improve your chances of correct operation by calling sched_setscheduler() with SCHED_FIFO, though I've not got first-hand experience with this (ref, ref).
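For reference, requesting SCHED_FIFO might look roughly like the sketch below. request_fifo is a hypothetical helper name. The call normally fails with EPERM unless the process has root or CAP_SYS_NICE, and a SCHED_FIFO task that spins can starve everything else on its core, so treat this as something to experiment with carefully.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: ask for SCHED_FIFO at `priority` for the calling
 * process. Returns 0 on success, or the errno value on failure
 * (typically EPERM without sufficient privileges). */
int request_fifo(int priority)
{
    assert(priority >= sched_get_priority_min(SCHED_FIFO));
    assert(priority <= sched_get_priority_max(SCHED_FIFO));

    struct sched_param param = { .sched_priority = priority };
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)  /* pid 0 = caller */
        return errno;
    return 0;
}
```

A cautious first attempt would be request_fifo(sched_get_priority_min(SCHED_FIFO)), checking the return value before relying on the new policy.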
Either way, this is not likely to be 100% reliable, while the kernel-supported functions should be.
If you're up for it, the best way to play with the implementation details in your own functions would be to implement a very basic round-robin scheduler (one that doesn't interrupt tasks) and run it on a microcontroller or in a single thread.
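As a starting point, here is one possible shape for such an experiment: a cooperative round-robin loop in a single thread, with tasks written as step functions. All names (coop_sem, task, run_round_robin) are invented for this sketch. Because a task is never interrupted in the middle of a step, the plain check-then-decrement is safe here without any atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int count; } coop_sem;

/* Safe without atomics only because no task can be interrupted
 * between the check and the decrement. */
static bool coop_sem_try_wait(coop_sem *s)
{
    if (s->count > 0) { s->count--; return true; }
    return false;
}

static void coop_sem_post(coop_sem *s) { s->count++; }

typedef struct {
    int  remaining;   /* critical sections this task still has to run */
    bool holding;     /* true while the task holds the semaphore */
} task;

static coop_sem sem;
static int in_critical;      /* tasks inside the critical section now */
static int max_in_critical;  /* highest value ever observed */

/* Run one slice of a task. Returns false once the task has finished. */
static bool task_step(task *t)
{
    if (t->remaining == 0)
        return false;
    if (!t->holding) {
        if (!coop_sem_try_wait(&sem))
            return true;             /* "blocked": yield, retry next turn */
        t->holding = true;
        in_critical++;
        if (in_critical > max_in_critical)
            max_in_critical = in_critical;
        return true;                 /* yield while holding the semaphore */
    }
    in_critical--;                   /* leave the critical section */
    t->holding = false;
    t->remaining--;
    coop_sem_post(&sem);
    return true;
}

/* Give each task one step per round until all have finished.
 * Returns the largest number of tasks ever seen holding the semaphore. */
int run_round_robin(task *tasks, size_t n)
{
    assert(tasks != NULL && n > 0);
    sem.count = 1;
    in_critical = 0;
    max_in_critical = 0;

    bool any_alive = true;
    while (any_alive) {
        any_alive = false;
        for (size_t i = 0; i < n; i++)
            if (task_step(&tasks[i]))
                any_alive = true;
    }
    return max_in_critical;
}
```

Running three tasks with two critical sections each should return 1, meaning mutual exclusion held throughout. A blocked task here is simply retried on its next turn; a fuller scheduler would park it on a wait queue attached to the semaphore instead.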