Global synchronization within a kernel is not possible, because work groups are not guaranteed to run at the same time. You can achieve a sort of global sync from the host application if you break your kernel into pieces. This is not suitable for many kernels, especially if you use a lot of local memory (its contents do not persist from one kernel launch to the next) or have a bit of initialization code that runs before your kernel does any real work (it would have to be repeated in each piece).
Break your kernel into two parts, kernelA and kernelB for example. Global synchronization is then simply a matter of running the NDRange for kernelA, then finish() (clFinish in the C API), and then the NDRange for kernelB. The global data will remain in memory between the two calls.
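To make the difference concrete, here is a small CPU-only sketch in plain C. It is not real OpenCL host code: the kernelA and kernelB bodies, the sizes, and the helper names run_fused, run_split and count_correct are all made up for illustration, and the work groups are simply run one after another to mimic the worst case where no two groups are ever resident at the same time. The comments mark where the real enqueue and finish calls would go.

```c
#include <assert.h>

#define NUM_GROUPS 4
#define GROUP_SIZE 4
#define N (NUM_GROUPS * GROUP_SIZE)

/* Hypothetical first half: each work-item writes its own element. */
void kernelA(int gid, int *data) {
    assert(gid >= 0 && gid < N);
    data[gid] = gid + 1;
}

/* Hypothetical second half: each work-item reads an element written by a
   different work group, so it needs ALL of kernelA to have completed. */
void kernelB(int gid, const int *data, int *out) {
    assert(gid >= 0 && gid < N);
    out[gid] = data[(gid + GROUP_SIZE) % N];
}

/* One fused kernel: every group does A then B before the next group starts.
   A barrier between the two halves would only synchronize one group. */
void run_fused(int *data, int *out) {
    for (int g = 0; g < NUM_GROUPS; ++g) {
        for (int l = 0; l < GROUP_SIZE; ++l)
            kernelA(g * GROUP_SIZE + l, data);
        /* barrier(CLK_GLOBAL_MEM_FENCE) would sit here: group-local only */
        for (int l = 0; l < GROUP_SIZE; ++l)
            kernelB(g * GROUP_SIZE + l, data, out);
    }
}

/* Split version: the whole NDRange of A, a finish, then the whole NDRange of B. */
void run_split(int *data, int *out) {
    for (int gid = 0; gid < N; ++gid)   /* stands in for enqueueing kernelA */
        kernelA(gid, data);
    /* finish() / clFinish(queue) would go here: all of kernelA is complete */
    for (int gid = 0; gid < N; ++gid)   /* stands in for enqueueing kernelB */
        kernelB(gid, data, out);
}

/* Counts the work-items whose output matches what kernelA should have produced. */
int count_correct(const int *out) {
    int ok = 0;
    for (int gid = 0; gid < N; ++gid)
        if (out[gid] == ((gid + GROUP_SIZE) % N) + 1)
            ++ok;
    return ok;
}
```

With zero-initialized data and out arrays, run_fused gets only the last group's work-items right, because every other group reads elements that have not been written yet. run_split gets all of them right, since kernelB never starts until kernelA has finished everywhere.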
Again, this is not pretty and not necessarily high performance, but if you really must have global sync, this is the only way to get it.