To answer my own question ;-) The procedure goes like this: first create a CLEventList with the desired capacity. Since I only want to measure the kernel execution, I set the capacity to 1.
CLEventList list = new CLEventList(1);
Now, when you enqueue your kernel on the command queue, pass the list as an argument:
queue.putWriteBuffer(...).put1DRangeKernel(..., list).putReadBuffer(...).finish();
Afterwards you can get the timing by calling:
long start = list.getEvent(0).getProfilingInfo(ProfilingCommand.START);
long end = list.getEvent(0).getProfilingInfo(ProfilingCommand.END);
long duration = end - start; // time in nanoseconds
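Don't forget to create your command queue with Mode.PROFILING_MODE enabled, e.g. device.createCommandQueue(Mode.PROFILING_MODE); otherwise the profiling timestamps are not available.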
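Since the START and END values are raw device timestamps in nanoseconds, it can help to wrap the subtraction in a small helper and convert the result to milliseconds for printing. The following is only a minimal sketch in plain Java: the class and method names (ProfilingTimes, elapsedNanos, toMillis) are my own and not part of JOCL, and the two timestamps in main are made-up placeholders standing in for the values returned by getProfilingInfo, not real measurements.

```java
// Hypothetical helper, not part of JOCL: turns two profiling timestamps
// (nanoseconds) into an elapsed time.
class ProfilingTimes {

    /** Elapsed time in nanoseconds between two profiling timestamps. */
    public static long elapsedNanos(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end timestamp precedes start timestamp");
        }
        return end - start;
    }

    /** Converts nanoseconds to fractional milliseconds. */
    public static double toMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder values; in real code these would come from
        // list.getEvent(0).getProfilingInfo(ProfilingCommand.START / END).
        long start = 1_000_000L;
        long end = 3_500_000L;

        long duration = elapsedNanos(start, end);
        System.out.println(duration + " ns = " + toMillis(duration) + " ms");
        // prints "2500000 ns = 2.5 ms"
    }
}
```

The check for end < start is just a guard against accidentally swapping the two arguments.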