You tend to think: "How many cores does my device have? Then I will launch that many threads."
That way of thinking is wrong for cases like OpenCL/CUDA.
A core contains a limited amount of resources: memory, registers, and thread slots. Depending on how much each "thread" is going to use (and therefore on the code/kernel), the core will be able to run a different number of threads concurrently.
So the first unknown is: "How many threads can a core run?" It is unknown until the code is compiled, and different versions of the compiler/driver can lead to different results.
If you don't know how many threads run per core, then what use is knowing "6 × ? = ?"? You still don't know how many threads can run in parallel, and you never will. Of course you can query a maximum value, but it may not hold in practice, so what use is it for real applications?
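As a rough illustration in CUDA terms (the kernel `myKernel` and the block size here are made up for the example), the runtime can only tell you after compilation how many threads of a given kernel actually fit on one multiprocessor, because the answer depends on that kernel's register and shared-memory usage:

```cuda
#include <cstdio>

// A made-up kernel: the occupancy result below depends on how many
// registers and how much shared memory *this particular* kernel uses.
__global__ void myKernel(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = data[i] * 2.0f;
}

int main() {
    const int blockSize = 256;   // threads per block (the work-group size)
    int maxBlocksPerSM = 0;

    // Ask the runtime how many blocks of this size fit on one SM ("core").
    // This is only known now, after compilation, and can change with a
    // different compiler/driver version or a different kernel.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &maxBlocksPerSM, myKernel, blockSize, /*dynamicSMemSize=*/0);

    printf("Threads resident per SM for this kernel: %d\n",
           maxBlocksPerSM * blockSize);
    return 0;
}
```

Change the kernel (or just recompile with a newer toolkit) and that number can change, which is exactly why it is not a useful design input.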
You have to think of a GPU as an unknown number of very simple workers that can only be put to the same task in groups of X.
The only important question is "How many threads are going to work in parallel in the same group?", because you can use clever cooperation techniques so those threads run faster together. That is the "work group size".
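As a minimal sketch of that cooperation (CUDA again, with a made-up kernel `groupSum` and an arbitrary group size of 256), threads in the same group can share fast on-chip memory and synchronize with each other, something threads in different groups cannot do:

```cuda
#define GROUP_SIZE 256  // the work-group / block size: the one real design parameter

// Sum GROUP_SIZE elements per group using on-chip shared memory.
// Only threads inside the same group can cooperate like this.
__global__ void groupSum(const float *in, float *out) {
    __shared__ float buf[GROUP_SIZE];          // fast memory shared by the group
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * GROUP_SIZE + tid];
    __syncthreads();                           // wait for the whole group

    // Tree reduction inside the group.
    for (int stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = buf[0];              // one result per group
}
```

You launch it as `groupSum<<<numGroups, GROUP_SIZE>>>(in, out)` with as many groups as your data needs; the hardware decides how many of those groups actually run at once, which is exactly why that number is not worth designing around.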
The other parameters are simply secondary. They will just make your app faster or slower, or allow you to run multiple tasks concurrently, but they should not be design parameters.
Just as the CPU clock speed or the L1 cache size is not a design parameter in CPU programming, and neither is how many other apps are running.