I am new to CUDA programming. I am working on a Kepler GPU that reports the following properties:
3.2 compute_capability
1024 max_threads_per_block
1 multiprocessor
2048 max_threads_per_multiprocessor
2147483647 max_grid_size
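In case it helps, here is a minimal sketch of how such values can be queried with the CUDA runtime API. It assumes device index 0, and the field names are those of `cudaDeviceProp`. Only the first (x) dimension of the grid limit is printed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;

    // Assumption: the GPU of interest is device 0.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Print the same limits listed above, one per line.
    std::printf("%d.%d compute_capability\n", prop.major, prop.minor);
    std::printf("%d max_threads_per_block\n", prop.maxThreadsPerBlock);
    std::printf("%d multiprocessor(s)\n", prop.multiProcessorCount);
    std::printf("%d max_threads_per_multiprocessor\n",
                prop.maxThreadsPerMultiProcessor);
    std::printf("%d max_grid_size (x dimension)\n", prop.maxGridSize[0]);
    return 0;
}
```

This would be compiled with `nvcc`. The printed numbers depend on the installed GPU.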
Does this mean that I can only launch 2048 threads for a kernel? If so, what is that large grid size for?
My application involves a large number of matrix calculations.