The code snippet
cudaEventRecord(start, 0);
/* creates 1D FFT plan */
cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);
/* executes FFT processes */
cufftExecC2C(plan, devPtr, devPtr, CUFFT_FORWARD);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
measures both the time cuFFT needs to create the plan and the execution time.
How can I measure only the execution time, excluding the time needed to create the plan?
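One possible approach, sketched here under assumptions rather than given as a definitive answer, is to create the plan before recording the `start` event so that only `cufftExecC2C` falls between the two events. The helper name `timeFftExecution`, the sizes passed in `main`, the zero-filled input buffer, and the error handling are illustrative and not part of the original snippet.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: returns the elapsed time in milliseconds of one
// in-place forward C2C transform, excluding plan creation.
// Returns -1.0f if allocation, planning, or execution fails.
float timeFftExecution(int nx, int batch)
{
    cufftComplex *devPtr = NULL;
    cufftHandle plan;
    cudaEvent_t start, stop;
    float ms = -1.0f;
    size_t bytes = sizeof(cufftComplex) * (size_t)nx * (size_t)batch;

    if (cudaMalloc((void **)&devPtr, bytes) != cudaSuccess)
        return -1.0f;
    cudaMemset(devPtr, 0, bytes);  /* placeholder input data (all zeros) */

    /* creates 1D FFT plan BEFORE the timed region */
    if (cufftPlan1d(&plan, nx, CUFFT_C2C, batch) != CUFFT_SUCCESS) {
        cudaFree(devPtr);
        return -1.0f;
    }

    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* only the FFT execution lies between the two events */
    cudaEventRecord(start, 0);
    cufftResult res = cufftExecC2C(plan, devPtr, devPtr, CUFFT_FORWARD);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    if (res == CUFFT_SUCCESS)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cufftDestroy(plan);
    cudaFree(devPtr);
    return ms;
}

int main(void)
{
    /* NX = 256 and BATCH = 10 are assumed values for illustration */
    float ms = timeFftExecution(256, 10);
    if (ms < 0.0f) {
        fprintf(stderr, "FFT timing failed\n");
        return 1;
    }
    printf("cufftExecC2C time: %f ms\n", ms);
    return 0;
}
```

The first execution of a plan may still include some one-time setup cost, so it may be worth running one untimed `cufftExecC2C` call as a warm-up, or averaging over several timed calls, before trusting the number.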