I need to calculate how many FLOPs per transferred value a code must perform for offloading it to the GPU to be worthwhile, i.e. to actually improve performance over running it on the CPU.
Here are the FLOP rates and assumptions:
1. A PCIe 3.0 x16 bus can transfer data from the CPU to the GPU at 15.75 GB/s.
2. The GPU can sustain 8 TFLOP/s in single precision.
3. The CPU can sustain 400 GFLOP/s in single precision.
4. A single-precision floating-point number is 4 bytes.
5. Computation can overlap with data transfers.
6. The data initially resides in CPU (host) memory.
How would a problem like this be solved step by step?
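Below is a rough sketch of how I have tried to set up the arithmetic, assuming that "overlap" means the GPU pipeline is limited by the slower of the transfer stage and the compute stage, and that the CPU side needs no transfer because the data already lives in host memory. I am not sure this is the right model, which is essentially what I am asking.

```python
# Sketch of the break-even arithmetic under my assumptions:
# - overlapped transfer/compute means the GPU pipeline runs at the
#   rate of its slower stage,
# - the CPU pays no transfer cost since data starts in host memory.
# (I'm not certain this model is correct.)

pcie_bw_bytes = 15.75e9      # PCIe 3.0 x16 host-to-device bandwidth, bytes/s
gpu_flops     = 8e12         # GPU peak, single-precision FLOP/s
cpu_flops     = 400e9        # CPU peak, single-precision FLOP/s
bytes_per_val = 4            # size of one single-precision float

transfer_rate = pcie_bw_bytes / bytes_per_val   # values/s arriving at the GPU

def gpu_values_per_second(flops_per_value):
    # With transfer and compute overlapped, throughput is capped by
    # the slower of the two pipeline stages.
    compute_rate = gpu_flops / flops_per_value
    return min(transfer_rate, compute_rate)

def cpu_values_per_second(flops_per_value):
    # Data already resides on the CPU, so only compute matters.
    return cpu_flops / flops_per_value

# Smallest FLOPs-per-value at which the GPU pipeline beats the CPU.
for f in range(1, 5000):
    if gpu_values_per_second(f) > cpu_values_per_second(f):
        print(f"GPU wins from roughly {f} FLOPs per transferred value")
        break
```

With these numbers the loop reports a break-even point around 400 GFLOP/s ÷ (15.75 GB/s ÷ 4 B) ≈ 102 FLOPs per value, but I would like to see the reasoning laid out properly rather than just trusting my script.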