GF/s or GFLOPS means gigaFLOPS, i.e. 10^9 FLoating point OPerations per Second. (GF/s is a somewhat unusual abbreviation of GigaFLOP/S = GigaFLOPS; see e.g. here "Gigaflops (GF/s) = 10^9 flops" or here "gigaflops per second (GF/s)".)
And it is clear to me that GF/s is not GFLOPS/s (i.e. it is not an acceleration).
Keep in mind that floating point operations are usually counted differently for CPUs and GPUs. For most CPUs, 64-bit (double precision) floating point operations are counted. For GPUs, 32-bit (single precision) operations are often counted instead, because GPUs have much higher performance in 32-bit floating point.
What types of operations are counted? Addition, subtraction and multiplication are; loads and stores are not. But loading and storing data is necessary to move data from/to memory, and sometimes it is what limits the FLOPS achieved in a real application (the article you cited mentions this case, a "memory bandwidth limited application", where the CPU/GPU could deliver a lot of FLOPS but memory can't supply the data fast enough).
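As a rough illustration of which operations count and why memory traffic matters, here is a minimal sketch; the vector size, peak rate and bandwidth below are made-up example numbers, not values from the cited article:

```python
# Count FLOPs and memory traffic for y[i] = a*x[i] + y[i] (an "axpy"-style loop),
# then estimate whether compute or memory bandwidth is the limiting factor.
# The hardware numbers below are illustrative assumptions only.

n = 10_000_000                 # vector length (assumed)
flops = 2 * n                  # 1 multiply + 1 add per element; loads/stores not counted
bytes_moved = 3 * 8 * n        # read x[i], read y[i], write y[i]; 8 bytes each (double)

peak_gflops = 100.0            # assumed peak compute rate, GFLOPS
bandwidth_gbs = 20.0           # assumed memory bandwidth, GB/s

compute_time = flops / (peak_gflops * 1e9)         # seconds if only compute mattered
memory_time = bytes_moved / (bandwidth_gbs * 1e9)  # seconds if only memory mattered

achieved_gflops = flops / max(compute_time, memory_time) / 1e9
print(f"FLOPs: {flops:.2e}, bytes moved: {bytes_moved:.2e}")
print(f"achievable ~{achieved_gflops:.1f} GFLOPS "
      f"({'memory' if memory_time > compute_time else 'compute'} bound)")
```

With these assumed numbers the loop is memory bound: the memory system, not the arithmetic units, sets the FLOPS you actually get.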
How are FLOPS counted for a given chip or computer? There are two different metrics. One is the theoretical upper limit of FLOPS for the chip, obtained by multiplying the number of cores, the clock frequency, and the number of floating point operations per CPU clock cycle (it was 4 for Core2 and is 8 for Sandy Bridge CPUs).
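For example, that multiplication spelled out (the 4-core, 3.0 GHz figures are just assumed example values; 8 FLOPs/cycle is the Sandy Bridge number mentioned above):

```python
# Theoretical peak FLOPS = cores * clock frequency * FLOPs per clock cycle.
cores = 4                # assumed core count
freq_hz = 3.0e9          # assumed clock frequency, 3.0 GHz
flops_per_cycle = 8      # Sandy Bridge: 8 double precision FLOPs per cycle per core

peak_flops = cores * freq_hz * flops_per_cycle
print(f"Theoretical peak: {peak_flops / 1e9:.0f} GFLOPS")  # prints 96 GFLOPS
```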
The other metric is something like real-world FLOPS, measured by running the LINPACK benchmark (solving a huge dense linear system of equations). This benchmark relies heavily on matrix-matrix multiplication and is a rough approximation of real-world FLOPS. The Top500 list of supercomputers is ranked by a parallel version of the LINPACK benchmark, HPL. On a single CPU, LINPACK can reach up to 90-95% of the theoretical FLOPS; for huge clusters it is in the 50-85% range.
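To make the efficiency idea concrete, here is a minimal sketch of how a LINPACK-style score turns into an efficiency figure. The operation count 2/3·n^3 + 2·n^2 is the standard formula HPL uses for solving an n-by-n system; the problem size, run time and peak value below are assumed example numbers, not real benchmark results:

```python
# Efficiency = measured LINPACK FLOPS / theoretical peak FLOPS.
# Example values below are assumptions, not real benchmark results.

n = 40_000                         # size of the dense linear system (assumed)
elapsed_s = 500.0                  # measured wall-clock time, seconds (assumed)
peak_gflops = 96.0                 # theoretical peak from the calculation above

linpack_ops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # operation count used by HPL
measured_gflops = linpack_ops / elapsed_s / 1e9

print(f"measured: {measured_gflops:.1f} GFLOPS, "
      f"efficiency: {100 * measured_gflops / peak_gflops:.0f}% of peak")
```

With these assumed numbers you would get roughly 85 GFLOPS, about 89% of the 96 GFLOPS peak, which is in the typical single-CPU range mentioned above.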