A very naive way to estimate the speedup is to approximate the degree of parallelism in your data set (for example, if the problem is embarrassingly parallel, this might equal the number of dimensions in your data set). Assuming a best-case implementation of concurrency with zero overhead for actually transferring the data, the math becomes simple:
speedup = (# of physical work items on the GPU, usually the # of CUDA cores in
a typical NVIDIA implementation) divided by (# of physical work items
on the CPU, i.e. the # of physical cores).
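As a minimal sketch of that ratio (the core counts below, 2048 CUDA cores and 8 physical CPU cores, are assumed values purely for illustration):

    def naive_gpu_speedup(gpu_work_items: int, cpu_physical_cores: int) -> float:
        """Naive best-case speedup estimate: ratio of physical work items,
        ignoring data-transfer overhead and architectural differences."""
        return gpu_work_items / cpu_physical_cores

    # Assumed counts: a GPU with 2048 CUDA cores vs. an 8-core CPU.
    print(naive_gpu_speedup(gpu_work_items=2048, cpu_physical_cores=8))  # 256.0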
There are also differences this estimate does not account for, stemming from the different hardware architectures (ISA, design, etc.). Calculations of this kind can deviate greatly from real-life performance depending on how well you estimate the model's parallelism, on the actual implementation of the concurrency, and on the hardware.