I am trying to speed up pymc3 sampling through parallelisation, but I see only a modest benefit.
I was able to decrease the total running time from 25 minutes (njobs=1) to 13 minutes (njobs=6) on an i7 MacBook Pro. Because it takes about 4 minutes before pymc actually starts sampling, the overall improvement is relatively small.
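To put numbers on this, here is a rough back-of-the-envelope sketch. It assumes the roughly 4 minute startup is the same in both runs and does not get faster with more jobs, which I have not measured separately. The variable names are just for illustration.

```python
# Timings from the two runs above, in minutes.
startup = 4.0          # assumed identical for both runs and not parallelised
serial_total = 25.0    # njobs=1
parallel_total = 13.0  # njobs=6
njobs = 6

# Time spent actually sampling, once the fixed startup is removed.
serial_sampling = serial_total - startup
parallel_sampling = parallel_total - startup

# Speedup of the whole run versus speedup of the sampling phase alone.
overall_speedup = serial_total / parallel_total
sampling_speedup = serial_sampling / parallel_sampling

# Fraction of the ideal njobs-fold speedup achieved while sampling.
efficiency = sampling_speedup / njobs

# Total time if sampling scaled perfectly across njobs (startup unchanged).
best_case_total = startup + serial_sampling / njobs

print(f"overall speedup:  {overall_speedup:.2f}x")
print(f"sampling speedup: {sampling_speedup:.2f}x")
print(f"parallel efficiency: {efficiency:.0%}")
print(f"best-case total: {best_case_total:.1f} min")
```

Under that assumption the sampling phase itself speeds up by a bit more than 2x, and even perfect scaling across 6 jobs could not bring the total below the startup cost plus one sixth of the serial sampling time.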
The question is: has anyone successfully used a GPU with pymc3, and how much benefit can I expect for models that take 6-8 minutes to sample? (My MacBook has an NVIDIA GT 750M with 2 GB of memory.)