
I got my code for fitting a multivariate regression model working with differential evolution, and the multiprocessing option does help reduce the runtime. However, with 7 independent variables of 10 values each, the objective ends up doing matrix operations on 21 matrices of 100+ elements, which still takes a while even on 24 cores. I don't have much experience with PyOpenCL, so I wanted to ask whether it's worth getting into it and trying to integrate the two so the work runs on the GPU. I've attached a reduced snippet with 3 variables and 3 values each for reference:

import scipy.optimize as op
import numpy as np

def func(vars, *args):
    # args = ([1,2,2], res1, res2, res3, x1, x2, x3): skip the first marker,
    # put the measured grids (transposed) into res and the x vectors into x
    res = []
    x = []
    for i in args[1:]:
        if len(res) + 1 > len(args)//2:
            x.append(i)
            continue
        res.append(np.array(i).T)
    
    f1 = 0
    for i in range(len(x[0])):
        for j in range(len(x[1])):
            diff = (vars[0]*x[0][i] + vars[1])*(vars[2]*x[1][j]*x[1][j] + vars[3]*x[1][j] + vars[4])*(vars[5]*50*50 + vars[6]*50 + vars[7])
            f1 = f1 + abs(res[0][i][j] - diff) # ID-Pitch
    
    f2 = 0
    for i in range(len(x[0])):
        for j in range(len(x[2])):
            diff = (vars[0]*x[0][i] + vars[1])*(vars[5]*x[2][j]*x[2][j] + vars[6]*x[2][j] + vars[7])*(vars[2]*10*10 + vars[3]*10 + vars[4])
            f2 = f2 + abs(res[1][i][j] - diff) # ID-Depth
    
    f3 = 0
    for i in range(len(x[1])):
        for j in range(len(x[2])):
            diff = (vars[2]*x[1][i]*x[1][i] + vars[3]*x[1][i] + vars[4])*(vars[5]*x[2][j]*x[2][j] + vars[6]*x[2][j] + vars[7])*(vars[0]*3.860424005 + vars[1])
            f3 = f3 + abs(res[2][i][j] - diff) # Pitch-Depth
    return f1 + f2 + f3


def main():
    res1 = [[134.3213274,104.8030828,75.28483813],[151.3351445,118.07797,84.82079556],[135.8343927,105.9836392,76.1328857]]
    res2 = [[131.0645086,109.1574174,91.1952225],[54.74920444,30.31300092,17.36537062],[51.8931954,26.45139822,17.28693162]]
    res3 = [[131.0645086,141.2210331,133.3192429],[54.74920444,61.75898314,56.52756593],[51.8931954,52.8191817,52.66531712]]
    x1 = np.array([3.860424005,7.72084801,11.58127201])
    x2 = np.array([10,20,30])
    x3 = np.array([50,300,500])
    interval = (-20,20)
    bds = [interval,interval,interval,interval,interval,interval,interval,interval]
    res = op.differential_evolution(func, bounds=bds, workers=-1, maxiter=100000, tol=0.01, popsize=15, args=([1,2,2], res1, res2, res3, x1, x2, x3))
    print(res)
    
if __name__ == '__main__':
    main()
ManharG

1 Answer


Firstly, yes, it's possible: func can be a function that sends the data to the GPU, waits for the computations to finish, transfers the result back to RAM, and returns it to SciPy.
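As a rough illustration of that structure (a sketch only: the kernel, buffer names and the f1_gpu helper below are mine, not part of your code; it only translates the first ID-Pitch loop, ignores the transpose bookkeeping, and uses float32 so double-precision device support isn't an issue):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()      # picks an OpenCL device (may prompt)
queue = cl.CommandQueue(ctx)

# one work-item per (i, j) pair of the ID-Pitch grid
prg = cl.Program(ctx, """
__kernel void f1_err(__global const float *v,     // the 8 model parameters
                     __global const float *x0,    // ID values
                     __global const float *x1,    // pitch values
                     __global const float *res0,  // measured grid, row-major
                     const int n1,
                     __global float *err)
{
    int i = get_global_id(0);
    int j = get_global_id(1);
    float pred = (v[0]*x0[i] + v[1])
               * (v[2]*x1[j]*x1[j] + v[3]*x1[j] + v[4])
               * (v[5]*50*50 + v[6]*50 + v[7]);
    err[i*n1 + j] = fabs(res0[i*n1 + j] - pred);
}
""").build()

def f1_gpu(vars_, x0, x1, res0):
    mf = cl.mem_flags
    to_dev = lambda a: cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                                 hostbuf=np.asarray(a, dtype=np.float32))
    err = np.empty((len(x0), len(x1)), dtype=np.float32)
    err_buf = cl.Buffer(ctx, mf.WRITE_ONLY, err.nbytes)
    prg.f1_err(queue, err.shape, None,
               to_dev(vars_), to_dev(x0), to_dev(x1), to_dev(res0),
               np.int32(len(x1)), err_buf)
    cl.enqueue_copy(queue, err, err_buf)  # blocks until the kernel has finished
    return err.sum()                      # scalar handed back to SciPy

Note that every objective evaluation still pays for several host-to-device copies, and with workers=-1 each worker process would need its own context and queue, which is exactly why the next point matters.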

However, moving computations from the CPU to the GPU is not always beneficial, because of the time required to transfer data back and forth between RAM and the GPU. On a moderate laptop GPU, for example, you won't get any speedup at all, and your code might even end up slower. Cutting down those transfers is what can make a GPU 2-4 times faster than an average CPU, but your objective needs a transfer on every evaluation, so that saving isn't available here.

For powerful GPUs with high bandwidth (things like an RTX 2070 or RTX 3070, or APUs), you can expect the GPU to be a few times faster than the CPU even with the data transfer, but it depends on how both the CPU and the GPU versions are implemented.

Lastly, your code can be sped up a lot without touching the GPU, which is probably the first thing to try. Code compilers like Cython and Numba can make code like this close to 100 times faster with little effort and no major restructuring. You should convert the objective to use only fixed-size, preallocated NumPy arrays instead of lists; that alone makes it much faster, and it also lets you release the GIL and run the loops multithreaded, since both tools come with good multithreaded looping constructs.
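For example, here is a rough Numba sketch of your objective (an illustration, not a drop-in replacement: it assumes you pass the transposed measurement grids and the x vectors as NumPy arrays, and it keeps the 50, 10 and 3.860424005 constants from your snippet):

import numpy as np
from numba import njit, prange

@njit(parallel=True, fastmath=True)
def func_nb(v, r1, r2, r3, x1, x2, x3):
    # r1, r2, r3 are the transposed measurement grids (what res[0..2] hold
    # in your func), passed as ndarrays instead of lists
    f1 = 0.0
    for i in prange(x1.shape[0]):                        # ID-Pitch
        for j in range(x2.shape[0]):
            pred = (v[0]*x1[i] + v[1]) \
                 * (v[2]*x2[j]*x2[j] + v[3]*x2[j] + v[4]) \
                 * (v[5]*50*50 + v[6]*50 + v[7])
            f1 += abs(r1[i, j] - pred)
    f2 = 0.0
    for i in prange(x1.shape[0]):                        # ID-Depth
        for j in range(x3.shape[0]):
            pred = (v[0]*x1[i] + v[1]) \
                 * (v[5]*x3[j]*x3[j] + v[6]*x3[j] + v[7]) \
                 * (v[2]*10*10 + v[3]*10 + v[4])
            f2 += abs(r2[i, j] - pred)
    f3 = 0.0
    for i in prange(x2.shape[0]):                        # Pitch-Depth
        for j in range(x3.shape[0]):
            pred = (v[2]*x2[i]*x2[i] + v[3]*x2[i] + v[4]) \
                 * (v[5]*x3[j]*x3[j] + v[6]*x3[j] + v[7]) \
                 * (v[0]*3.860424005 + v[1])
            f3 += abs(r3[i, j] - pred)
    return f1 + f2 + f3

The call would then look something like this; the [1, 2, 2] marker is no longer needed, and since prange already uses all cores you would normally leave workers at its default:

args = (np.array(res1).T, np.array(res2).T, np.array(res3).T, x1, x2, x3)
res = op.differential_evolution(func_nb, bounds=bds, maxiter=100000, tol=0.01,
                                popsize=15, args=args)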

Ahmed AEK