How to accelerate this function using Numba?

Question

I am trying to speed up this function with Numba, but I have not been able to. I suspect that no part of the code can be accelerated. If anyone can help me with an optimized version, my program would become much faster. Please tell me if a dataset or any other information is needed. When I apply @jit directly to the function, it does not work.

import math

import numpy as np
import Bio.PDB


def c_a(x, y, z, counter, p_l):
    # start = time.time()
    if counter == 1:
        l = x
        m = y
        n = z


        path = "c_r.pdb"
        global r_a_t


        p = Bio.PDB.PDBParser() 
        structure = p.get_structure('mSN1', path)
        c_r = [a.get_coord() for a in structure.get_atoms()]   
        lengthnew = len(c_r)


        m_d = np.array([-45, -45, -45])


        a_s_r = np.zeros((128, 128, 128), complex)  # np.complex was removed in NumPy 1.24
        for i in range(0, lengthnew):
            x = int(math.floor((c_r[i][0] - m_d[0]) / 1.2))
            y = int(math.floor((c_r[i][1] - m_d[1]) / 1.2))
            z = int(math.floor((c_r[i][2] - m_d[2]) / 1.2))
            with open("Ei.txt", 'r') as ei_values:
                for row in ei_values:
                    s_v = row.split()
                    if s_v[0] == r_a_t[i] :
                        a_s_r[x, y, z] = complex(s_v[1])


        n_n = lambda x, y, z : [(x2, y2, z2) for x2 in range(x - 5, x + 6)
                                   for y2 in range(y - 5, y + 6)
                                   for z2 in range(z - 5, z + 6)
                                   if (-1 < x < X and
                                       -1 < y < Y and
                                       -1 < z < Z and
                                       (x != x2 or y != y2 or z != z2) and
                                       (0 <= x2 < X) and
                                       (0 <= y2 < Y) and
                                       (0 <= z2 < Z) and
                                       ((( abs(x - x2)) ** 2 + (abs(y - y2)) ** 2 + (abs(z - z2)) ** 2  ) <= 25))]  
        m = n_n(l, m, n)
        result = 0
        for i in range(0, len(m)):
            a = m[i][0]
            b = m[i][1]
            c = m[i][2]
            result = result + a_s_r[a][b][c]
        return result

    else:
        l = x
        m = y
        n = z
        path = p_l


        global l_a_t


        p = Bio.PDB.PDBParser() 
        structure = p.get_structure('mSN1', path)
        c_l = [a.get_coord() for a in structure.get_atoms()]   
        lengthnew = len(c_l)


        m_d = np.array([-45, -45, -45])


        a_s_l = np.zeros((128, 128, 128), complex)  # np.complex was removed in NumPy 1.24
        for i in range(0, lengthnew):
            x = int(math.floor((c_l[i][0] - m_d[0]) / 1.2))
            y = int(math.floor((c_l[i][1] - m_d[1]) / 1.2))
            z = int(math.floor((c_l[i][2] - m_d[2]) / 1.2))
            with open("E.txt", 'r') as e_v:
                for row in e_v:
                    s_v = row.split()
                    if s_v[0] == l_a_t[i] :
                        a_s_l[x, y, z] = complex(s_v[1])


        n_n = lambda x, y, z : [(x2, y2, z2) for x2 in range(x - 5, x + 6)
                                       for y2 in range(y - 5, y + 6)
                                       for z2 in range(z - 5, z + 6)
                                       if (-1 < x < X and
                                           -1 < y < Y and
                                           -1 < z < Z and
                                           (x != x2 or y != y2 or z != z2) and
                                           (0 <= x2 < X) and
                                           (0 <= y2 < Y) and
                                           (0 <= z2 < Z) and
                                           (((abs(x - x2)) ** 2 + (abs(y - y2)) ** 2 + (abs(z - z2)) ** 2  ) <= 25))]  
        m = n_n(l, m, n)
        result = 0
        for i in range(0, len(m)):
            a = m[i][0]
            b = m[i][1]
            c = m[i][2]
            result = result + a_s_l[a][b][c]
        # print "c_a : ", time.time() - start    
        return result

A good place to start is to look at what numba can and cannot optimize, which can be found in the docs (http://numba.pydata.org/numba-doc/latest/reference/pysupported.html and http://numba.pydata.org/numba-doc/latest/reference/numpysupported.html). Then run your current implementation using profiling (cProfile and line_profiler) to find where the bottlenecks are. Then you can try to isolate the slow bits into Numba jitted functions. — JoshAdel, Jul 15 '18 at 13:54
@JoshAdel The looping part takes time but I am unable to jit it normally. — darkcodernavv, Jul 15 '18 at 17:16
You are going to have to break that large function apart. You can't read files within `nopython` numba functions, nor can you call functions from biopython or use lambda functions. You need to isolate the code that does a lot of looping just to things that operate on scalars and arrays. Take small pieces of the above function and test them with numba. I can give advice, but you're going to have to put in some work at experimenting with numba. — JoshAdel, Jul 16 '18 at 13:22
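As a rough illustration of the advice in the comments above, the neighbour-summing loop can be pulled out into a function that only touches scalars and an array, which is the kind of code Numba's nopython mode can compile. This is only a sketch, not the asker's final code: the name `neighbour_sum` and the `radius` parameter are made up here, and it assumes that `X`, `Y` and `Z` in the question are simply the grid dimensions. The `try`/`except` lets the same code run, slowly, when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback: without Numba the function runs as plain Python.
        return func


@njit
def neighbour_sum(grid, x, y, z, radius):
    """Sum grid values within `radius` cells of (x, y, z), excluding the centre.

    Hypothetical helper: replaces the lambda + list + summing loop
    from the question with plain nested loops.
    """
    X, Y, Z = grid.shape  # assumption: X, Y, Z are the grid dimensions
    total = 0j
    # Same guard as the lambda: an out-of-grid centre has no neighbours.
    if not (0 <= x < X and 0 <= y < Y and 0 <= z < Z):
        return total
    r2 = radius * radius
    # Clamp the loop ranges to the grid instead of testing every index.
    for x2 in range(max(x - radius, 0), min(x + radius + 1, X)):
        for y2 in range(max(y - radius, 0), min(y + radius + 1, Y)):
            for z2 in range(max(z - radius, 0), min(z + radius + 1, Z)):
                if x2 == x and y2 == y and z2 == z:
                    continue  # skip the centre cell itself
                d2 = (x - x2) ** 2 + (y - y2) ** 2 + (z - z2) ** 2
                if d2 <= r2:
                    total += grid[x2, y2, z2]
    return total
```

With something like this, the file reading and Biopython parsing stay in ordinary Python, and only `neighbour_sum(a_s_r, l, m, n, 5)` goes through Numba.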

score 1 · Accepted Answer · answered Jul 21 '18 at 11:41

Solved.

I moved all the file-reading steps out of the function, because they were being executed on every call. That alone gave a 70x speedup.

I left the lambda functions inside the function, as they depend on x, y and z.
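For anyone landing here later, a minimal sketch of what moving the file reading out can look like. It is not the exact code I ended up with: `load_energy_table` and `build_grid` are illustrative names, and the coordinates and atom types are passed in as plain sequences so the example runs without Biopython. Reading the table into a dict once also avoids reopening the text file for every atom.

```python
import math

import numpy as np

GRID_ORIGIN = -45.0          # m_d in the question (same value on each axis)
CELL_SIZE = 1.2              # grid spacing used in the question
GRID_SHAPE = (128, 128, 128)


def load_energy_table(lines):
    """Parse 'atom_type value' rows once into a dict (hypothetical helper)."""
    table = {}
    for row in lines:
        fields = row.split()
        if len(fields) >= 2:  # skip blank or malformed rows
            table[fields[0]] = complex(fields[1])
    return table


def build_grid(coords, atom_types, table):
    """Fill the grid once, rather than on every call to c_a."""
    grid = np.zeros(GRID_SHAPE, dtype=np.complex128)
    for (cx, cy, cz), atom_type in zip(coords, atom_types):
        if atom_type not in table:
            continue  # unknown atom types leave the cell untouched
        x = int(math.floor((cx - GRID_ORIGIN) / CELL_SIZE))
        y = int(math.floor((cy - GRID_ORIGIN) / CELL_SIZE))
        z = int(math.floor((cz - GRID_ORIGIN) / CELL_SIZE))
        grid[x, y, z] = table[atom_type]
    return grid


# Intended use, done once before any calls to c_a:
#     with open("Ei.txt") as fh:
#         table = load_energy_table(fh)
#     a_s_r = build_grid(c_r, r_a_t, table)
```

The function that gets called many times then only needs the prebuilt grid and the point (x, y, z).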

darkcodernavv
