scipy.optimize.minimize is too slow. How can I speed up

Question

I am converting an IDL code (written by Oleg Kochukhov) to Python. The code generates star surface map over spectral line profiles using Tikhonov or Maximum Entropy methods.

I use scipy.optimize.minimize to generate map over line profiles. But process is too slow and results is not compatible. I search solution on internet but i dont find any usefull solution.

I added a runnable code below:

import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import matplotlib.gridspec as gridspec

#syc = 0

def DI_GridInit(ntot):
    # generate stellar surface grid

    nlat = int(round(0.5 * (1.0 + np.sqrt(1.0 + np.pi * ntot))) - 1)
    nlon = np.zeros(nlat, dtype=int)
    xlat = np.pi * (np.arange(nlat, dtype=float) + 0.5) / nlat - np.pi / 2.0
    xcirc = 2.0 * np.cos(xlat[1:])
    nlon[1:] = np.around(xcirc * nlat) + 1
    nlon[0] = ntot - sum(nlon[1:])

    if abs(nlon[0] - nlon[nlat - 1]) > nlat:
        nlon[1:] = nlon[1:] + (nlon[0] - nlon[nlat - 1]) / nlat
    nlon[0] = ntot - sum(nlon[1:])

    if nlon[0] < nlon[nlat - 1]:
        nlon[1:] = nlon[1:] - 1
    nlon[0] = ntot - sum(nlon[1:])

    # generate Descartes coordinates for the surface grid in
    # stellar coordinates, areas of surface elements and
    # regularization indices: (lower, upper, right, left)

    x0, j = np.zeros((ntot, 3), dtype=float), 0
    latitude, longitude = np.zeros(ntot, dtype=float), np.zeros(ntot, dtype=float)
    sa, ireg = np.zeros(ntot, dtype=float), np.zeros((ntot, 4), dtype=int)
    slt = np.hstack((0., (xlat[1:nlat] + xlat[0:nlat - 1]) / 2. + np.pi / 2., np.pi))

    for i in range(nlat):
        coslat = np.cos(xlat[i])
        sinlat = np.sin(xlat[i])
        xlon = 2 * np.pi * (np.arange(nlon[i]) + 0.5) / nlon[i]
        sinlon = np.sin(xlon)
        coslon = np.cos(xlon)
        x0[:, 0][j:j + nlon[i]] = coslat * sinlon
        x0[:, 1][j:j + nlon[i]] = -coslat * coslon
        x0[:, 2][j:j + nlon[i]] = sinlat
        latitude[j:j + nlon[i]] = xlat[i]
        longitude[j:j + nlon[i]] = xlon
        sa[j:j + nlon[i]] = 2. * np.pi * (np.cos(slt[i]) - np.cos(slt[i + 1])) / nlon[i]
        ireg[:, 2][j:j + nlon[i]] = np.roll(j + np.arange(nlon[i], dtype=int), -1)
        ireg[:, 3][j:j + nlon[i]] = np.roll(j + np.arange(nlon[i], dtype=int), 1)

        if (i > 0):
            il_lo = j - nlon[i - 1] + np.arange(nlon[i - 1], dtype=int)
        else:
            il_lo = j + nlon[i] + np.arange(nlon[i + 1], dtype=int)

        if (i < nlat - 1):
            il_up = j + nlon[i] + np.arange(nlon[i + 1], dtype=int)
        else:
            il_up = il_lo

        for k in range(j, j + nlon[i]):
            dlat_lo = longitude[k] - longitude[il_lo]
            ll = np.argmin(abs(dlat_lo))
            ireg[k][0] = il_lo[ll]
            dlat_up = longitude[k] - longitude[il_up]
            ll = np.argmin(abs(dlat_up))
            ireg[k][1] = il_up[ll]

        j += nlon[i]

    theta = np.arccos(x0[:, 2])
    phi = np.arctan2(x0[:, 0], -x0[:, 1])
    ii = np.argwhere(phi < 0).T[0]
    nii = len(ii)
    phi[ii] = 2.0 * np.pi - abs(phi[ii]) if nii else None

    grid = {'ntot': ntot, 'nlat': nlat, 'nlon': nlon, 'xyz': x0, 'lat': latitude,
            'lon': longitude, 'area': sa, 'ireg': ireg, 'phi': phi, 'theta': theta}

    return grid

def DI_Map(grid, spots):
    map = np.ones(grid['ntot'], dtype=float)

    for i in range(spots['n']):
        dlon = grid['lon'] - np.deg2rad(spots['tbl'][i, 0])
        dlat = grid['lat'] - np.deg2rad(spots['tbl'][i, 1])
        da = (2.0 * np.arcsin(np.sqrt(np.sin(0.5 * dlat) ** 2 +
                                      np.cos(np.deg2rad(spots['tbl'][i, 1])) *
                                      np.cos(grid['lat']) * np.sin(0.5 * dlon) ** 2)))
        ii = np.argwhere(da <= np.deg2rad(spots['tbl'][i, 2])).T[0]
        ni = len(ii)

        map[ii] = spots['tbl'][i, 3] if ni > 0 else None

    return map


def DI_Prf(grid, star, map, phase=None, vv=None, vr=None, nonoise=None):

    # velocity array
    if vv is not None:
        nv = len(vv)
    else:
        nv = int(np.ceil(2.0 * star['vrange'] / star['vstep']))
        vv = -star['vrange'] + np.arange(nv, dtype=float) * star['vstep']

    # phase array
    if phase is None:
        phase = np.arange(star['nphases'], dtype=float) / star['nphases']

    # velocity correction for each phase
    vr = np.zeros(star['nphases'], dtype=float) if vr == None else None

    # fixed trigonometric quantities
    cosi = np.cos(np.deg2rad(star['incl'])); sini = np.sin(np.deg2rad(star['incl']))
    coslat = np.cos(grid['lat']); sinlat = np.sin(grid['lat'])

    # FWHM to Gaussian sigma
    sigm = star['fwhm'] / np.sqrt(8.0 * np.log(2.0))
    isig = (-0.5 / sigm ** 2)

    # initialize line profile and integrated field arrays
    prf = np.zeros((nv, len(phase)), dtype=float)

    # gradient if called with 5 - variable input
    grad = np.zeros((nv, len(phase), grid['ntot']), dtype=float)

    # phase loop
    for i in range(len(phase)):
        coslon = np.cos(grid['lon'] + 2.0 * np.pi * phase[i])
        sinlon = np.sin(grid['lon'] + 2.0 * np.pi * phase[i])
        mu = sinlat * cosi + coslat * sini * coslon
        ivis = np.argwhere(mu > 0.).T[0]
        dv = -sinlon[ivis] * coslat[ivis] * star['vsini']
        avis = grid['area'][ivis] * mu[ivis] * (1.0 - star['limbd'] + star['limbd'] * mu[ivis])

        if star['type'] == 0:
            wgt = avis * map[ivis]
            wgtn = sum(wgt)

            for j in range(nv):
                plc = 1.0 - star['d'] * np.exp(isig * (vv[j] + dv - vr[i]) ** 2)
                prf[j][i] = sum(wgt * plc) / wgtn
                grad[j][i][ivis] = avis * plc / wgtn - avis * prf[j][i] / wgtn

        elif star['type'] == 1:
            wgt = avis
            wgtn = sum(wgt)

            for j in range(nv):
                plc = 1.0 - map[ivis] * star['d'] * np.exp(isig * (vv[j] + dv - vr[i]) ** 2)
                prf[j][i] = sum(wgt * plc) / wgtn
                grad[j][i][ivis] = -wgt / wgtn * star['d'] * np.exp(isig * (vv[j] + dv - vr[i]) ** 2)

    # output structure
    syn = {'v': vv, 'phase': phase, 'prf': prf}

    # add noise
    if star['snr'] != -1 and nonoise != None:
        obs = syn['prf'] * 0.0

        for i in range(star['nphases']):
            obs[:, i] = syn['prf'][:, i] + np.random.standard_normal((len(syn['v']),)) / star['snr']

        syn['obs'] = obs

    return syn, grad

def DI_func(cmap, functargs):
    # global syc

    star = functargs['star']
    grid = functargs['grid']
    obs = functargs['obs']
    invp = functargs['invp']

    nv = len(obs['v'])
    er = 1.0 / abs(star['snr'])

    if 'vr' in obs.keys():
        syn, grad = DI_Prf(grid, star, cmap, phase=obs['phase'], vv=obs['v'], vr=obs['vr'])
    else:
        syn, grad = DI_Prf(grid, star, cmap, phase=obs['phase'], vv=obs['v'])

    # shf = 0
    # for i in range(len(obs['phase'])):
    #     plt.plot(obs['v'], obs['obs'][:, i] + shf, 'bo')
    #     plt.plot(obs['v'], syn['prf'][:, i] + shf, 'r')
    #     plt.plot(obs['v'], obs['obs'][:, i] - syn['prf'][:, i] + shf, 'k')
    #     shf += 0.1
    # plt.show()

    fchi = 0.0
    sign = (-1) ** invp['regtype']
    for i in range(star['nphases']):
        fchi = fchi + sign * sum((syn['prf'][:, i] - obs['obs'][:, i]) ** 2 / er ** 2) / nv

    freg = 0
    if invp['lambda'] > 0:
        if invp['regtype'] == 0:
            ir = grid['ireg']
            for k in range(len(ir[0, :])):
                freg = freg + invp['lambda'] / grid['ntot'] * sum((cmap - cmap[ir[:, k]]) ** 2)

        elif invp['regtype'] == 1:
            mmap = sum(cmap) / grid['ntot']
            nmap = cmap / mmap
            freg = freg - invp['lambda'] / grid['ntot'] * sum(nmap * np.log(nmap))

    ftot = fchi + freg

    syn['obs'] = obs['obs']

    # syc += 1
    # if syc % 1000 == 0:
    #     plotting(grid, cmap, syn, star['incl'], typ=star['type'])
    #
    # print(syc, ftot, sum(cmap))

    return ftot

def plotting(grid, map, syn, incl, typ):
    nlon = grid['nlon']

    nln = max(nlon)
    nlt = len(nlon)
    ll = np.zeros(nlt + 1, dtype=int)
    ll[0] = 0

    for i in range(nlt):
        ll[i + 1] = ll[i] + nlon[i]

    map1 = np.zeros((nlt, nln), dtype=float)
    x = np.arange(nln, dtype=float) + 0.5

    for i in range(nlt):
        lll = ((np.arange(nlon[i] + 2, dtype=float) - 0.5) * nln) / nlon[i]
        y = np.hstack((map[ll[i + 1] - 1], map[ll[i]:ll[i+1]-1], map[ll[i]]))

        for j in range(nln):
            imin = np.argmin(abs(x[j] - lll))
            map1[i, j] = y[imin]

    light = (190 * (map1 - np.min(map1)) / (np.max(map1) - np.min(map1))) + 50

    light_rect = np.flipud(light)

    if typ == 0:
        cmap = 'gray'
    else:
        cmap = 'gray_r'

    fig = plt.figure()
    fig.clear()
    spec = gridspec.GridSpec(ncols=3, nrows=3, left=0.10, right=0.98,
                             top=0.97, bottom=0.07, hspace=0.2, wspace=0.36)
    # naive IDW-like interpolation on regular grid
    shape = light.shape
    nrows, ncols = (shape[0], shape[1])
    lon, lat = np.meshgrid(np.linspace(0, 360, ncols), np.linspace(-90, 90, nrows))
    for i, item in enumerate([[(0, 0), -0], [(0, 1), -90], [(1, 0,), -180], [(1, 1), -270]]):
        ax = fig.add_subplot(spec[item[0]])
        # set up map projection
        m = Basemap(projection='ortho', lat_0=90 - incl, lon_0=item[1], ax=ax)
        # draw lat/lon grid lines every 30 degrees.
        m.drawmeridians(np.arange(0, 360, 30))
        m.drawparallels(np.arange(-90, 90, 30))
        # compute native map projection coordinates of lat/lon grid.
        x, y = m(lon, lat)
        # contour data over the map.
        m.contourf(x, y, light, 15, vmin=0., vmax=255., cmap=cmap)

        if i in [0, 2]:
            x2, y2 = m(180 - item[1], incl)
        else:
            x2, y2 = m(180 + item[1], incl)
        x1, y1 = (-10, 5)

        ax.annotate(str('%0.2f' % (abs(item[1]) / 360.)), xy=(x2, y2),  xycoords='data',
            xytext=(x1, y1), textcoords='offset points',
            color='r')

    ax5 = fig.add_subplot(spec[-1, :2])

    ax5.imshow(light_rect, vmin=0., vmax=255., cmap=cmap, interpolation='none', extent=[0, 360, -90, 90])
    ax5.set_xticks(np.arange(0, 420, 60))
    ax5.set_yticks(np.arange(-90, 120, 30))
    ax5.set_xlabel('Longitude ($^\circ$)', fontsize=7)
    ax5.set_ylabel('Latitude ($^\circ$)', fontsize=7)
    ax5.tick_params(labelsize=7)

    ax6 = fig.add_subplot(spec[0:, 2])
    shf = 0.0
    for i in range(len(syn['phase'])):
        ax6.plot(syn['v'], syn['obs'][:, -i - 1] + shf, 'bo', ms=2)
        ax6.plot(syn['v'], syn['prf'][:, -i - 1] + shf, 'r', linewidth=1)
        ax6.text(min(syn['v']), max(syn['obs'][:, -i - 1] + shf), str('%0.2f' % syn['phase'][-i - 1]),
                 fontsize=7)

        shf += 0.1

    p1 = ax6.lines[0]
    p2 = ax6.lines[-1]
    p1datay = p1.get_ydata()
    p1datax = p1.get_xdata()
    p2datay = p2.get_ydata()

    y1, y2 = min(p1datay) - min(p1datay) / 20.,max(p2datay) + min(p1datay) / 10.
    ax6.set_ylim([y1, y2])
    ax6.set_xlabel('V ($km s^{-1}$)', fontsize=7)
    ax6.set_ylabel('I / Ic', fontsize=7)
    ax6.tick_params(labelsize=7)
    max_ = int(max(p1datax))
    ax6.set_xticks([-max_, np.floor(-max_ / 2.), 0.0, np.ceil(max_ / 2.), max_])

    plt.show()

if __name__ == "__main__":

    # Star parameters
    star = {'ntot': 1876, 'type': 0, 'incl': 70, 'vsini': 50, 'fwhm': 7.0, 'd': 0.6,
            'limbd': 0.5, 'nphases': 5, 'vrange': np.sqrt(50 ** 2 + 7.0 ** 2) * 1.4,
            'vstep': 1.0, 'snr': 500}

    # Spot parameters
    lon_spot = [40, 130, 220, 310]
    lat_spot = [-30, 0, 60, 30]
    r_spot = [20, 20, 20, 20]
    c_spot = [0.1, 0.2, 0.25, 0.3]

    tbl = np.array([lon_spot, lat_spot, r_spot, c_spot]).T
    spots = {'n': len(lon_spot), 'type': star['type'], 'tbl': tbl}

    # Generate grid
    grid = DI_GridInit(star['ntot'])

    # Generate map
    cmap = DI_Map(grid, spots)

    # Generate spectral line profiles
    csyn, grad = DI_Prf(grid, star, cmap, nonoise=True)

    # Plotting map and line profiles
    plotting(grid, cmap, csyn, star['incl'], star['type'])

    # Generate map over the line profiles using scipy.optimize.minimize
    invp = {'lambda': 20, 'regtype': 0, 'maxiter': 10}
    grid_inv = DI_GridInit(star['ntot'])
    functargs = {'star': star, 'grid': grid_inv, 'obs': csyn, 'invp': invp}

    cmap = np.ones(star['ntot'])
    cmap[0] = 0.99

    bnd = list(zip(np.zeros(len(cmap), dtype=float), np.ones(len(cmap), dtype=float)))

    minimize(DI_func, cmap, args=functargs, method='TNC', bounds=bnd,
             callback=None, options={'eps': 0.1, 'maxiter': 5, 'disp': True})

The code includes followed parts.

'DI_GridInit' : Generates grids for the map

'DI_Map' : Generates star surface map according to starspot parameters (such as longitude, latitude, radius and contrast)

'DI_Prf' : Generates spectral line profiles according to map

Now I want to obtain the surface map over the generated and noised line profiles. I use scipy.optimize.minimize (TNC method) for obtain the surface map. I use 'DI_func' as function in minimize. But 'minimize' is so slow. What is the problem. How can I speed this up.

That's a lot of code! It will be helpful both for you and people here if you could reduce it to a minimal and runnable example (see [mcve]) — xdze2, Aug 24 '18 at 13:20
Thank you for your advice. I edited my post understandable as much as i can and I added a runnable example. — eng, Aug 25 '18 at 14:18

score 3 · Accepted Answer · answered Aug 25 '18 at 17:26

Here is a modified version of DI_Prf, where is the major computation time during the execution of DI_func:

def DI_Prf(grid, star, map, phase=None, vv=None, vr=None, nonoise=None):

    # velocity array
    if vv is not None:
        nv = len(vv)
    else:
        nv = int(np.ceil(2.0 * star['vrange'] / star['vstep']))
        vv = -star['vrange'] + np.arange(nv, dtype=float) * star['vstep']

    # phase array
    if phase is None:
        phase = np.arange(star['nphases'], dtype=float) / star['nphases']

    # velocity correction for each phase
    vr = np.zeros(star['nphases'], dtype=float) if vr == None else None

    # fixed trigonometric quantities
    cosi = np.cos(np.deg2rad(star['incl'])); sini = np.sin(np.deg2rad(star['incl']))
    coslat = np.cos(grid['lat']); sinlat = np.sin(grid['lat'])

    # FWHM to Gaussian sigma
    sigm = star['fwhm'] / np.sqrt(8.0 * np.log(2.0))
    isig = (-0.5 / sigm ** 2)

    # initialize line profile and integrated field arrays
    prf = np.zeros((nv, len(phase)), dtype=float)

    # gradient if called with 5 - variable input
    grad = np.zeros((nv, len(phase), grid['ntot']), dtype=float)

    # phase loop
    for i in range(len(phase)):
        coslon = np.cos(grid['lon'] + 2.0 * np.pi * phase[i])
        sinlon = np.sin(grid['lon'] + 2.0 * np.pi * phase[i])
        mu = sinlat * cosi + coslat * sini * coslon
        ivis = np.argwhere(mu > 0.).T[0]
        dv = -sinlon[ivis] * coslat[ivis] * star['vsini']
        avis = grid['area'][ivis] * mu[ivis] * (1.0 - star['limbd'] + star['limbd'] * mu[ivis])

        if star['type'] == 0:
            wgt = avis * map[ivis]
            wgtn = sum(wgt)

            #for j in range(nv):
            #    plc = 1.0 - star['d'] * np.exp(isig * (vv[j] + dv - vr[i]) ** 2)
            #    prf[j][i] = sum(wgt * plc) / wgtn
            #    grad[j][i][ivis] = avis * plc / wgtn - avis * prf[j][i] / wgtn

            plc = 1.0 - star['d'] * np.exp(isig * (vv[:, np.newaxis] + dv[np.newaxis, :] - vr[i]) ** 2)
            prf[:, i] = np.sum(wgt * plc, axis=1) / wgtn
            grad[:, i, ivis] = avis * plc / wgtn - (avis[:, np.newaxis]*prf[:, i]).T / wgtn

        elif star['type'] == 1:
            wgt = avis
            wgtn = sum(wgt)

            for j in range(nv):  # to be modified too
                plc = 1.0 - map[ivis] * star['d'] * np.exp(isig * (vv[j] + dv - vr[i]) ** 2)
                prf[j][i] = sum(wgt * plc) / wgtn
                grad[j][i][ivis] = -wgt / wgtn * star['d'] * np.exp(isig * (vv[j] + dv - vr[i]) ** 2)

    # output structure
    syn = {'v': vv, 'phase': phase, 'prf': prf}

    # add noise
    if star['snr'] != -1 and nonoise != None:

        #for i in range(star['nphases']):
        obs = syn['prf'] + np.random.standard_normal(size=syn['prf'].shape) / star['snr']

        syn['obs'] = obs

    return syn, grad

It reduces the time by 3:

%%timeit
syn, grad = DI_Prf(grid, star, cmap, phase=obs['phase'], vv=obs['v'])
# 127 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# 40.7 ms ± 683 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The main idea with Numpy is to not use loops, but work with multidimensional array, and use the broadcasting capabilities.

For instance:

fchi = 0.0
for i in range(star['nphases']):
    fchi = fchi + sign * sum((syn['prf'][:, i] - obs['obs'][:, i]) ** 2 / er ** 2) / nv

could be replaced with:

fchi = sign / nv / er ** 2 * np.sum( np.sum((syn['prf'] - obs['obs']) ** 2, axis=1 ) )

same for np.random.standard_normal(size=syn['prf'].shape)

It's not a big improvement here because star['nphases'] is small, but it is relatively important for the other axis. You could go further and remove the for loop over the phases in DI_Prf but it requires some thinking

Thank you very much your help. Your suggestion provided increasing performance. But it is still very slow than its IDL version. Code written in IDL complete minimization in one minute. Reason of this is IDL faster than Python? I wrote Python version of the IDL code similarly (IDL code use tnmin.pro for minimization) — eng, Aug 27 '18 at 10:25
It would be interesting to compare the execution time of `DI_func` between IDL and Python, to verify if the difference comes from the solver or from the implementation of the function... The TNC solver in scipy.minimize is actually written in C: see [tnc.py](https://github.com/scipy/scipy/blob/v1.1.0/scipy/optimize/tnc.py). Does the IDL code provide the gradient to the optimizer? — xdze2, Aug 27 '18 at 12:06
I am thinking of this now: `grad` is computed and returned by `DI_Prf` but not used in `DI_func`... something is missing or something can be removed... — xdze2, Aug 27 '18 at 12:22

scipy.optimize.minimize is too slow. How can I speed up

1 Answers1