I've been working on an international trade model and the model has gotten really slow (sometimes taking weeks at a time to finish). Mostly, there was a big for loop that was slowing the process down, as shown here (in Python):
for s in xrange(self.S-1):
c_matrix[:,s+1,s+1:self.T+s+1] = ((self.beta * (1-self.MortalityRates[:,s,s:self.T+s]) * (1 + r_path[s+1:self.T+s+1] - self.delta)\
* psi[:,s+1,s+1:self.T+s+1])/psi[:,s,s:self.T+s])**(1/self.sigma) * c_matrix[:,s,s:self.T+s]*np.exp(-self.g_A)
a_matrix[:,s+1,s+1:self.T+s+1] = ( (we[:,s,s:self.T+s] + (1 + r_path[s:self.T+s] - self.delta)*a_matrix[:,s,s:self.T+s] + bqvec_path[:,s,s:self.T+s])\
-c_matrix[:,s,s:self.T+s]*(1+we[:,s,s:self.T+s]*(self.chi/we[:,s,s:self.T+s])**self.rho) )*np.exp(-self.g_A)
I won't go into too much detail about each piece, but I decided to make a Fortran module that, in theory, should do the exact same thing but should run significantly faster. I'm using Cython and distutils to wrap the Fortran module. Here is the .f90 file (focusing specifically on the a_matrix part of the for loop above) for your reference:
MODULE FILLA
USE ISO_C_BINDING, ONLY: C_DOUBLE, C_INT
IMPLICIT NONE
CONTAINS
SUBROUTINE
FILLASSETS(A_MATRIX,WE,R_PATH,BQ_VECPATH,C_MATRIX,I,S,T,DELTA,CHI,RHO,G_A) BIND(C)
INTEGER(C_INT), INTENT(IN) :: I,S,T
INTEGER :: X, E, L
REAL(C_DOUBLE), INTENT(IN) :: DELTA, CHI, RHO, G_A
REAL(C_DOUBLE), DIMENSION(0:I-1,0:S-1,0:T+S-1),INTENT(IN) :: C_MATRIX, WE, BQ_VECPATH
REAL(C_DOUBLE), DIMENSION(0:T+S-1), INTENT(IN) :: R_PATH
REAL(C_DOUBLE), DIMENSION(0:I-1,0:S,0:T+S-1), INTENT(INOUT) :: A_MATRIX
REAL(C_DOUBLE), DIMENSION(0:I-1,0:S-1,0:T+S-1):: R_PATHM
DO E = 0, I-1
DO L = 0, S-1
R_PATHM(E,L,:)=R_PATH(:)
ENDDO
ENDDO
DO X = 0, S-2
A_MATRIX(:,X+1,X+1:T+X+1) = ((WE(:,X,X:T+X)+(1.0D0+R_PATHM(:,X,X:T+X)-DELTA)&
&*A_MATRIX(:,X,X:T+X)+BQ_VECPATH(:,X,X:T+X))&
&-C_MATRIX(:,X,X:T+X)*(1.0D0+WE(:,X,X:T+X)*(CHI/WE(:,X,X:T+X))&
&**RHO))*EXP(-G_A)
ENDDO
END SUBROUTINE FILLASSETS
END MODULE
And this .pyx file is how the arrays from python get passed into Fortran:
from numpy cimport ndarray as ar
cdef extern from "filla.h":
void fillassets(double* A_Matrix, double* we, double* r_path, double* bq_vecpath, double* c_matrix, int* I, int* S, int* T, double* delta, double* chi, double* rho, double* g_A)
cpdef f_filla(ar[double, ndim=3] A_Mat, ar[double, ndim=3] w_e, ar[double, ndim=1] r_path, ar[double, ndim=3] bq_vecpath, ar[double,ndim=3] c_mat, double delta, double chi, double rho, double g_A):
cdef int I,S,T,Q
if A_Mat.flags['C_CONTIGUOUS'] and w_e.flags['C_CONTIGUOUS'] and r_path.flags['C_CONTIGUOUS'] and bq_vecpath.flags['C_CONTIGUOUS'] and c_mat.flags['C_CONTIGUOUS']:
I=c_mat.shape[0]
S=c_mat.shape[1]
Q=c_mat.shape[2]
T=Q-S
fillassets(&A_Mat[0,0,0],&w_e[0,0,0],&r_path[0],&bq_vecpath[0,0,0],&c_mat[0,0,0],&I,&S,&T,&delta,&chi,&rho,&g_A)
else:
raise ValueError("Input array U is not C-contiguous")
So I've set it up without any errors, but when I run them side by side, the Fortran module returns incorrect values in the asset matrix, for some reason. I think it's something relating to how Fortran and Python handle arrays, but I haven't found a straightforward explanation and hope that I could find some help here. I'm very unfamiliar with Fortran and so it's probably something simple that I'm missing. I tried to keep this as brief as possible, so if there are details I've left out, let me know.
EDIT: I'm using Mac OS X El Capitan and the gfortran complier.