Okay, here's my first Cython program below: code to price European options on futures (Black-Scholes without a dividend). It runs in 3.5 s on 10M options, versus 3.25 s for the straight NumPy Python code I posted below. Can anyone point out why my Cython code is slower, e.g. because I used a loop instead of vectorizing the call (I'm not sure how to do that in C, although the generated Cython code does appear to vectorize it)? Can I use nogil and openmp around this loop, even though the variables are passed in from NumPy arrays? The simple examples in Cython's docs don't compile correctly when I put cython.parallel prange on the loop (http://docs.cython.org/src/userguide/parallelism.html#module-cython.parallel). Feedback is greatly appreciated, and apologies for a somewhat open-ended question. Others are free to use this code as a starting point, since it already runs faster than other work I've seen profiled online in C and Python. Here it is:
Save this as a CyBlack.pyx file to compile (note all inputs are float64 except Black_callput, which is int64: 1 for a call, -1 for a put). After compiling, use from CyBlack.CyBlack import CyBlack:
from numpy cimport ndarray
cimport numpy as np
cimport cython

cdef extern from "math.h":
    double exp(double)
    double sqrt(double)
    double log(double)
    double erf(double)

cdef double std_norm_cdf(double x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2.0)))

@cython.boundscheck(False)
cpdef CyBlack(ndarray[np.float64_t, ndim=1] BlackPnL,
              ndarray[np.float64_t, ndim=1] Black_S,
              ndarray[np.float64_t, ndim=1] Black_Texpiry,
              ndarray[np.float64_t, ndim=1] Black_strike,
              ndarray[np.float64_t, ndim=1] Black_volatility,
              ndarray[np.float64_t, ndim=1] Black_IR,
              ndarray[np.int64_t, ndim=1] Black_callput):
    cdef Py_ssize_t i
    cdef Py_ssize_t N = BlackPnL.shape[0]
    cdef double d1, d2
    for i in range(N):
        # d1/d2 for an option on a future (no dividend); Black_callput flips the sign for puts
        d1 = (log(Black_S[i] / Black_strike[i]) + Black_Texpiry[i] * Black_volatility[i] ** 2 / 2) / (Black_volatility[i] * sqrt(Black_Texpiry[i]))
        d2 = d1 - Black_volatility[i] * sqrt(Black_Texpiry[i])
        BlackPnL[i] = exp(-Black_IR[i] * Black_Texpiry[i]) * (Black_callput[i] * Black_S[i] * std_norm_cdf(Black_callput[i] * d1) - Black_callput[i] * Black_strike[i] * std_norm_cdf(Black_callput[i] * d2))
    return BlackPnL
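For context, here is roughly the kind of prange version I'm asking about. This is only a minimal sketch, not working code I've verified: it assumes typed memoryviews, a nogil-annotated std_norm_cdf, the libc.math cimports, and the /openmp flag already in the setup.py below. The name CyBlackParallel is just a placeholder I made up.

# CyBlackParallel.pyx -- sketch of a prange variant (placeholder name, untested)
import numpy as np
cimport numpy as cnp
cimport cython
from cython.parallel import prange
from libc.math cimport exp, sqrt, log, erf

cdef double std_norm_cdf(double x) nogil:
    # Must be nogil so it can be called inside the prange loop
    return 0.5 * (1 + erf(x / sqrt(2.0)))

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef CyBlackParallel(double[:] BlackPnL, double[:] Black_S, double[:] Black_Texpiry,
                      double[:] Black_strike, double[:] Black_volatility,
                      double[:] Black_IR, cnp.int64_t[:] Black_callput):
    cdef Py_ssize_t i
    cdef Py_ssize_t N = BlackPnL.shape[0]
    cdef double d1, d2
    # prange releases the GIL; d1 and d2 become thread-private because they are
    # assigned inside the loop body
    for i in prange(N, nogil=True):
        d1 = (log(Black_S[i] / Black_strike[i]) + Black_Texpiry[i] * Black_volatility[i] ** 2 / 2) / (Black_volatility[i] * sqrt(Black_Texpiry[i]))
        d2 = d1 - Black_volatility[i] * sqrt(Black_Texpiry[i])
        BlackPnL[i] = exp(-Black_IR[i] * Black_Texpiry[i]) * (Black_callput[i] * Black_S[i] * std_norm_cdf(Black_callput[i] * d1) - Black_callput[i] * Black_strike[i] * std_norm_cdf(Black_callput[i] * d2))
    return np.asarray(BlackPnL)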
Here is the setup.py so others can build this by typing python setup.py build_ext --inplace (built with VS2015 for Python 3.5, 64-bit Windows):
from setuptools import setup
from setuptools import Extension
from Cython.Distutils import build_ext
import numpy as np

ext_modules = [Extension("CyBlack", sources=["CyBlack.pyx"],
                         extra_compile_args=['/Ox', '/openmp', '/favor:INTEL64'],
                         language='c++')]

setup(
    name='Generic model class',
    cmdclass={'build_ext': build_ext},
    include_dirs=[np.get_include()],
    ext_modules=ext_modules)
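For completeness, calling it after the build would look something like this. The sample values are made up purely to show the expected dtypes (float64 everywhere, int64 for the call/put flag); adjust the import to match your package layout.

import numpy as np
from CyBlack.CyBlack import CyBlack  # as noted above; use "from CyBlack import CyBlack" if built flat

N = 10000000  # 10M options, matching the timing above
Black_S = np.full(N, 100.0)            # futures price
Black_Texpiry = np.full(N, 0.5)        # time to expiry in years
Black_strike = np.full(N, 105.0)
Black_volatility = np.full(N, 0.2)
Black_IR = np.full(N, 0.01)
Black_callput = np.where(np.arange(N) % 2 == 0, 1, -1).astype(np.int64)  # +1 call, -1 put
BlackPnL = np.zeros(N)                 # output buffer, filled in place and returned

BlackPnL = CyBlack(BlackPnL, Black_S, Black_Texpiry, Black_strike,
                   Black_volatility, Black_IR, Black_callput)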
Okay, and here is my very fast NumPy-only Python code:
import numpy as np
from scipy.stats import norm

d1 = (np.log(Black_S / Black_strike) + Black_Texpiry * Black_volatility ** 2 / 2) / (Black_volatility * np.sqrt(Black_Texpiry))
d2 = d1 - Black_volatility * np.sqrt(Black_Texpiry)
BlackPnL = np.exp(-Black_IR * Black_Texpiry) * (Black_callput * Black_S * norm.cdf(Black_callput * d1) - Black_callput * Black_strike * norm.cdf(Black_callput * d2))
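If anyone wants to reproduce the comparison, a rough timing sketch along these lines should work (this is my own illustration, not the exact harness behind the numbers above; it assumes the arrays and the CyBlack import from the usage snippet after the setup.py, and numpy_black is just a hypothetical name wrapping the three NumPy lines):

import time
import numpy as np
from scipy.stats import norm

def numpy_black(Black_S, Black_Texpiry, Black_strike, Black_volatility, Black_IR, Black_callput):
    d1 = (np.log(Black_S / Black_strike) + Black_Texpiry * Black_volatility ** 2 / 2) / (Black_volatility * np.sqrt(Black_Texpiry))
    d2 = d1 - Black_volatility * np.sqrt(Black_Texpiry)
    return np.exp(-Black_IR * Black_Texpiry) * (Black_callput * Black_S * norm.cdf(Black_callput * d1) - Black_callput * Black_strike * norm.cdf(Black_callput * d2))

t0 = time.perf_counter()
pnl_np = numpy_black(Black_S, Black_Texpiry, Black_strike, Black_volatility, Black_IR, Black_callput)
print("numpy:  ", time.perf_counter() - t0, "s")

t0 = time.perf_counter()
pnl_cy = CyBlack(BlackPnL, Black_S, Black_Texpiry, Black_strike, Black_volatility, Black_IR, Black_callput)
print("cython: ", time.perf_counter() - t0, "s")

print("max abs diff:", np.max(np.abs(pnl_np - pnl_cy)))  # sanity check the two agree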