7

I am trying to optimize my code with Cython. It is doing a a power spectrum, not using FFT, because this is what we were told to do in class. I've tried to write to code in Cython, but do not see any difference. Here is my code

#! /usr/bin/env python
# -*- coding: utf8 -*-

from __future__ import division
cimport numpy as np
import numpy as np
cimport cython

@cython.boundscheck(False)
def power_spectrum(time, data, double f_min, double f_max, double df,w=1 ):

    cdef double com,f
    cdef double s,c,sc,cc,ss
    cdef np.ndarray[double, ndim=1] power
    cdef np.ndarray[double, ndim=1] freq

    alfa, beta = [],[] 
    m = np.mean(data)
    data -= m       

    freq = np.arange( f_min,f_max,df )
    for f in freq:
            sft = np.sin(2*np.pi*f*time)
            cft = np.cos(2*np.pi*f*time)
            s   = np.sum( w*data*sft )
            c   = np.sum( w*data*cft )
            ss  = np.sum( w*sft**2  )
            cc  = np.sum( w*cft**2  )
            sc  = np.sum( w*sft*cft )

            alfa.append( ( s*cc-c*sc )/( ss*cc-sc**2 ))
            beta.append( ( c*ss-s*sc )/( ss*cc-sc**2 ))
            com = -(f-f_min)/(f_min-f_max)*100
            print "%0.3f%% complete" %com

    power = np.array(alfa)**2 + np.array(beta)**2
    return freq,power,alfa,beta

The time and data is loaded via numpy.loadtxt and sent to this function. When I do

cython -a power_spectrum.pyx

the .html file is very yellow, so not very efficient. Especially the entire for-loop and the calulation of power and returning everything.

I have tried to read the official guide to Cython, but as I have never coded in C it is somewhat hard to understand.

All help is very preciated :)

jfs
  • 399,953
  • 195
  • 994
  • 1,670

1 Answers1

4

Cython can read numpy arrays according to this but it won't magically compile stuff like np.sum - you are still just calling the numpy methods.

What you need to do is re-write your inner loop in pure cython which can then compile it for you. So you will need to re-implement np.sum, np.sin etc. Pre-allocating aplfa and beta is a good idea so you don't use append and try to cdef as many variables as possible.

EDIT

Here is an complete example showing the inner loop completely C compiled (no yellow). I've no idea whether the code is correct but it should be a good starting point! In particular note use of cdef everywhere, turning on cdivision and import of sin and cos from the standard library.

from __future__ import division
cimport numpy as np
import numpy as np
cimport cython
from math import pi

cdef extern from "math.h":
    double cos(double theta)
    double sin(double theta)

@cython.boundscheck(False)
@cython.cdivision(True)
def power_spectrum(np.ndarray[double, ndim=1] time, np.ndarray[double, ndim=1] data, double f_min, double f_max, double df, double w=1 ):

    cdef double com,f
    cdef double s,c,sc,cc,ss,t,d
    cdef double twopi = 6.283185307179586
    cdef np.ndarray[double, ndim=1] power
    cdef np.ndarray[double, ndim=1] freq = np.arange( f_min,f_max,df )
    cdef int n = len(freq)
    cdef np.ndarray[double, ndim=1] alfa = np.zeros(n)
    cdef np.ndarray[double, ndim=1] beta = np.zeros(n)
    cdef int ndata = len(data)
    cdef int i, j

    m = np.mean(data)
    data -= m       

    for i in range(ndata):
        f = freq[i]

        s = 0.0
        c = 0.0
        ss = 0.0
        cc = 0.0
        sc = 0.0
        for j in range(n):
            t = time[j]
            d = data[j]
            sf = sin(twopi*f*t)
            cf = cos(twopi*f*t)
            s += w*d*sf
            c += w*d*cf
            ss += w*sf**2
            cc += w*cf**2
            sc += w*sf*cf

        alfa[i] = ( s*cc-c*sc )/( ss*cc-sc**2 )
        beta[i] = ( c*ss-s*sc )/( ss*cc-sc**2 )

    power = np.array(alfa)**2 + np.array(beta)**2
    return freq,power,alfa,beta
Nick Craig-Wood
  • 52,955
  • 12
  • 126
  • 132
  • Thank you for your reply. Can you give an example on how to re-implemente e.g. `np.sum` in the pure cython code? I am not sure how I can come around not using `append` when `f` is not an integer. – Daniel Thaagaard Andreasen May 27 '12 at 11:10
  • `alfa` and `beta` have the same length as `freq` so you can define them in the same way then fill them up with zeros outside the loop. Re-arrange your loop to loop over an index `i` say which goes from `0..len(freq)-1` and use that to read `freq` and store `alfa` and `beta`. As for `np.sum` - it is a loop over `sft` and `data` multiplying point-wise then summing. If you are coding this all in cython then you'd combine the first 8 lines of your inner loop into a single for loop over the indexes of the `time` array. – Nick Craig-Wood May 27 '12 at 11:44
  • Here is an example dft in python: http://numericalrecipes.wordpress.com/2009/04/30/the-discrete-fourier-transform/- note the nested `for`s in the inner loop. In fact that code is pretty much want you want for cython, give or take some `cdef` statements – Nick Craig-Wood May 27 '12 at 11:50
  • While it was easy to make the for loop run over `i`, I can still not see how to make the sum in a smart and fast way. The `alfa`, `beta` and `power` is now less yellow in the .html file, but still as slow as pure python. – Daniel Thaagaard Andreasen May 27 '12 at 15:12
  • Have added a full example of what I mean - hope that helps! – Nick Craig-Wood May 27 '12 at 16:07
  • Wow, finally get the speed I wanted. Thanks a lot. It seems that it is very important to define `2*pi` outside the loop in a constant. Also `n` and `ndata` should change, otherwise everything is now good. On a simple example I went from ~10s to ~4s. It means a lot later on. Thank you Nick :) – Daniel Thaagaard Andreasen May 27 '12 at 18:03
  • No probs.. the stack overflow way to show your appreciation is to accept the answer &or upvote! – Nick Craig-Wood May 27 '12 at 19:35
  • 1
    Here you go :) Have a nice day. – Daniel Thaagaard Andreasen May 28 '12 at 08:00