4

I have been translating some code from Matlab to Python that we use to analyse data in our lab. We have two lists of time stamps and we want to use one to herald the other: for every element in the first list we look for time stamps in the second list that have a precise separation in time. In case there are, we place these in a separate list.

Here is an runnable example of the kind of Matlab code I am using, with random data. It is probably VERY crude, as I am not well versed in Matlab. In the following Ctrigger is the trigger list, and Csignal is the signal list that we want to herald. For every element of Ctrigger we look if there are elements in Csignal that are within a window centred on offset, and with width gate. The selected events will be placed in Hsignal.

% Matlab code

Ctrigger = linspace(0, 3000000, (3000000-1)/3);
length_t = length(Ctrigger);

Bsignal = linspace(0, 3000000, (3000000-1)/10);
length_s = length(Bsignal);
noise = reshape(20*rand(length_s,1)-10,[1,length_s]);
Csignal = Bsignal + noise;

offset = 3;
gate = 1;

Hsignal=zeros(length_s,1);
marker = 1;

tic
for j=1:length_t-1
    m = marker;
    tstart=Ctrigger(j)+offset-gate/2;
    tstop=Ctrigger(j)+offset+gate/2;
    while(m <= length_s-1)
        if(Csignal(m)<tstart)
            marker=m;
            m=m+1;
        end
        if(Csignal(m)>=tstart && Csignal(m)<=tstop)
            Hsignal(m)=Csignal(m);
            m = m+1;
        end
        if(Csignal(m)>tstop)
            break;
        end
    end
end

toc

Hsignal=Hsignal(Hsignal~=0);
Hsignal = unique(Hsignal);

Roughly 90'000 events are selected to be placed in Hsignal, and Matlab takes about 0.05 seconds to run this. I have introduced the marker counter because the two lists Csignal and Ctrigger area already ordered in time. marker is set at the start of one heralding window: when I move to the next trigger I will not look again in all of Csignal, but only from the start of that window. To avoid a double count, I remove the duplicates at the end.

If you want to have an idea of the code, here is a simplified version of the input and output:

Ctrigger = [1, 10, 11, 20, 30, 40, 50, 60]
Csignal = [4, 11, 13, 17, 25, 34, 41, 42, 50, 57, 65]
print(Hsignal)
# [4, 11, 13, 41, 42]

Now, I have copied this code from Matlab, just slightly adjusting it to fit into python. Following some advice I first declare the function that contains the main algorithm, and then call it:

# Python code

def main(list1, list2, list3, delay, window):
    marker = 1
    for j in range(len(list1)):
        m = marker
        t_star = list1[j] + delay - window/2
        t_sto = list1[j] + delay + window/2   
        while m < len(list2):   
            if (list2[m] < t_star):
                marker = m
                m = m + 1
            elif (list2[m] >= t_star and list2[m] <= t_sto):
                list3[m] = list2[m]
                m = m + 1
            elif (list2[m] > t_sto):
                break


Ctrigger = range(0, 3000000, 3)
length_t = len(Ctrigger)

Bsignal = range(0, 3000000, 10)
length_s = len(Bsignal)
noise = 1e-05*np.asarray(random.sample(range(-1000000,1000000), int(length_s)))
Csignal = list(np.sort(np.asarray(Bsignal) + noise))

offset = 3
gate = 1

length_t = len(Ctrigger)
length_s = len(Csignal)
Hsignal = list(np.zeros(len(Ctrigger)))

start = time.time()

main(Ctrigger, Csignal, Hsignal, offset, gate)

end = time.time()
Hsignal = np.sort(np.asarray(list(set(Hsignal))))

print(end-start)

Similarly, about 90'000 elements are placed in Hsignal. The key problem is that python takes about 1.1 seconds to run this! I have even tried with this alternative, that removes some loops (here I still use arrays, as I have to add elements to an entire list):

start = time.time()
result = list()
for event in Ctrigger:
    c = Csignal - event - offset
    d = Csignal[abs(c) <= gate/2]
    result.append(list(d))


flat = [item for sublist in result for item in sublist]
flat = np.sort(np.asarray(list(set(flat))))

end = time.time()
print(end-start)

but it's even worse, almost 10 minutes.

I can't really understand where the problem is. For my application Ctrigger is 100e06 long, and Csignal around 20e06. In matlab the same code takes 1.06 seconds, against more than 10 minutes in python. It also seems that it is not straightforward to remove the loops and speeding the process at the same time.

EDIT I: I have introduced the Matlab code I am using, as well as an executable example. I also made Hsignal a list, while Ctrigger and Csignal are still arrays. Result: 0.05s vs 6.5s

EDIT II: now I only use lists, as suggested by RiccardoBucco. Result: 0.05s vs 1.5s

EDIT III: instead of appending to Hsignal I am declaring it first, then changing individual elements, which I noticed brought a small speed up (even though it seems that keeping Hsignal as an array is faster!). Then I declared a function with the main algorithm. Result: 0.05s vs 1.1s

  • 2
    Could you also post a simple input and the related expected output? So that it is easier to understand the problem – Riccardo Bucco Dec 28 '19 at 11:40
  • Sure, I have edited the code to include an example. Thanks!! – Samuele Grandi Dec 28 '19 at 15:19
  • You might want to post also the relevant part of your matlab code. – Riccardo Bucco Dec 28 '19 at 17:27
  • I don't see anything in the code which could result in such a huge difference. Can you share the complete example for matlab vs python with us, including some (potentially random) input data wich actually allows us to observe the differences? – Daniel Dec 28 '19 at 17:52
  • @Daniel I have edited the reply, including the matlab code and random input data – Samuele Grandi Dec 29 '19 at 14:50
  • 1
    As @Riccardo suggests, using lists is more efficient than using numpy arrays. For example, you can replace `Ctrigger = np.asarray(range(0, 3000000, 3))` by `Ctrigger = list(range(0, 3000000, 3))`. This gets the execution time from 5.5s to 1.4s! This is because individual element access in a list is faster than in a numpy array. For more info, see [here](https://stackoverflow.com/questions/29281680/numpy-individual-element-access-slower-than-for-lists) – Ugurite Dec 29 '19 at 21:39
  • Please stop using numpy arrays, and do tests *only* with lists, as I already suggested. You should not use arrays to implement Csignal and Ctrigger too – Riccardo Bucco Dec 29 '19 at 22:21
  • @RiccardoBucco sorry Riccardo, I only modified Hsignal, as that was your original suggestion, and didn't realise it would still bring a slow down. However, I still have a factor of 20 between matlab and python, only for the data analysis. – Samuele Grandi Dec 30 '19 at 15:36
  • @Ugurite Thanks, I have done that. Still, I have a factor of 20 to get rid of. – Samuele Grandi Dec 30 '19 at 15:36
  • That difference is absolutely normal – Riccardo Bucco Dec 30 '19 at 16:10
  • @RiccardoBucco do you mean that it is unsolvable? That Matlab will always be orders of magnitude faster at handling long lists? – Samuele Grandi Dec 31 '19 at 10:25
  • @Ugurite I have now a measure of that: If Ctrigger is an array the program takes 697 seconds, and 614 if it is a list. – Samuele Grandi Jan 07 '20 at 11:31
  • 1
    1) Use simple arrays as you have done in your Matlab-impelementation (only numpy arrays and scalars). 2) Apply a jit-compiler (Numba) or use Cython. Example: https://stackoverflow.com/a/51911200/4045774 Matlab uses jit-compilation by default, Python doesn't. Because of this pure not compiled loops are very slow in Python.Also note that the first call has a constant overhead due to compilation. But if you write a Matlab-function and call it multiple times, timings may also decrease on the second call. – max9111 Jan 13 '20 at 10:14

2 Answers2

3

What is probably slowing down your algorithm is the use of np.append in

Hsignal = np.append(Hsignal, Csignal[m])

You should use a list, not a NumPy array:

Ctrigger = [1, 10, 11, 20, 30, 40, 50, 60]
Csignal = [4, 11, 13, 17, 25, 34, 41, 42, 50, 57, 65]

offset = 2
gate = 2

Hsignal = []
marker = 0

for j in range(len(Ctrigger)):
    m = marker
    t_start = Ctrigger[j] + offset - gate/2
    t_stop = Ctrigger[j] + offset + gate/2   
    while m < len(Csignal):   
        if Csignal[m] < t_start:
            marker = m
            m = m + 1
        elif Csignal[m] <= t_stop:
            Hsignal.append(Csignal[m])
            m = m + 1
        else:
            break

Hsignal = sorted(set(Hsignal))

Once the list has been built, you can transform it into an array:

Hsignal = np.array(Hsignal)
Riccardo Bucco
  • 13,980
  • 4
  • 22
  • 50
  • Unfortunately it is not the case - I have tried and there no consistent speed up. I was using NumPy arrays as I needed to do some manipulations on the lists before, on a separate code where I have something similar I use lists, and the difference with Matlab is still staggering. – Samuele Grandi Dec 28 '19 at 17:23
  • @SamueleGrandi You tried my method and it does not speed up the algorithm? Are you sure? `np.append` is much much slower than using `list.append` – Riccardo Bucco Dec 28 '19 at 17:25
  • yes I tried. I didn't let the code finish as it was already taking over several minutes. So even if it did speed up, it was not the bottleneck – Samuele Grandi Dec 29 '19 at 10:48
3

How to get the runtime down to 6ms

As you already have seen Python loops are extremely slow. Per default there is no jit-Compiler which speeds up loops as in Matlab. So you have following possibilities:

  • Vectorize your code in Numpy, if possible.
  • Use Cython to compile the function
  • Use Numba to compile the function

In the following example I use Numba, because it is really simple to use in such cases.

Example

import numpy as np
import numba as nb

@nb.njit()
def main_nb(Ctrigger, Csignal, offset, gate):
    Hsignal = np.zeros(Ctrigger.shape[0])

    marker = 1
    for j in range(Ctrigger.shape[0]):
        m = marker
        t_star = Ctrigger[j] + offset - gate/2
        t_sto = Ctrigger[j] + offset + gate/2   
        while m < Csignal.shape[0]:   
            if (Csignal[m] < t_star):
                marker = m
                m = m + 1
            elif (Csignal[m] >= t_star and Csignal[m] <= t_sto):
                Hsignal[m] = Csignal[m]
                m = m + 1
            elif (Csignal[m] > t_sto):
                break
    return Hsignal

Also note to avoid Lists if possible. Use simple arrays like you would do in Matlab.

Timings

import time

#Use simple numpy arrays if possible, not lists
Ctrigger = np.arange(0, 3000000, 3)
length_t = Ctrigger.shape[0]

Bsignal = np.arange(0, 3000000, 10)
noise = 1e-05*np.random.rand(Bsignal.shape[0])
Csignal = np.sort(np.asarray(Bsignal) + noise)

offset = 3
gate = 1

start = time.time()
Hsignal=main(Ctrigger, Csignal, offset, gate)
print("Pure Python takes:" +str(time.time()-start))
#Pure Python takes:6.049151659011841

#First call takes longer (compilation overhead)
#The same may be the case in matlab
start = time.time()
Hsignal=main_nb(Ctrigger, Csignal, offset, gate)
print("First Numba run takes:" +str(time.time()-start))
#First Numba run takes:0.16272664070129395

start = time.time()
Hsignal=main_nb(Ctrigger, Csignal, offset, gate)
print("All further Numba calls run takes:" +str(time.time()-start))
#All further Numba calls run takes:0.006016731262207031

Hsignal = np.unique(Hsignal)
Community
  • 1
  • 1
max9111
  • 6,272
  • 1
  • 16
  • 33