I've recently had to implement a simple bruteforce software in python, and I was getting terrible execution times (even for a O(n^2) time complexity), topping the 10 minutes of runtime for a total of 3700 * 11125 * 2 = 82325000 access operations on numpy arrays (intel i5 4300U).
I'm talking about access operations because I initialized all the arrays beforehand (out of the bruteforce loops) and then I just recycle them by just overwriting without reallocating anything.
I get that bruteforce algorithms are supposed to be very slow, but 82 million acesses on contiguous-memory float arrays should not take 10 minutes even on the oldest of cpus...
After some research I found this post: Why is numpy.array so slow?, which lead me to think that maybe numpy arrays with length of 3700 are too small to overcome some sort of overhead (which I do not know since, as I said, I recycle and not reallocate my arrays), therefore I tried substituting arrays with lists and... voilà, run times were down to the minute (55 seconds) for a total of 10x decrease!
But now I'm kinda baffled on why on earth a list can be faster than an array, but more importantly, where is the threshold? The cited post said that numpy arrays are very efficient on large quantities of data, but where does "a lot of data" becomes "enough data" to actually get advantages? Is there a safe way to determine whether to use arrays or lists? Should I get worried about writing my functions two times, one for each case?
For anyone wondering, the original script was something like (list version, mockup data):
X = [0] * 3700 #list of 3700 elements
y = [0] * 3700
combinations = [(0, 0)] * 11125
grg_results = [[0, 0, 0]] * len(combinations)
rgr_results = [[0, 0, 0]] * len(combinations)
grg_temp = [100] * (3700 + 1)
rgr_temp = [100] * (3700 + 1)
for comb in range(len(combinations)):
pivot_a = combinations[comb][0]
pivot_b = combinations[comb][1]
for i in range(len(X)):
_x = X[i][0]
_y = y[i][0]
if _x < pivot_a:
grg_temp[i + 1] = _y * grg_temp[i]
rgr_temp[i + 1] = (2 - _y) * rgr_temp[i]
elif _x >= pivot_a and _x <= pivot_b:
grg_temp[i + 1] = (2 - _y) * grg_temp[i]
rgr_temp[i + 1] = _y * rgr_temp[i]
else:
grg_temp[i + 1] = _y * grg_temp[i]
rgr_temp[i + 1] = (2 - _y) * rgr_temp[i]
grg_results[comb][0] = pivot_a
grg_results[comb][1] = pivot_b
rgr_results[comb][0] = pivot_a
rgr_results[comb][1] = pivot_b
grg_results[comb][2] = metrics[0](grg_temp)
rgr_results[comb][2] = metrics[0](rgr_temp)