I am trying to improve a part of code that is slowing down the whole script significantly, right to the point of making it unfeasible. In particular the piece of code is:
for vectors1 in EC1:
for vectors2 in EC2:
r = np.add(vectors1, vectors2)
for vectors3 in CDC:
result = np.add(r, vectors3).tolist()
if result not in states: # This is what makes it very slow
states.append(result)
EC1, EC2 and CDC are lists that contains as elements, lists of lists, as an example of one iteration, we get:
vectors1: [[2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0]]
vectors2: [[0, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [2, 0, 0], [2, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
vectors3: [[0, 0, 0], [0, 0, 0], [2, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [2, 1, 0], [2, 1, 0]]
result: [[2, 0, 0], [2, 0, 0], [2, 1, 0], [2, 0, 0], [2, 0, 0], [2, 0, 0], [2, 0, 0], [4, 1, 0], [2, 1, 0]]
Notice how vectors1, vectors2 and vectors3 correspond to one element from EC1, EC2 and CDC respectively, also how 'result' is the summation from vectors1, vectors2 and vectors3, hence the previous vectors cannot be altered in any manner or sorted, otherwise it would change the expected result from the 'result' variable.
In the first two loops each item in EC1 and EC2 are summed, for later on sum up the previous result with items in CDC. To sum the list of lists from EC1 and EC2 and later on the previous result ('r') with the list of lists from CDC I use numpy.add(). Finally, I reconvert 'result' back to list. So Basically I am managing lists of lists as elements from EC1, EC2 and CDC.
The problem is that I must deal with hundreds of thousands (close to 1M) of results and having to check if a result exists in states list is slowing things drastically, specially since states list grows as more results are processed.
I've tried to keep inside the numpy world by managing everything as numpy arrays. First declaring states as:
states = np.empty([9, 3], int)
Then, concatenating the result numpy array to states numpy array, prior checking if already exists in states:
for vectors1 in EC1:
for vectors2 in EC2:
r = np.add(vectors1, vectors2)
for vectors3 in CDC:
result = np.add(r, vectors3)
if not np.isin(states, result).any():
np.concatenate(states, result, axis=0)
But definitely I am doing something wrong because result is not being concatenated to states, I've also tried without success:
np.append(states, result, axis=0)
Could this be parallelized in some way?