I'm currently working on a high-performance python 2.7 project utilizing lists ten thousands elements in size. Obviously, every operation must be performed as fast as possible.
So, I have two lists: One of them is a list of unique arbitrary numbers, let's call it A, and the other one is a linear list starting with 1 and with the same length as the first list, named B, which represents indices in A (starting with 1)
Something like enumerate, starting with 1.
For example:
A = [500, 300, 400, 200, 100] # The order here is arbitrary, they can be any integers, but every integer can only exist once
B = [ 1, 2, 3, 4, 5] # This is fixed, starting from 1, with exactly as many elements as A
If I have an element of B (called e_B) and want the corresponding element in A, I can simply do correspond_e_A = A[e_B - 1]
. No problem.
But now I have a huge list of random, non-unique integers, and I want to know the indices of the integers that are in A, and what the corresponding elements in B are.
I think I have a reasonable solution for the first question:
indices_of_existing = numpy.nonzero(numpy.in1d(random_list, A))[0]
What is great about this approach is that there is no need to map() single operations, numpy's in1d just returns a list like [True, True, False, True, ...]. Using nonzero() I can get the indices of the elements in random_list that exist in A. Perfect, I think.
But for the second question, I'm stumped. I tried something like:
corresponding_e_B = map(lambda x: numpy.where(A==x)[0][0] + 1, random_list))
This correctly gives me the indices, but it's not optimal, because firstly I need a map(), secondly I need a lambda, and finally numpy.where() does not stop after the item was found once (remember, A has only unique elements), meaning that it scales horribly with huge datasets like mine.
I took a look at bisect, but it seems bisect only works with single requests, not with lists, meaning that I'd still have to use map() and build my list elementwise (that's slow, isn't it?)
Since I'm quite new to Python, I was hoping anyone here might have an idea? Maybe a library I don't know yet?