New to Python, coming from MATLAB. My problem is very similar to this post (Find the indices at which any element of one list occurs in another), but with some tweaks that I can't quite manage to incorporate (namely, handling duplicates and missing values).
Following that example, I have two lists, haystack and needles:
haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']
However, in my real data both haystack and needles are lists of dates. For each element of needles, I need its index in haystack, such that:
result = [5, 6, 7, nan, 5, 9]
The two big differences between my problem and the posted example are:
1. I have duplicates in needles (haystack doesn't have any duplicates), which as near as I can tell means I can't use set().
2. On rare occasions, an element in needles may not be in haystack, in which case I want to insert a nan (or other placeholder).
So far I've got this (which isn't efficient enough for how large haystack and needles are):
import numpy as np

def find_idx(a, func):
    # Return every index in a whose value satisfies func.
    return [i for (i, val) in enumerate(a) if func(val)]

haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']

result = []
for x in needles:
    try:
        # Scans all of haystack once per needle.
        idx = find_idx(haystack, lambda y: y == x)
        result.append(idx[0])
    except IndexError:
        # idx is empty when x is not in haystack.
        result.append(np.nan)
As far as I can tell, that code does what I want, but it's not fast enough. Are there more efficient alternatives?
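One possible direction, as a minimal sketch rather than a benchmarked solution: since haystack has no duplicates, a dict mapping each value to its index could be built once, so each needle becomes a single lookup instead of a full scan. This assumes the elements are hashable (which should hold for datetime.date or datetime.datetime objects); the name index_of is made up for illustration, and math.nan stands in for np.nan to keep the example dependency-free.

```python
import math

haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']

# Build the value -> index mapping once (assumes haystack has no duplicates
# and that its elements are hashable).
index_of = {value: i for i, value in enumerate(haystack)}

# dict.get falls back to the placeholder for needles missing from haystack;
# duplicate needles simply look up the same index again.
result = [index_of.get(x, math.nan) for x in needles]

print(result)  # [5, 6, 7, nan, 5, 9]
```

Because the mapping is built in one pass and each lookup is a hash lookup, the total work grows roughly with len(haystack) + len(needles) rather than with their product.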