Find the indices at which any element of one list occurs in another, with duplicates

Question

I'm new to Python, coming from MATLAB. My problem is very similar to this post (Find the indices at which any element of one list occurs in another), but with two tweaks that I can't quite manage to incorporate: handling duplicates and handling missing values.

Following that example, I have two lists, haystack and needles:

haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']

In my real data, both haystack and needles are lists of dates. For each element of needles, I need its index in haystack, so that:

result = [5, 6, 7, nan, 5, 9]

The two big differences between my problem and the posted example are:

1. I have duplicates in needles (haystack doesn't have any duplicates), which as near as I can tell means I can't use set().
2. On rare occasions, an element in needles may not be in haystack, in which case I want to insert a nan (or other placeholder).

So far I've got this (which isn't efficient enough for how large haystack and needles are):

import numpy as np

def find_idx(a,func):
    return [i for (i,val) in enumerate(a) if func(val)]

haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']

result=[]
for x in needles:
    try:
        idx = find_idx(haystack, lambda y: y==x)
        result.append(idx[0])
    except IndexError:  # idx is empty when x is not in haystack
        result.append(np.nan)

As far as I can tell, that code does what I want, but it's not fast enough. More efficient alternatives?

this is a duplicate of [this question with a different title](https://stackoverflow.com/questions/4110059/pythonor-numpy-equivalent-of-match-in-r) — Onyambu, Jun 14 '19 at 00:04
the answer is simply `[ haystack.index(x) if x in haystack else None for x in needles ]` — Onyambu, Jun 14 '19 at 00:05

score 1 · Accepted Answer · answered Jun 14 '19 at 00:25


If your arrays are very large it may be worthwhile to make a dictionary to index the haystack:

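import numpy as np

haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles  = ['F', 'G', 'H', 'I', 'F', 'K']

# Map each haystack value to its index (built once, in a single pass)
hayDict  = { K:i for i,K in enumerate(haystack) }
# Look up each needle in the dictionary; missing values become nan
result   = [ hayDict.get(N,np.nan) for N in needles]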

print(result)

# [5, 6, 7, nan, 5, 9]
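Since the question mentions that the real lists hold dates, here is a minimal sketch of the same dictionary idea applied to `datetime.date` values, which are hashable and so can serve as dictionary keys. The function name `match_indices`, the `missing` parameter, and the sample dates are illustrative assumptions, not part of the original post. `float("nan")` is used as a standard-library stand-in for `np.nan`.

```python
from datetime import date

def match_indices(needles, haystack, missing=float("nan")):
    """Return, for each needle, its index in haystack, or `missing` if absent."""
    # Build the lookup once. setdefault keeps the first index seen, so the
    # first occurrence would win if haystack ever did contain duplicates.
    index_of = {}
    for i, item in enumerate(haystack):
        index_of.setdefault(item, i)
    # Each needle is then a single dictionary lookup.
    return [index_of.get(n, missing) for n in needles]

# Hypothetical sample data: ten consecutive days, one needle outside the range
haystack = [date(2019, 6, d) for d in range(1, 11)]
needles = [date(2019, 6, 6), date(2019, 6, 7), date(2019, 7, 1), date(2019, 6, 6)]

result = match_indices(needles, haystack)
print(result)  # [5, 6, nan, 5]
```

Note that a list mixing integers with nan cannot be used directly for indexing, so passing `missing=None` or `missing=-1` may be more convenient depending on what happens to the result afterwards.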

Alain T.


Thank you! All of the previous responses worked correctly, but this one proved by far the fastest. Thanks for all the great responses! – Eric Johnson Jun 17 '19 at 20:30

score 0 · Answer 2 · answered Jun 13 '19 at 23:20

How about this?

results = []
haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']

for n in needles:
    if n in haystack:
        results.append(haystack.index(n))
    else:
        results.append("NaN")
print(results)

Or, as a second method, the same logic wrapped in a helper function:

haystack = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K']
needles = ['F', 'G', 'H', 'I', 'F', 'K']

results = []

def getInd(n, haystack):
    if n in haystack:
        return haystack.index(n)
    else:
        return "NaN"

for n in needles:
    results.append(getInd(n, haystack))

print(results)
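Both methods in this answer call `n in haystack` and then `haystack.index(n)`, and each of those is a linear scan of the list, so the total work grows roughly with len(needles) × len(haystack). The sketch below checks that a per-needle scan and a dictionary lookup give the same indices, and times both with `timeit`. The data sizes, the function names `scan` and `lookup`, and the use of `None` as the placeholder (instead of the string "NaN") are assumptions made for illustration. Timings will vary by machine, so no expected numbers are shown.

```python
import timeit

# Hypothetical data: haystack has no duplicates; needles contains
# duplicates and some values (2000..2499) that are missing from haystack.
haystack = list(range(2_000))
needles = [(i * 7) % 2_500 for i in range(3_000)]

def scan(needles, haystack):
    # Linear scan per needle, equivalent to the loops above
    return [haystack.index(n) if n in haystack else None for n in needles]

def lookup(needles, haystack):
    # Build the value -> index mapping once, then do one lookup per needle
    index_of = {item: i for i, item in enumerate(haystack)}
    return [index_of.get(n) for n in needles]

# Both approaches should agree on every index
assert scan(needles, haystack) == lookup(needles, haystack)

print("scan:  ", timeit.timeit(lambda: scan(needles, haystack), number=1))
print("lookup:", timeit.timeit(lambda: lookup(needles, haystack), number=1))
```

For small lists the difference is unlikely to matter, and the plain loop is arguably easier to read.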
