
I have two generated lists: one is a simple list of words, and the other is a list of lists of words. What would be the fastest way to check whether the elements of the first list are in the list of lists, and get their indexes?

E.g.

lists=[["apple","car"],["street","beer"],["plate"]]
Test=["apple","plate"]
# should return [(apple,0),(plate,2)] apple is inside first list and plate inside 3rd list
Test2=["car","street"]
# should return [(car,0),(street,1)]
Test3=["pineapple"]
# should return [] because pineapple isn't inside lists

I'm having difficulty implementing a solution because I have never worked with lists of lists. Can someone help me, or at least guide me?

Laz22434
  • What should be the output if there is a match in more than one list? e.g. `lists=[["apple","car"],["street","beer"],["apple", "plate"]]` and `test=["apple"]`? – Jay Dec 17 '22 at 13:56
  • 1
    it should be all occurences, so it should be [(apple,0),(apple,2)] – Laz22434 Dec 17 '22 at 14:01
  • Why has this question been closed? The linked questions are about finding the index in a list of lists, not about the most efficient (performant) way to do it, as far as I can see. – Abraham Tugalov Dec 17 '22 at 14:56

4 Answers


Though you could simply iterate through your list of lists using brute force repeatedly for each word in each test list, it may be more efficient to first build a dictionary mapping each leaf item within the list of lists to the index (or, more generally, indexes) of the lists where the leaf is found, and then use this dictionary for all words in all tests.

More concretely, we can:

  • use defaultdict to build a dictionary of lists of indexes where a given leaf (e.g., 'apple') is found
  • iterate through the words in a test list to see which indexes (zero or more) within the list of lists contain each word.

Here's the code:

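from collections import defaultdict
lists=[["apple","car"],["street","beer"],["apple","plate"]]

# map each word to the list of indexes of the sublists that contain it
dct = defaultdict(list)
for i, L in enumerate(lists):
    for item in L:
        dct[item] += [i]

# the `item in dct` check avoids inserting unknown words into the defaultdict
def foo(test):
    return [(item, i) for item in test if item in dct for i in dct[item]]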

Test=["apple","plate"]
print( foo(Test) )

Test2=["car","street"]
print( foo(Test2) )

Test3=["pineapple"]
print( foo(Test3) )

Output:

[('apple', 0), ('apple', 2), ('plate', 2)]
[('car', 0), ('street', 1)]
[]
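Because the dictionary version and a plain brute-force scan should agree, one way to sanity-check it is to compare the two on random data. This is a minimal sketch: `brute_force`, `indexed` and the random word pool are hypothetical names made up for this check, not part of the answer above.

```python
import random
from collections import defaultdict

def brute_force(lists, test):
    # Scan every sublist for every test word
    return [(item, i) for item in test for i, sub in enumerate(lists) if item in sub]

def indexed(lists, test):
    # Build the word -> [sublist indexes] map once, then look each word up
    dct = defaultdict(list)
    for i, sub in enumerate(lists):
        for item in sub:
            dct[item].append(i)
    return [(item, i) for item in test if item in dct for i in dct[item]]

# Hypothetical random data; a small pool makes words repeat across sublists
random.seed(0)
pool = [f"w{n}" for n in range(20)]
for _ in range(200):
    lists = [random.sample(pool, random.randint(0, 4)) for _ in range(random.randint(0, 6))]
    test = random.sample(pool, random.randint(0, 5))
    assert brute_force(lists, test) == indexed(lists, test)
print("all checks passed")
```

The check relies on each sublist holding distinct words (`random.sample` guarantees that here): if a sublist could contain the same word twice, the dictionary version would report that index twice while the brute-force scan reports it once.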
constantstranger

Here's one way you can solve this problem:

def find_indexes(lists, test_list):
  result = []
  for i, sublist in enumerate(lists):
    for element in test_list:
      if element in sublist:
        result.append((element, i))
  return result

This function iterates over the sublists and, for each one, checks whether each element of test_list is present in it. If it is, it appends a tuple of the element and the sublist's index to the result list.
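One consequence of looping over the sublists first is that the results come out ordered by sublist index, not by the order of the words in test_list. A small sketch to illustrate (the reordered test list is made up for this example):

```python
def find_indexes(lists, test_list):
    # Same function as above: outer loop over sublists, inner loop over test words
    result = []
    for i, sublist in enumerate(lists):
        for element in test_list:
            if element in sublist:
                result.append((element, i))
    return result

lists = [["apple", "car"], ["street", "beer"], ["plate"]]
# "plate" is listed first, but it lives in a later sublist than "apple"
print(find_indexes(lists, ["plate", "apple"]))  # [('apple', 0), ('plate', 2)]
```

If you want the results in the order of test_list instead, swap the two loops so the outer one runs over test_list.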

You can then call this function as follows:

lists=[["apple","car"],["street","beer"],["plate"]]
test = ["apple","plate"]
result = find_indexes(lists, test)
print(result) # prints [('apple', 0), ('plate', 2)]

test2 = ["car","street"]
result2 = find_indexes(lists, test2)
print(result2) # prints [('car', 0), ('street', 1)]

test3 = ["pineapple"]
result3 = find_indexes(lists, test3)
print(result3) # should print []

I hope this helps!


Here's a one-liner using a list comprehension:

lists = [["apple","car"],["street","beer"],["plate"]]
test = ["apple","plate"]
output = [(item, i) for i, lst in enumerate(lists) for item in test if item in lst]
print(output)

Output:

[('apple', 0), ('plate', 2)]

Here, we iterate over each sublist of lists together with its index i; for each sublist, we go through every item in test, and if the item is in that sublist, we add the pair (item, i) to our output.
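In a list comprehension the `for` clauses nest from left to right, so the one-liner above is equivalent to these explicit loops (a sketch on the same data):

```python
lists = [["apple", "car"], ["street", "beer"], ["plate"]]
test = ["apple", "plate"]

output = []
for i, lst in enumerate(lists):      # first `for` clause: outer loop
    for item in test:                # second `for` clause: inner loop
        if item in lst:              # `if` clause filters inside the inner loop
            output.append((item, i))
print(output)  # [('apple', 0), ('plate', 2)]
```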


For all your tests -

lists = [["apple","car"],["street","beer"],["plate"]]
tests = [["apple","plate"], ["car","street"], ["pineapple"]]
for test in tests:
    output = [(item, i) for i, lst in enumerate(lists) for item in test if item in lst]
    print(f"test: {test}, output: {output}")

Output:

test: ['apple', 'plate'], output: [('apple', 0), ('plate', 2)]
test: ['car', 'street'], output: [('car', 0), ('street', 1)]
test: ['pineapple'], output: []
Jay

As far as I can tell, you are asking about the most efficient way.
I assume your top priority for efficiency is performance.
So, let's benchmark some of the solutions.

The first and simplest approach is a plain for loop.

import time

data = (("apple", "car"), ("street", "beer"), ("plate",))  # trailing comma: a 1-tuple, not the string "plate"
search = ("apple", "plate")

# solution
def test_solution():
  out = []
  for s in search:
    for dk, dv in enumerate(data):
      if s in dv:
        out.append((s, dk))

  return out

# benchmark start (perf_counter is a monotonic clock, in seconds)
start = time.perf_counter()

for _ in range(1_000_000):
  test_solution()

# benchmark end
end = time.perf_counter()

# print result & benchmark result
print(test_solution())
print("time taken:", f"{end - start:.5f}s")

Running this search on the given data 1,000,000 times takes ~2.5 s.
Of course, all the code is benchmarked on the same hardware.
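As a cross-check, here is roughly the same measurement using the standard library's `timeit` module, which handles the repetition and uses a high-resolution timer (`time.perf_counter` by default). It's only a sketch: it is scaled down to 100,000 calls per run to keep it quick, and the numbers you get will depend on your machine and Python version.

```python
import timeit

data = (("apple", "car"), ("street", "beer"), ("plate",))
search = ("apple", "plate")

def test_solution():
    # Plain nested loop: every search word against every tuple in `data`
    out = []
    for s in search:
        for dk, dv in enumerate(data):
            if s in dv:
                out.append((s, dk))
    return out

# Three runs of 100,000 calls each; the minimum is the least affected by noise
runs = timeit.repeat(test_solution, number=100_000, repeat=3)
print(test_solution())  # [('apple', 0), ('plate', 2)]
print(f"best of 3: {min(runs):.3f} s per 100,000 calls")
```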

Next.
The solutions provided by Jay and Abdulrahman-02 take about the same amount of time (~2.5 s).
The solution provided by constantstranger is about 60% faster (~1.5 s).

But we can make it even faster by combining the ideas above into one solution.

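# solution (reuses `data` and `search` from above)
# precompute, once, a map from each word to the (word, index) pairs it produces
_conv = {}
for dk, dv in enumerate(data):
  for v in dv:
    if v in _conv:
      _conv[v].append((v, dk))
    else:
      _conv[v] = [(v, dk)]

# each search word now costs a single dictionary lookup
def test_solution():
  out = []

  for s in search:
    if s in _conv:
      out += _conv[s]

  return out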

Now this code runs about 150% faster (2.5×) than the original one:
it takes only ~1 s to run 1,000,000 times.
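To reuse the combined approach with other data, the precomputation and the lookup can be wrapped in two functions. This is a minimal sketch: `build_conv` and `lookup` are hypothetical names, and `dict.get` with an empty-list default stands in for the explicit `in` check. Here it runs on the question's examples, including the duplicate case raised in the comments:

```python
def build_conv(data):
    # word -> [(word, sublist index), ...], built once per dataset
    conv = {}
    for dk, dv in enumerate(data):
        for v in dv:
            conv.setdefault(v, []).append((v, dk))
    return conv

def lookup(conv, search):
    # One dictionary lookup per search word; unknown words contribute nothing
    out = []
    for s in search:
        out += conv.get(s, [])
    return out

conv = build_conv([["apple", "car"], ["street", "beer"], ["apple", "plate"]])
print(lookup(conv, ["apple", "plate"]))  # [('apple', 0), ('apple', 2), ('plate', 2)]
print(lookup(conv, ["car", "street"]))   # [('car', 0), ('street', 1)]
print(lookup(conv, ["pineapple"]))       # []
```

Since the dictionary is built once up front, it has to be rebuilt if the data changes afterwards.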

I guess we can call that efficient =)

Abraham Tugalov