List of strings I want to group together if they contain a specific substring from a master list

Question

I have a list of strings that I want to group together if they contain a specific substring from a master list.

Example Input:

["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E", "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]

Master List:

["G71Q3E5T7", "G9A2B4C6D8E"]

Expected Output:

[["Audi_G71Q3E5T7_Coolant", "Speaker_BMW_G71Q3E5T7", "Toyota_Exhaust_G71Q3E5T7"], ["Volt_Battery_G9A2B4C6D8E", "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel"]]

I haven't found any example or solution online. I am aware that itertools.groupby() is useful in these scenarios, but I am struggling to make it work.

_"am aware ... struggling to make it work"_: What have you tried? What _specifically_ are you struggling with? Please take the [tour], read [what's on-topic here](/help/on-topic), [ask], and the [question checklist](//meta.stackoverflow.com/q/260648/843953), and provide a [mre]. Welcome to Stack Overflow! — Pranav Hosangadi, Aug 12 '22 at 15:30
I know the itertools.groupby() function exists but do not know how to use it to produce my expected output, that is my struggle. I have found nothing online to aid me in finding the solution to my problem I'm having. — Gilles, Aug 12 '22 at 15:37

score 0 · Answer 1 · answered Aug 12 '22 at 15:37

Example inputs:

in_list = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E", "Speaker_BMW_G71Q3E5T7",
           "Engine_Benz_G9A2B4C6D8E", "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
master_list = ["G71Q3E5T7", "G9A2B4C6D8E"]
out_list = []

For every master item, check whether it is contained in any input item, and add the matches to out_list[index]:

for index, m in enumerate(master_list):
    out_list.append([])
    for i in in_list:
        if m in i:
            out_list[index].append(i)

print(out_list)

having nested lists with a `i in ` is unfortunately inefficient for large lists (quadratic complexity) — mozway, Aug 12 '22 at 15:42

mozway · Answer 2 · 2022-08-12T15:43:36.877

itertools.groupby is not appropriate here: it only groups consecutive items that share a key, so you would need to sort the data by ID first.
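To illustrate the point, here is a minimal sketch of what groupby does with and without sorting. The helper `find_id` and the variable names are made up for this example, and the key lookup uses a plain substring test for simplicity:

```python
from itertools import groupby

items = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E",
         "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E",
         "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
ids = ["G71Q3E5T7", "G9A2B4C6D8E"]

def find_id(item):
    # hypothetical helper: first master ID contained in the item, or "" if none
    return next((i for i in ids if i in item), "")

# Without sorting, groupby starts a new group every time the key changes
unsorted_groups = [list(g) for _, g in groupby(items, key=find_id)]
print(len(unsorted_groups))  # 5 groups instead of the expected 2

# Sorting by the same key first makes equal keys adjacent
sorted_groups = [list(g) for _, g in groupby(sorted(items, key=find_id), key=find_id)]
print(sorted_groups)
```

This works, but the sort adds O(n log n) work and orders the groups by ID rather than by the input, which is why a dictionary is usually the simpler choice.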

You can use a regex to extract the IDs, and a dictionary/defaultdict to collect the data:

L = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E",
     "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E",
     "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]
ids = ["G71Q3E5T7", "G9A2B4C6D8E"]

import re
from collections import defaultdict

regex = re.compile('|'.join(map(re.escape, ids)))
# re.compile(r'G71Q3E5T7|G9A2B4C6D8E', re.UNICODE)

out = defaultdict(list)

for item in L:                 # for each item
    m = regex.search(item)     # try to find ID
    if m:                      # if ID
        out[m.group()].append(item)  # add to appropriate list
        
out = list(out.values())

Output:

[['Audi_G71Q3E5T7_Coolant',
  'Speaker_BMW_G71Q3E5T7',
  'Toyota_Exhaust_G71Q3E5T7'],
 ['Volt_Battery_G9A2B4C6D8E',
  'Engine_Benz_G9A2B4C6D8E',
  'Ford_G9A2B4C6D8E_Wheel']]
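Note that the defaultdict version orders the groups by first appearance in the input and leaves out IDs that match nothing. If the groups should instead follow the order of the master list, one possible variant pre-fills a plain dict with the IDs. This is only a sketch: `group_by_ids` is a hypothetical helper name, and sorting the alternatives by length is an extra precaution (not part of the code above) for the case where one ID is a prefix of another:

```python
import re

def group_by_ids(items, ids):
    """Return one list per ID, in the order of `ids`; unmatched IDs give []."""
    # Put longer IDs first so a shorter ID cannot shadow a longer one it prefixes
    alternatives = sorted(ids, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, alternatives)))
    groups = {i: [] for i in ids}  # dicts keep insertion order (Python 3.7+)
    for item in items:
        m = pattern.search(item)   # leftmost ID found in the item, if any
        if m:
            groups[m.group()].append(item)
    return list(groups.values())

items = ["Audi_G71Q3E5T7_Coolant", "Volt_Battery_G9A2B4C6D8E",
         "Speaker_BMW_G71Q3E5T7", "Engine_Benz_G9A2B4C6D8E",
         "Ford_G9A2B4C6D8E_Wheel", "Toyota_Exhaust_G71Q3E5T7"]

# "UNUSED_ID" is a made-up ID to show that empty groups are kept
result = group_by_ids(items, ["G9A2B4C6D8E", "G71Q3E5T7", "UNUSED_ID"])
print(result)
```

Items that contain none of the IDs are silently skipped, as in the original loop.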
