Python: Efficient way of matching slices of strings between two lists

Question

Let's say I have two lists of files with similar names like so:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

How would I be able to efficiently remove elements that aren't matching? I want to get the following:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2']

I've tried doing the following:

setA = set([x[-4:] for x in images])
setB = set([x[-4:] for x in masks])

matches = setA.union(setB)

elems = list(matches)

for elem in elems:
    result = [x for x in images if x.endswith(elem)]

But this is rather naïve and slow, as I need to iterate through a list of ~100k elements. Any idea how I can implement this efficiently?

This looks like a good way to do it (you probably meant intersection instead of union, and you don't need `elems = list(matches)`, you can iterate on the set directly) — Thierry Lathuille, Jun 01 '22 at 16:24

score 1 · Answer 1 · answered Jun 01 '22 at 16:28

First of all, since you want the common endings, you should use intersection, not union:

matches = setA.intersection(setB)

matches is already a set, so instead of converting it to a list and looping over it, loop over images and masks and check each suffix for set membership.

imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]
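Putting the two steps above together, a minimal end-to-end sketch might look like the following. It uses the sample data from the question; the names `image_keys` and `mask_keys` are only illustrative replacements for `setA` and `setB`, and the 4-character suffix is kept as in the question.

```python
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

# Build the suffix sets once; set comprehensions avoid the intermediate lists.
image_keys = {x[-4:] for x in images}
mask_keys = {x[-4:] for x in masks}

# Suffixes that occur in both lists.
matches = image_keys & mask_keys

# One pass over each list; a set membership test is O(1) on average.
imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]

print(imgres)
print(mskres)
```

Each list is traversed a constant number of times, so the total work grows linearly with the number of file names, rather than scanning the whole image list once per matched suffix as in the original loop.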

score 1 · Answer 2 · answered Jun 01 '22 at 16:28

Your solution is basically as good as it gets. You can reduce it to a single pass over masks, though, if you first store an intermediate dict, image_map, that maps each suffix to its image name:

# map each 4-character suffix to the original image name
image_map = {x[-4:]: x for x in images}

# store all our matches here
matches = []

# loop through the other list of file names
for mask in masks:
    # the suffix is a key in image_map, so we have a match
    if mask[-4:] in image_map:
        # save the mask
        matches.append(mask)
        # save the matching image name
        matches.append(image_map[mask[-4:]])
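If you want the two filtered lists from the question rather than one interleaved list, the `image_map` idea above can be adapted roughly as follows. This is a sketch under an assumption the question does not confirm: that every name has the form `<prefix>_<id>`, so the shared id is everything after the first underscore. The helper `key` is hypothetical; it replaces the fixed `[-4:]` slice, which would treat `image_im_10` and `image_im_110` as the same suffix once the numbers grow.

```python
def key(name):
    # Hypothetical helper: everything after the first underscore,
    # e.g. 'image_im_10' -> 'im_10'. Assumes the name contains an underscore.
    return name.split('_', 1)[1]

images = ['image_im_1', 'image_im_2', 'image_im_10']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3', 'mask_im_10']

# Map each shared id to the original image name.
image_map = {key(x): x for x in images}

# Single pass over masks, keeping each (image, mask) pair together.
pairs = [(image_map[key(m)], m) for m in masks if key(m) in image_map]

# Split the pairs back into two aligned lists.
matched_images = [img for img, _ in pairs]
matched_masks = [msk for _, msk in pairs]

print(matched_images)
print(matched_masks)
```

Keeping the pairs together means `matched_images[i]` and `matched_masks[i]` always refer to the same id, which is usually what you want when loading image/mask pairs. A name without an underscore would raise an `IndexError` in `key`, so adjust the helper to your real naming scheme.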
