Let's say I have two lists of files with similar names like so:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']

How would I be able to efficiently remove elements that aren't matching? I want to get the following:

images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2']

I've tried doing the following:

setA = set([x[-4:] for x in images])
setB = set([x[-4:] for x in masks])

matches = setA.union(setB)

elems = list(matches)

for elem in elems:
    result = [x for x in images if x.endswith(elem)]

But this is rather naïve and slow as I need to iterate through a list of ~100k elements. Any idea how I can effectively implement this?

cottontail
ChilliMayoo
  • This looks like a good way to do it (you probably meant intersection instead of union, and you don't need `elems = list(matches)`, you can iterate on the set directly) – Thierry Lathuille Jun 01 '22 at 16:24

2 Answers

First of all, since you want the common endings, you should use intersection, not union:

matches = setA.intersection(setB)

matches is already a set, so instead of converting it to a list and looping over it, loop over images and masks and check each suffix for membership in the set:

imgres = [x for x in images if x[-4:] in matches]
mskres = [x for x in masks if x[-4:] in matches]
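Putting the pieces together, here is a minimal runnable sketch using the sample data from the question. The helper `suffix` and its `width` parameter are hypothetical names introduced only for illustration; the matching rule is still the last four characters, as in the question.

```python
images = ['image_im_1', 'image_im_2']
masks = ['mask_im_1', 'mask_im_2', 'mask_im_3']


def suffix(name, width=4):
    """Hypothetical helper: the trailing `width` characters used as the key."""
    return name[-width:]


# Suffixes that occur in both lists (set intersection, not union)
matches = {suffix(x) for x in images} & {suffix(x) for x in masks}

# One pass over each list; each set membership test is O(1) on average
imgres = [x for x in images if suffix(x) in matches]
mskres = [x for x in masks if suffix(x) in matches]

print(imgres)  # ['image_im_1', 'image_im_2']
print(mskres)  # ['mask_im_1', 'mask_im_2']
```

Both result lists keep the original order of their inputs, and the total work is linear in the number of file names rather than one list scan per suffix.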
cottontail

Your solution is basically as good as it gets. You can reduce it to a single pass over masks, though, if you first store an intermediate map, image_map, from each 4-character suffix to the original image name:

# store dict of mapping to original name
image_map = {x[-4:]: x for x in images}

# store all our matches here
matches = []

# loop through your other file names
for mask in masks:

    # if the mask's suffix is a key in image_map, we have a match!
    if mask[-4:] in image_map:

        # save the mask
        matches.append(mask)

        # get the original image name
        matches.append(image_map[mask[-4:]])
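
One caveat with a fixed `[-4:]` slice: once the numeric part reaches five digits, which is likely with ~100k files, different names can share the same last four characters (for example `image_im_11110` and `image_im_21110`). The following is a sketch of the same single-pass idea under the assumption that every name has the form `<kind>_<id>`, so the key can be taken as everything after the first underscore. The helper `key` and the sample names are hypothetical, and the matches are collected into two separate lists rather than one.

```python
def key(name):
    """Hypothetical helper: 'image_im_21110' -> 'im_21110'."""
    # str.partition splits on the first underscore only
    return name.partition('_')[2]


# Sample data chosen so that two image names share their last 4 characters
images = ['image_im_1', 'image_im_11110', 'image_im_21110']
masks = ['mask_im_1', 'mask_im_21110', 'mask_im_3']

# Map each key to its original image name
image_map = {key(x): x for x in images}

matched_images = []
matched_masks = []

# Single pass over the masks; each dict lookup is O(1) on average
for mask in masks:
    image = image_map.get(key(mask))
    if image is not None:
        matched_images.append(image)
        matched_masks.append(mask)

print(matched_images)  # ['image_im_1', 'image_im_21110']
print(matched_masks)   # ['mask_im_1', 'mask_im_21110']
```

The two output lists are aligned index by index and follow the order of `masks`. If the real file names follow a different pattern, only `key` needs to change.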

Matt