1

I have a directory of about 100,000 images and a query image. I know that a similar image exists somewhere in the directory, saved in a different format and scaled differently, but I don't know where. I want to search the directory for that image and find out its filename.

I am looking for a mostly ready-made solution, but I couldn't find one. I found OpenCV, but I would need to write code around it. Is there a project like that out there?

If there isn't, could you help me make a simple C# console app using OpenCV? I tried their templates but never managed to get SURF or CudaSURF working.

Thanks

Edited as per @Mark Setchell's comment

Arimodu
  • please review [ask] and [help/on-topic] -- https://en.wikipedia.org/wiki/CBIR – Christoph Rackwitz Aug 12 '22 at 12:13
  • Look at [template matching](https://docs.opencv.org/4.x/d4/dc6/tutorial_py_template_matching.html). Without example of images can't suggest something that will work in your case. – Gralex Aug 12 '22 at 20:06
  • @Gralex template matching against that many images? I'm sure that will take forever. since OP already mentioned local feature descriptors, there is a chance that OP knows, but didn't say, that the query may not be a pixel-exact match to anything in the "database" – Christoph Rackwitz Aug 13 '22 at 19:26
  • @ChristophRackwitz, whether it takes forever depends on the real task and the implementation of the algorithm. I previously used `absDiff` and `resize` to find the closest image in a folder. It worked perfectly. You don't know the real task, and without it any suggestion can be bad. – Gralex Aug 14 '22 at 08:18

2 Answers

1

If the image is identical, the fastest way is to get the file size of the image you are looking for and compare it with the file sizes of the images amongst which you are searching.

I suggest this first because, as Christoph clarifies in the comments, it doesn't require reading the file at all - it is just metadata.

If that yields more than one matching answer, calculate a hash (MD5 or other) and pick the filename that produces the same hash.

Again, as mentioned by Christoph in the comments, this doesn't require decoding the image, or holding the decompressed image in RAM, just checksumming it.
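A minimal sketch of this two-stage check, assuming the stored copy is byte-for-byte identical to the query file. The function names `file_hash` and `find_identical` are hypothetical, and SHA-256 stands in for "MD5 or other". Unlike the description above, the sketch also hashes a single size match, to confirm it really is the same file.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    """Checksum a file in chunks; the image is read but never decoded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical(query_path, directory):
    """Return paths in `directory` whose bytes equal those of `query_path`."""
    query_size = os.path.getsize(query_path)

    # Stage 1: compare file sizes, which only touches metadata.
    with os.scandir(directory) as entries:
        candidates = [
            entry.path
            for entry in entries
            if entry.is_file() and entry.stat().st_size == query_size
        ]

    # Stage 2: hash only the files that survived the size filter.
    query_digest = file_hash(query_path)
    return [path for path in candidates if file_hash(path) == query_digest]
```

Calling `find_identical("query.jpg", "images")` (both paths hypothetical) returns a list of matching file paths, which is empty when no identical file exists.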

Mark Setchell
  • "size" meaning the *file* size, because that doesn't require actually reading the image... and the hash _only_ requires reading the image, not decoding it (decoding is costlier than anything else so far) – Christoph Rackwitz Aug 13 '22 at 19:28
  • @ChristophRackwitz Yes, I could have been clearer about *why* I suggested what I did, and why in that order. I have tried to improve my answer along the lines of your suggestions. – Mark Setchell Aug 13 '22 at 19:38
  • just making sure because the question is so simple that there's no warning about the rabbit hole the question opens up :) – Christoph Rackwitz Aug 13 '22 at 19:42
  • Unfortunately the image I have is scaled, and it's a JPG while the original is WebP or PNG. – Arimodu Aug 14 '22 at 20:47
  • @Arimodu In that case, please click `edit` under your original question and remove the part that says *"the image is in this directory"* and replace it with something along the lines of *"there is a similar image in the directory saved in a different format and scaled differently"*. Maybe also add if it is rotated. Or part of another image. Or potentially incomplete. Thank you. – Mark Setchell Aug 14 '22 at 21:07
0

So in the end I used this site and modified the Python code used there to search a directory instead of a single image. There is not much code, so the full thing is below:

import argparse
import cv2
from os import listdir
from os.path import isfile, join

ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str, required=True,
    help="path to the directory of images to search")
ap.add_argument("-t", "--template", type=str, required=True,
    help="path to template image")
args = vars(ap.parse_args())

# load the template image from disk
print("[INFO] loading template...")
template = cv2.imread(args["template"])
cv2.namedWindow("Output")
cv2.startWindowThread()

# display the template; press any key to start the search
cv2.imshow("Output", template)
cv2.waitKey(0)
# convert the template to grayscale
templateGray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

imageFileNames = [f for f in listdir(args["image"]) if isfile(join(args["image"], f))]

for imageFileName in imageFileNames:
    try:
        imagePath = join(args["image"], imageFileName)
        print("[INFO] Loading " + imagePath + " from disk...")
        image = cv2.imread(imagePath)
        print("[INFO] Converting " + imageFileName + " to grayscale...")
        imageGray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        print("[INFO] Performing template matching for " + imageFileName + "...")
        result = cv2.matchTemplate(imageGray, templateGray,
            cv2.TM_CCOEFF_NORMED)
        (minVal, maxVal, minLoc, maxLoc) = cv2.minMaxLoc(result)
        (startX, startY) = maxLoc
        endX = startX + template.shape[1]
        endY = startY + template.shape[0]
        if maxVal > 0.75:
            print("maxVal = " + str(maxVal))
            # draw the bounding box on the image
            cv2.rectangle(image, (startX, startY), (endX, endY), (255, 0, 0), 3)
            # show the output image
            cv2.imshow("Output", image)
            cv2.waitKey(0)
            cv2.imshow("Output", template)
    except KeyboardInterrupt:
        break
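    except Exception as error:
        # unreadable files, or images smaller than the template, end up here
        print("[ERROR] Skipping " + imageFileName + ": " + str(error))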

cv2.destroyAllWindows()

The code above shows any image whose match value is greater than 0.75. The match value is `maxVal`, the highest normalized correlation coefficient that `TM_CCOEFF_NORMED` found between the template and any position in the image, so values closer to 1 mean more similarity. The threshold is probably still too low, so if you want to use this, tweak it to your liking. Note that this WILL NOT work if the image is rotated, and if, like me, you have a bright light source in the template, other light sources will come up as false positives.

As for time, it took me about 7 hours until I found my image, with the script pausing about every 20 minutes for a false positive. By then I had gotten through about 2/3 of all images.

As a side note, it took 10 minutes just to build the list of files inside the directory, and the script used about 500 MB of RAM once that was done.
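One way to avoid building the whole file list up front is to iterate over the directory lazily. The sketch below uses `os.scandir` from the standard library. The generator name `iter_image_paths` and the extension set are hypothetical, and whether this would shorten the 10 minutes on a particular disk is an assumption that has not been measured here.

```python
import os

# Hypothetical set of extensions to treat as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def iter_image_paths(directory):
    """Yield image file paths one at a time instead of listing them all first."""
    with os.scandir(directory) as entries:
        for entry in entries:
            extension = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and extension in IMAGE_EXTENSIONS:
                yield entry.path
```

The main loop could then read `for imagePath in iter_image_paths(args["image"]):`, so matching starts on the first file right away.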

This is not the best answer, so if anyone more qualified finds this, feel free to write another answer.

Arimodu