To automate software based on image recognition I started to learn Python using OpenCV's template matching and PyAutoGUI (AutoHotkey or AutoIt might work but I wanted Python). I want to match images to the screen and click when they are found. The process is stateless hence all templates should be matched. Requirements:
- Work with any application (not only Windows WPF).
- Multi-monitor capability (PyAutoGUI has trouble with that). Screenshots are 3840x1080px with 2 monitors.
- Find UI-elements in less than 0.5 sec.
- Optional: Non blocking mouse interaction.
My implementation in Python (20 images) took around 20 seconds. Using joblib library cut time by 75% (depending on available cores) which is still too slow. Also it doesn't seem scalable.
- What is the best scalable approach?
- Isn't this similar to autonomous driving use case? How does its software handle it?
- Are Python, PyAutoGUI, OpenCV suitable?
I could put a CNN (convolutional neural network) as a top layer and only loop when the classifier finds a template. Does that solve the problem? It still needs to loop through all images.