Performant UI automation based on image recognition

Question

To automate software based on image recognition, I started learning Python, using OpenCV's template matching and PyAutoGUI (AutoHotkey or AutoIt might work, but I wanted Python). I want to match template images against the screen and click on them when they are found. The process is stateless, so every template has to be matched against each screenshot. Requirements:

Work with any application (not only Windows WPF).
Multi-monitor capability (PyAutoGUI has trouble with that). Screenshots are 3840x1080px with 2 monitors.
Find UI elements in less than 0.5 seconds.
Optional: Non-blocking mouse interaction.

My implementation in Python (20 images) took around 20 seconds. Using the joblib library cut the time by about 75% (depending on the number of available cores), which is still too slow. It also doesn't seem scalable.
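
The per-template fan-out can also be sketched with the standard library instead of joblib. This is only an illustration under assumptions: `match_all` and `toy_match` are made-up names, and threads only help if the real matching call releases the GIL (native OpenCV calls typically do, but that should be measured). The toy matcher searches a string so the sketch runs without OpenCV.

```python
from concurrent.futures import ThreadPoolExecutor


def match_all(screen, templates, match_one, max_workers=4):
    """Run match_one(screen, template) for every template concurrently.

    templates maps a name to template data; the result maps each name
    to whatever match_one returned for it.
    """
    names = list(templates)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so names and results line up.
        results = pool.map(lambda name: match_one(screen, templates[name]), names)
        return dict(zip(names, results))


def toy_match(screen, template):
    """Hypothetical stand-in matcher: position of template in screen, or None."""
    index = screen.find(template)
    return index if index >= 0 else None


hits = match_all(
    "..[OK]....[Cancel]..",
    {"ok": "[OK]", "cancel": "[Cancel]", "help": "[Help]"},
    toy_match,
)
print(hits)
```

Either way, the total work still grows linearly with the number of templates; parallelism only divides it by the number of cores.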

What is the best scalable approach?
Isn't this similar to the autonomous driving use case? How does that software handle it?
Are Python, PyAutoGUI, OpenCV suitable?

I could put a CNN (convolutional neural network) in front as a first stage and only run the template loop when the classifier detects a template. Would that solve the problem? It would still have to loop through all images.

Yes, learning plays a big role in my decision. Unless someone tells me it's impossible to achieve with Python, of course. As I mentioned, I tried out Sikuli and found it rather weird to work with (semi-graphical programming). I'm also not sure whether it could do the multi-template matching that I need. It's a sequential approach, but I want it to work "stateless". — dtrinh, Jan 30 '19 at 13:59
If image recognition is not the only approach, there are Python bindings for AutoIt. There is also a more powerful text-based library, pywinauto (not the same as PyAutoGUI), which supports WPF, WinForms, Qt, etc. It can also reduce the screen area for the image search in hard cases: a custom control may take up much less space on the screen than even the app's main window. So why not combine the text and image approaches? — Vasily Ryabov, Feb 05 '19 at 21:10
@VasilyRyabov As a matter of fact, I did import pywinauto to get hold of a window and its position. I then used that to limit the image search to the window area only, which drastically improved performance. It's still not highly scalable, but it is fast enough with 20 images at most. — dtrinh, Feb 06 '19 at 15:20
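
The window-restricted search from the last comment comes down to some coordinate bookkeeping, sketched below. All names and numbers here are hypothetical. How the window rectangle is obtained (for example from a window-automation library) is outside the sketch, and `origin` is assumed to be the desktop position of the screenshot's top-left pixel, which can be negative on multi-monitor setups.

```python
def crop_box(window, origin, size):
    """Clamp a window rectangle to the screenshot and return slice bounds.

    window: (left, top, right, bottom) in virtual-desktop coordinates.
    origin: (x, y) of the screenshot's top-left pixel in the same coordinates.
    size:   (width, height) of the screenshot in pixels.
    """
    left, top, right, bottom = window
    ox, oy = origin
    width, height = size
    x0 = max(left - ox, 0)
    y0 = max(top - oy, 0)
    x1 = min(right - ox, width)
    y1 = min(bottom - oy, height)
    if x0 >= x1 or y0 >= y1:
        return None  # the window lies entirely outside the screenshot
    return x0, y0, x1, y1


def to_desktop(match_xy, box, origin):
    """Translate a match position inside the crop back to desktop coordinates."""
    return (match_xy[0] + box[0] + origin[0], match_xy[1] + box[1] + origin[1])


# Hypothetical window on the right-hand monitor, partly off the bottom edge.
box = crop_box(window=(2000, 300, 2800, 1200), origin=(0, 0), size=(3840, 1080))
# With a NumPy screenshot, the search area would be screen[box[1]:box[3], box[0]:box[2]].
click_at = to_desktop((50, 20), box, origin=(0, 0))
print(box, click_at)
```

Since template matching cost scales with the searched area, cropping to one window shrinks the work for every template at once.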

0 Answers