I've got a few thoughts, and a possibly-workable solution you can consider.
First, consider tracking individual pixel deltas and transmitting/storing just those. In a typical interactive session, only small parts of the UI change at a time; moving or resizing windows tends to be (anecdotally) less common over long sessions. Tracking deltas therefore captures simple things like typed text, cursor movement and small UI updates efficiently, without much extra work.
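As a sketch of the idea (assuming NumPy-style uint8 frames; the function names are just illustrative):

```python
import numpy as np

def frame_delta(prev, curr):
    """Return the coordinates and new values of every changed pixel.

    prev/curr are H x W x 3 uint8 frames. A minimal sketch: a real
    recorder would batch deltas into runs or tiles before storing them.
    """
    changed = np.any(prev != curr, axis=-1)          # H x W boolean mask
    ys, xs = np.nonzero(changed)                     # coordinates of changed pixels
    return np.column_stack([ys, xs]), curr[ys, xs]   # (N, 2) coords, (N, 3) values

def apply_delta(frame, coords, values):
    """Replay a stored delta onto a frame (the playback side)."""
    frame[coords[:, 0], coords[:, 1]] = values
    return frame
```

For typed text or a blinking cursor, the coordinate list stays tiny compared to the full frame, which is the whole point.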
You could also consider hooking the OS at a lower level to get e.g. a display list, or even (optimally) a list of 'damage' rectangles; Mac OS X's Quartz compositor can provide this information, for example. This quickly narrows down what to update, and in the ideal case the damage list may itself be an efficient representation of the screen.
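Once the compositor hands you damage rectangles (however your OS reports them; the `(x, y, w, h)` tuple format below is just an assumption), the capture and replay sides become very simple:

```python
import numpy as np

def capture_damage(screen, damage_rects):
    """Copy out only the damaged pixels.

    damage_rects is a list of hypothetical (x, y, w, h) tuples as
    reported by the compositor; screen is an H x W x 3 uint8 frame.
    """
    return [((x, y), screen[y:y + h, x:x + w].copy())
            for (x, y, w, h) in damage_rects]

def replay_damage(frame, patches):
    """Apply captured patches back onto a frame during playback."""
    for (x, y), patch in patches:
        h, w = patch.shape[:2]
        frame[y:y + h, x:x + w] = patch
    return frame
```

Note this stores raw pixels per rectangle; combining it with the per-pixel deltas above would shrink things further.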
If you can query the OS (window manager) for information about windows, you can store a separate stream of pixel deltas for every visible window and use a simple display-list approach to 'render' them during playback. Identifying moving windows then becomes trivial: simply diff the display lists.
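A minimal playback-side sketch, assuming each window in the display list is stored as a position plus its own delta-reconstructed pixel buffer (the dict layout is hypothetical):

```python
import numpy as np

def render_display_list(size, windows):
    """Compose windows back-to-front into a frame.

    windows is a list (back-to-front) of dicts with 'x', 'y' and
    'pixels' (that window's reconstructed H x W x 3 uint8 buffer).
    """
    h, w = size
    frame = np.zeros((h, w, 3), np.uint8)
    for win in windows:
        ph, pw = win['pixels'].shape[:2]
        x0, y0 = win['x'], win['y']
        # Clip the window to the frame (it may hang off any edge).
        x1, y1 = min(x0 + pw, w), min(y0 + ph, h)
        sx, sy = max(0, -x0), max(0, -y0)
        x0c, y0c = max(x0, 0), max(y0, 0)
        frame[y0c:y1, x0c:x1] = win['pixels'][sy:sy + (y1 - y0c),
                                              sx:sx + (x1 - x0c)]
    return frame
```

A window move is then just a change of 'x'/'y' between two display lists, with no pixel data to store at all.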
If you can query the OS's information about the cursor position, you can use the cursor movement to quickly estimate movement deltas, since cursor moves usually correlate well with object movement on screen (e.g. moving windows, icons, dragging objects, etc.). This allows you to avoid processing the image to determine movement deltas.
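A sketch of using the cursor delta as a motion hypothesis and verifying it by diffing; the region and cursor inputs are assumed to come from the steps above, and a mismatch should fall back to image-based estimation:

```python
import numpy as np

def verify_cursor_motion(prev, curr, cursor_prev, cursor_next, region):
    """Test whether the changed region simply moved by the cursor delta.

    region is a hypothetical (x, y, w, h) bounding box of the changed
    area in curr. Returns (dx, dy) if prev shifted by the cursor delta
    reproduces that region exactly, else None.
    """
    dx = cursor_next[0] - cursor_prev[0]
    dy = cursor_next[1] - cursor_prev[1]
    x, y, w, h = region
    if y - dy < 0 or x - dx < 0:
        return None  # source would be off-screen; fall back to a search
    src = prev[y - dy:y - dy + h, x - dx:x - dx + w]
    dst = curr[y:y + h, x:x + w]
    if src.shape == dst.shape and np.array_equal(src, dst):
        return (dx, dy)
    return None
```

When the hypothesis holds, you get the motion vector for the price of one region comparison instead of a search.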
On to a possible solution (or a last resort if you still can't identify the movement delta with the above): the (very common) case of a single moving rectangle can actually be handled reasonably easily. Make a mask of all the pixels that change in the frame, then identify the largest connected component in the mask. If it approximates a rectangle, you can assume it represents a moved region. Either the window moves exactly orthogonally (i.e. entirely in the x- or y-direction), in which case the total delta looks like a slightly bigger rectangle, or it moves diagonally, in which case the total delta has an 8-sided shape. Either way, you can estimate the motion vector and verify it by diffing the regions. Note that this deliberately ignores details a practical implementation would have to handle, e.g. pixels moving independently near the window, or regions that don't appear to change (such as large blocks of solid colour inside the window).
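The mask-and-verify idea can be sketched as follows. To stay self-contained, this version brute-forces candidate motion vectors over the changed region's bounding box and verifies each by diffing, rather than deriving candidates from the mask's rectangle/8-sided shape, and it skips connected-component labelling, so it handles only the single-moving-window case:

```python
import numpy as np

def changed_mask(prev, curr):
    """Boolean H x W mask of pixels that differ between two frames."""
    return np.any(prev != curr, axis=-1)

def bbox(mask):
    """Bounding box of the True pixels as (x0, y0, x1, y1), exclusive."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1

def estimate_motion(prev, curr, max_shift=16):
    """Find the (dx, dy) that best explains the changed region as a
    translation of prev, scored by how many changed pixels it matches."""
    mask = changed_mask(prev, curr)
    if not mask.any():
        return (0, 0)
    x0, y0, x1, y1 = bbox(mask)
    h, w = mask.shape
    best, best_score = None, -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dx == 0 and dy == 0:
                continue
            # Source region for the changed bbox, rejected if off-frame.
            sy0, sy1 = y0 - dy, y1 - dy
            sx0, sx1 = x0 - dx, x1 - dx
            if sy0 < 0 or sx0 < 0 or sy1 > h or sx1 > w:
                continue
            # Verify by diffing: count changed pixels the shift explains.
            match = np.all(curr[y0:y1, x0:x1] == prev[sy0:sy1, sx0:sx1],
                           axis=-1)
            score = (match & mask[y0:y1, x0:x1]).sum()
            if score > best_score:
                best, best_score = (dx, dy), score
    return best
```

The O(max_shift²) search is far too slow for production frame rates; in practice you'd use the delta shape (or the cursor hint above) to cut the candidates down to a handful before verifying.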
Finally, I'd look into existing literature on real-time motion estimation. A lot of work has been done in optimizing motion estimation and compensation for e.g. video encoding, so you may be able to use that work as well if you find the methods above inadequate.