
Suppose that you are getting a live feed from a camera to your computer using a proprietary program, which is displaying e.g. a 1024x1024 pixel video on your screen. Would it be possible to write a small GUI program using Python/Tkinter, featuring a resizable window with a transparent area which "grabs" the background of whatever the transparent area is placed on top of?

Assuming that you don't have direct access to the camera, I suppose this question comes down to 1.) how one can read the video data provided by your GPU to a certain screen area using Python and 2.) if it is possible to dynamically adjust the readout area via a transparent Tkinter window.

I can probably figure out 2.), once it is clear how to grab a certain screen area from the GPU.

Note that I do not want to take a screenshot every 50 ms or so, but really get the GPU stream for a specific display area, at whatever system refresh rate is set. The stream should then be stored in a circular RAM buffer for live image analysis.
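To make part 2.) concrete, here is a rough sketch of what I have in mind (all names are my own and purely illustrative): read the Tkinter window's geometry string and convert it into a bounding box that could then be handed to whatever does the actual capturing.

```python
import re

def geometry_to_bbox(geometry: str) -> tuple[int, int, int, int]:
    """Convert a Tk geometry string like '640x480+100+50' into a
    (left, top, right, bottom) bounding box in screen coordinates."""
    # Tk reports geometry as 'WIDTHxHEIGHT+X+Y'; on some platforms a
    # negative offset appears as '+-N', which the pattern also accepts.
    m = re.fullmatch(r"(\d+)x(\d+)\+(-?\d+)\+(-?\d+)", geometry)
    if m is None:
        raise ValueError(f"unexpected geometry string: {geometry!r}")
    w, h, x, y = map(int, m.groups())
    return (x, y, x + w, y + h)

# Example with a live window (requires a display):
#   import tkinter as tk
#   root = tk.Tk()
#   root.attributes("-alpha", 0.3)  # semi-transparent; true click-through
#                                   # transparency is platform-specific
#   root.update_idletasks()
#   bbox = geometry_to_bbox(root.winfo_geometry())
```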

  • Stackoverflow isn't really the venue for this sort of question. If you need help debugging a *specific* issue with your code, you're in the right place! Otherwise, you may want to look for help elsewhere. Hope that helps. – JRiggles Jun 01 '23 at 16:48
  • @JRiggles Question was updated just when you were responding... please have a look. It does not get any *more* specific than this. Otherwise, I would already have the answer and no need to ask. – srhslvmn Jun 01 '23 at 16:50
  • @JRiggles Also, if you think that this question belongs somewhere else, please provide some directions on where to post instead! I'm glad for any suggestions. Thx – srhslvmn Jun 01 '23 at 17:03
  • When I say "specific" I mean along the lines of "I am experiencing this error with my code, please help me find a solution" rather than "I would like a way to implement this idea, please provide suggestions". You may find the [Software Engineering](https://softwareengineering.stackexchange.com/) stack exchange, or perhaps [r/Python](https://reddit.com/r/python) to be a better fit. The issue isn't that your question lacks detail so much as it just doesn't belong on SO - which is not to say that it's a *bad* question, by any means! – JRiggles Jun 01 '23 at 17:13
  • @JRiggles Thanks for this information! I must admit that SO/SE board categorization is becoming a bit confusing. I mean, seriously: How am I supposed to decide whether this question fits best in [Stack Overflow](https://stackoverflow.com), [Super User](https://superuser.com) or [Software Engineering](https://softwareengineering.stackexchange.com)? :) – srhslvmn Jun 01 '23 at 17:18
  • Yes, you can grab areas of the screen. Whatever operating system you're using has a "screen clip" utility that does it. No, you wouldn't use a tkinter transparent window. You would just read from the root desktop window. – Tim Roberts Jun 01 '23 at 17:18
  • @srhslvmn They can't make it too easy! Then there'd be nothing to moderate haha. I tend to figure "stackoverflow for bugs" "superuser for IT" and "software engineering for everything else"...which may be painting with a broad brush, but what can you do – JRiggles Jun 01 '23 at 17:22
  • @TimRoberts Thanks, this is new information to me. What would be this "screen clip" utility in Windows 7 and 10? And can I talk directly to it via Python? Using a transparent Tkinter window should be possible by reading the window position and size into variables and feeding them to whatever then reads from "screen clip"...right? – srhslvmn Jun 01 '23 at 17:22
  • @JRiggles Haha, there you have it. In my experience, it is easier to post extremely broad/mixed questions in SU, while Software Engineering seems to be more specialized... And SO definitely has a strong focus on code, although one cannot always provide some (except maybe some pseudo code, which might or might not be of any usefulness for the question itself) – srhslvmn Jun 01 '23 at 17:25
  • It's called the "Snipping Tool". On Win 10, press the Windows key + Shift + S. No, it's standalone. Be sure to note the answer from srhslvmn below, which despite the disclaimer does exactly what you want. A transparent window is the wrong answer for several reasons. One, because of window compositing, reading from your own window might actually read from an offscreen window image, and not get what's underneath. Two, because you don't NEED it. Anyone can get a handle to the desktop window and fetch pixels from it. – Tim Roberts Jun 01 '23 at 17:29
  • @TimRoberts Okay, this tool does not seem to exist at least on Windows 7. Yes, it's possible to take screenshots, but I was hoping to access directly whatever comes from the GPU – srhslvmn Jun 01 '23 at 17:33
  • Where do you think the GPU gets its info? It reads pixels from the desktop frame buffer. – Tim Roberts Jun 01 '23 at 17:34
  • @TimRoberts Ah, wait...so there is a portion in RAM (I suppose?) where each new display frame is stored? And the GPU reads each frame from RAM and converts it into display signals? – srhslvmn Jun 01 '23 at 17:40
  • @TimRoberts https://en.wikipedia.org/wiki/Framebuffer ...okay, I feel like getting closer to the actual answer now. So I basically have to find a way to access the frame buffer in RAM! – srhslvmn Jun 01 '23 at 17:41
  • Right. The frame buffer is just memory. Sometimes it's part of RAM, sometimes it's on the video card, but to the CPU it's just memory. And `ImageGrab` can read that memory. – Tim Roberts Jun 01 '23 at 18:58
  • @TimRoberts Wait, so `ImageGrab` actually reads directly the frame buffer? But that would then be ***exactly*** what I was looking for, no? If this would be the case, could you just briefly explain how the frame rate is determined? I.e. when the camera records at 30 Hz, but my display refreshes at 120 Hz...is each camera image in the frame buffer simply repeated 4 times by the GPU? And with what rate does `ImageGrab` then "grab" frames from the buffer, does it somehow get the information that the display is refreshing at 120 Hz and reads from the buffer at the same rate? – srhslvmn Jun 01 '23 at 19:37
  • The application that is reading from the camera stores each new image into the frame buffer, which is nothing more than a chunk of memory. It does that copy every time a frame arrives, which in your case is 30 times a second. The graphics chip reads the entire frame buffer 120 times a second and sends it out the wire. So, it's going to see the same bytes in that window 4 frames in a row. `ImageGrab` just copies blocks from the frame buffer memory. There's no synchronization and no timing. – Tim Roberts Jun 01 '23 at 20:25
  • @TimRoberts Okay, so I guess there is no way to tell when the old frame inside the buffer has been updated? Maybe something like a software trigger? OR: To consider the other way around, is there some way to trigger on the display refresh? – srhslvmn Jun 02 '23 at 00:02
  • Sorry if my questions seem a little dumb, but I obviously lack detailed knowledge of how the whole video signal chain works... But I can imagine that there must be some kind of trigger event that can be used to synchronize some software process to the display refreshing. Can't think of a concrete example right now, but this seems rather reasonable (maybe to avoid strobing/stuttering of certain video-related applications, possibly some AR/VR stuff where physical devices need to be sync'ed to video) – srhslvmn Jun 02 '23 at 00:05
  • If you need the camera image, then you need to be talking to the camera, not scraping the screen. That's just bad engineering. Both OpenGL and Direct3D have methods for syncing with the refresh rate, but that doesn't help your case, because the camera is not synced to the display refresh. – Tim Roberts Jun 02 '23 at 00:20
  • @TimRoberts I actually *do* want to scrape the screen (cool expression), because the advantage of the app I'm writing is precisely **not** to have to talk to the camera. Reason: If you have 10 different cameras with 10 different SDKs, but don't need the raw data for analysis (displayed images suffice), you save a ton of work and the program is super flexible. You can basically drag and drop it right on top of whatever proprietary program displays your image data and "scrape the screen". In fact, you might not even need access to the machine where that proprietary program is running, simply... – srhslvmn Jun 02 '23 at 00:25
  • ...film whatever is on the screen and use that. :) Although that is not the real goal here. Just making life easier and getting a platform- and SDK/API-independent image analysis tool – srhslvmn Jun 02 '23 at 00:28
  • Virtually every camera on a given platform uses the same API. There will not be 10 different SDKs. Further, those 10 cameras will not be synchronized. They're all writing at different times. Further, the frame-buffer-to-CPU reading path is not highly optimized, because the important operations are the ones going INTO the frame buffer. There may not be TIME for you to read 10 cameras. You're designing the wrong app here. – Tim Roberts Jun 02 '23 at 00:37
  • @TimRoberts Do you, by any chance, know which API that is? Btw, I'm talking about scientific cameras (not consumer cameras, although lines can blur). But the thing with 10 cameras is a misunderstanding: I meant to say that by grabbing the display output, you don't need to talk to 10 different proprietary APIs, but simply read whatever is coming into the frame buffer...so not reading out 10 cameras simultaneously, but avoiding the necessity to integrate 10 different libraries. – srhslvmn Jun 02 '23 at 23:54
  • On Windows, most are either DirectShow or MediaFoundation. You are correct that industrial/scientific cameras sometimes have custom APIs. I suppose I should stop arguing with you. ;) What you're asking can be done with `ImageGrab`. Whether you'll be satisfied, I can't say.... – Tim Roberts Jun 03 '23 at 01:12
  • @TimRoberts I think we agree on all points. I'll give `ImageGrab` a shot – srhslvmn Jun 04 '23 at 04:28

1 Answer

While this does not seem to answer the exact question, Pillow's `ImageGrab` module provides screen-capturing functionality with the option to define a region of interest:

https://www.simplifiedpython.net/python-screenshot/

However, it basically takes screenshots on command rather than tapping into the actual video stream of the GPU.
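For illustration, a minimal sketch (the function and its parameters are my own, not part of Pillow) of grabbing a fixed region repeatedly and keeping only the most recent frames in a circular RAM buffer, as the question describes. The grab callable is injected so the capture logic stays independent of any particular backend:

```python
from collections import deque
from typing import Callable

def capture_region(grab: Callable[[tuple], object],
                   bbox: tuple[int, int, int, int],
                   n_frames: int,
                   buffer_size: int = 64) -> deque:
    """Call `grab(bbox)` n_frames times; keep only the newest
    `buffer_size` results (deque with maxlen acts as a ring buffer)."""
    buffer: deque = deque(maxlen=buffer_size)
    for _ in range(n_frames):
        buffer.append(grab(bbox))
    return buffer

# With Pillow installed and a running display, this could be driven by
# ImageGrab, e.g.:
#   from PIL import ImageGrab
#   frames = capture_region(lambda b: ImageGrab.grab(bbox=b),
#                           bbox=(0, 0, 1024, 1024), n_frames=10)
```

Note that this still polls: each `grab` call copies the current frame-buffer contents on demand, with no synchronization to the display refresh or the camera's frame rate.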

  • Don't shortchange your answer. You've given good info here. And I'm not sure what you think "the actual video stream of the GPU" is, but rest assured that the GPU reads the pixels from the desktop, exactly like `ImageGrab`. There is no magical "pixel stream" here. – Tim Roberts Jun 01 '23 at 17:33
  • @TimRoberts Hi, what does "shortchange" mean? Sorry, I'm not a native English speaker... I will give `ImageGrab` a shot if you think this does the job. By video stream from the GPU I mean whatever the monitor receives from the GPU. If your monitor runs at 30 Hz, then I'd expect a video stream at 30 Hz, while for a faster one at 120 Hz, a video stream at 120 Hz etc. – srhslvmn Jun 01 '23 at 17:36
  • @TimRoberts But what do you mean by *"the GPU reads the pixels from the desktop"*? Isn't it the GPU which *sends* the pixel information? – srhslvmn Jun 01 '23 at 17:38
  • The GPU COPIES pixels from the frame buffer out to the HDMI wire or to the LCD panel, or wherever they need to go. It doesn't INVENT the pixels. It's not creating a new pixel stream from nothing. – Tim Roberts Jun 01 '23 at 18:57
  • @TimRoberts Ahh...got it – srhslvmn Jun 01 '23 at 19:31