I ended up rolling my own cursor support, since it appeared that the kernel support was dependent on whatever the particular video driver supported. The performance ended up great for my purposes. Here's what I did:
- Open the /dev/fb0 framebuffer, adjust vinfo as needed, `mmap` the framebuffer, and `malloc` two buffers the same size as the framebuffer. One of the buffers is my back buffer where all the drawing happens. The other one is my "cursor" buffer where the cursor is drawn.
- Open the appropriate /dev/input/eventX in preparation for reading mouse events.
- Define a "refresh" function to call whenever something is drawn into the back buffer, or whenever there is mouse activity.
- `poll` for mouse events with a reasonable timeout. I used a 500 ms timeout and put this inside a `pthread` so that it had very little performance overhead.
- The "refresh" function `memcpy`s the back buffer into the cursor buffer, and draws the cursor on top of it. (I erase the mask bits under the cursor and draw the cursor bits, as per the images here.) The cursor buffer is then `memcpy`'ed into the framebuffer.
- (I protect the refresh functionality with two mutex locks for better performance. I acquire the first before copying the back buffer to the cursor buffer and release it after drawing the cursor. I acquire the second before drawing the cursor and release it after copying the cursor buffer to the framebuffer. This improves performance noticeably when doing lots of really fast drawing.)
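The refresh step above can be sketched roughly as follows. This is a minimal sketch, not my exact code: the struct and function names are hypothetical, it assumes 16-bit pixels with no row padding, and it assumes an AND-mask/OR-bits cursor format (mask is `0x0000` where the cursor is opaque and `0xFFFF` where it is transparent). The two locks overlap in the order described in the last bullet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three buffers (16-bit pixels assumed). */
struct screen {
    int width, height;   /* size in pixels */
    uint16_t *back;      /* malloc'ed back buffer, all drawing goes here */
    uint16_t *cursor;    /* malloc'ed cursor buffer */
    uint16_t *fb;        /* mmap'ed framebuffer (any writable memory for testing) */
};

/* Hypothetical cursor image: AND mask plus pixel bits, both w*h entries. */
struct cursor_image {
    int w, h;
    const uint16_t *and_mask;  /* 0x0000 = opaque, 0xFFFF = transparent */
    const uint16_t *bits;      /* cursor pixels, 0 where transparent */
};

static pthread_mutex_t compose_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t present_lock = PTHREAD_MUTEX_INITIALIZER;

/* Erase the masked pixels under the cursor, then OR in the cursor bits.
 * Pixels that fall outside the screen are clipped. */
static void draw_cursor(struct screen *s, const struct cursor_image *c,
                        int cx, int cy)
{
    for (int y = 0; y < c->h; y++) {
        int sy = cy + y;
        if (sy < 0 || sy >= s->height)
            continue;
        for (int x = 0; x < c->w; x++) {
            int sx = cx + x;
            if (sx < 0 || sx >= s->width)
                continue;
            uint16_t *p = &s->cursor[(size_t)sy * s->width + sx];
            *p = (uint16_t)((*p & c->and_mask[y * c->w + x])
                            | c->bits[y * c->w + x]);
        }
    }
}

/* Call after drawing into s->back or after any mouse movement. */
void refresh(struct screen *s, const struct cursor_image *c, int cx, int cy)
{
    assert(s && s->back && s->cursor && s->fb && c);
    size_t bytes = (size_t)s->width * s->height * sizeof(uint16_t);

    pthread_mutex_lock(&compose_lock);    /* first lock: back -> cursor buffer */
    memcpy(s->cursor, s->back, bytes);
    pthread_mutex_lock(&present_lock);    /* second lock: taken before drawing */
    draw_cursor(s, c, cx, cy);
    pthread_mutex_unlock(&compose_lock);  /* first lock released after drawing */
    memcpy(s->fb, s->cursor, bytes);      /* cursor buffer -> framebuffer */
    pthread_mutex_unlock(&present_lock);
}
```

Both locks are always taken in the same order, so two threads calling `refresh` cannot deadlock each other. The back buffer is never modified by the cursor drawing, which is why moving the mouse needs no redraw of the scene.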
A few reasons for some of my decisions:
- Writing into the framebuffer is reasonably fast, but reading from it is much slower, hence the use of regular `malloc`'ed memory for the back and cursor buffers.
- `memcpy` is much faster than anything I can write and is thread-safe.
- Concurrent access to the framebuffer is slow, presumably because `memcpy` locks regions and blocks when trying to access a region currently in use. This is why I used two mutexes to protect copies from the back buffer to the cursor buffer, and from the cursor buffer to the framebuffer.
- `poll` with a `0` timeout is equivalent to a tight loop that uses a lot of CPU cycles, hence the use of a non-zero timeout. But `poll` returns as soon as there is activity on the input, so the responsiveness is great.
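To illustrate the `poll` point, here is a minimal sketch of waiting for one mouse event with a timeout. The function name `wait_for_event` and its return convention are my own for this example. It assumes the descriptor delivers whole `struct input_event` records, as /dev/input/eventX does.

```c
#include <assert.h>
#include <linux/input.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for one input event on fd.
 * Returns 1 if an event was read into *ev, 0 on timeout, -1 on error. */
int wait_for_event(int fd, int timeout_ms, struct input_event *ev)
{
    assert(ev != NULL);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);   /* sleeps; returns early on activity */
    if (n <= 0)
        return n;                        /* 0 = timed out, -1 = poll failed */
    if (!(pfd.revents & POLLIN))
        return -1;                       /* hangup or error on the device */

    ssize_t got = read(fd, ev, sizeof *ev);
    return got == (ssize_t)sizeof *ev ? 1 : -1;
}
```

Called in a loop from the input thread with a 500 ms timeout, this sleeps in the kernel while the mouse is idle and wakes immediately when an event arrives, at which point the thread can update the cursor position and call the refresh function.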
On my hardware, I didn't find a usable way to synchronize with the vertical blanking (some of the `ioctl`s are apparently no-ops), but the approach described above exhibited no particular tearing. Yes, this approach uses two offscreen buffers, each of which requires about 4 MB on my 1920 x 1080 16-bit/pixel screen, but it's very simple and sufficient for my needs.
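For reference, the buffer size works out to 1920 x 1080 x 2 bytes = 4,147,200 bytes per buffer. The sketch below shows that arithmetic next to one possible way to map the framebuffer. The helper names are hypothetical; the `ioctl` requests and structs come from `<linux/fb.h>`. It maps `line_length * yres` bytes, on the assumption that rows may be padded by the driver.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bytes needed for one full-screen offscreen buffer with no row padding. */
size_t buffer_bytes(unsigned width, unsigned height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return (size_t)width * height * (bits_per_pixel / 8);
}

/* Open and map a framebuffer device such as /dev/fb0.
 * Fills *vinfo and *len; returns the mapping, or NULL on any failure. */
void *map_framebuffer(const char *path, struct fb_var_screeninfo *vinfo,
                      size_t *len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    struct fb_fix_screeninfo finfo;
    if (ioctl(fd, FBIOGET_VSCREENINFO, vinfo) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &finfo) < 0) {
        close(fd);
        return NULL;
    }

    /* line_length is the real row stride in bytes, padding included. */
    *len = (size_t)finfo.line_length * vinfo->yres;
    void *p = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

If `line_length` is larger than `xres * bits_per_pixel / 8`, a single full-buffer `memcpy` into the framebuffer only lines up when the offscreen buffers use the same stride, so it is worth checking that value on the target hardware.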