
I need to implement off-screen rendering to texture on an ARM device with PowerVR SGX hardware.

Everything works (I used pixel buffers and the OpenGL ES 2.0 API). The only unsolved problem is that glReadPixels is very slow.

I'm not an expert in OpenGL ES, so I'm asking the community: is it possible to render textures directly into user-space memory? Or maybe there is some way to get the hardware address of a texture's memory region? Some other technique (EGL extensions)?

I don't need a universal solution, just a working one for PowerVR hardware.

Update: a little more information on the 'slow glReadPixels' problem. Copying 512x512 RGB texture data to CPU memory:

  • glReadPixels(0, 0, WIDTH, HEIGHT, GL_RGBA, GL_UNSIGNED_BYTE, &arr) takes 210 ms,
  • glReadPixels(0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, &arr) takes 24 ms (GL_BGRA is not standard for glReadPixels; it's a PowerVR extension),
  • memcpy(&arr, &arr2, WIDTH * HEIGHT * 4) takes 5 ms

With bigger textures, the differences grow accordingly.
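Given these numbers, one workaround (my suggestion, not from the original post) is to read back through the fast GL_BGRA path and swizzle to RGBA on the CPU if RGBA byte order is needed; a single linear pass over the buffer runs at roughly memcpy speed, so the total should stay far below the 210 ms GL_RGBA readback. A minimal sketch of the swizzle:

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the B and R channels of a 4-byte-per-pixel BGRA buffer in
 * place, yielding RGBA. This is a single linear pass over memory,
 * so its cost is comparable to the 5 ms memcpy measured above. */
static void bgra_to_rgba_inplace(uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t b = pixels[4 * i + 0];
        pixels[4 * i + 0] = pixels[4 * i + 2]; /* new R = old R slot's value moves in */
        pixels[4 * i + 2] = b;                 /* new B = old B value */
    }
}
```

After `glReadPixels(..., GL_BGRA, GL_UNSIGNED_BYTE, arr)`, a call to `bgra_to_rgba_inplace(arr, WIDTH * HEIGHT)` converts the buffer before sending it over the network.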

qehgt
  • May I ask what you need the texture in client-space memory for? Maybe you don't need the texture to ever leave GPU memory. Keep in mind, when you just want to use the texture for, well, texturing, you can render directly into GPU textures using [FBOs](http://www.songho.ca/opengl/gl_fbo.html), which would render your whole problem obsolete. – Christian Rau Feb 28 '12 at 17:50
  • No, I really need it. Rendered images should be sent over the network as the result of the device's work. – qehgt Feb 28 '12 at 18:19

3 Answers


Solved.

How to force the PowerVR hardware to render into user-allocated memory: http://processors.wiki.ti.com/index.php/Render_to_Texture_with_OpenGL_ES#Pixmaps

An example of how to use it: https://gforge.ti.com/gf/project/gleslayer/

After all of this I can get the rendered image in as little as 5 ms.
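For readers who cannot reach the TI links, the approach is roughly the following. This is pseudocode only: the exact struct layout and helper names are platform-specific (see the `common_create_native_pixmap` function in the gleslayer sources for the real code), and CMEM is TI's contiguous-memory allocator module.

```
/* Pseudocode outline of the render-to-pixmap path on PowerVR SGX.
 * Names are approximate; the real calls live in the CMEM driver
 * and the SGX EGL implementation. */

1. buf = allocate a physically contiguous, CPU-visible buffer via CMEM
2. pixmap = fill in a native-pixmap struct describing buf
            (width, height, stride, pixel format, physical address)
3. surface = eglCreatePixmapSurface(display, config, &pixmap, NULL)
4. eglMakeCurrent(display, surface, surface, context)
5. draw the scene as usual; the GPU writes straight into buf
6. after glFinish(), read buf directly -- no glReadPixels needed
```

The win comes from step 6: the rendered pixels are already in CPU-accessible memory, so "readback" degenerates to an ordinary memory access.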

qehgt
  • qehgt I've been reading through this... is the main trick using CMem to allocate the user-space buffer and using EGLImageKHR? – Constantin Oct 02 '12 at 14:39
  • CMEM - yes, EGLImageKHR - not sure. You need to allocate a "native pixmap" in CMEM memory, after that you can read/write to this pixmap very quickly. Look into `common_create_native_pixmap` function in sgxsink_main.cpp file of gleslayer application for details. – qehgt Oct 02 '12 at 14:59
  • Yep that's exactly what I'm using as my example. A final quick question for you, the example doesn't really comment on "reading" but I'm assuming that we have access to the memory at the address cmem_alloc returned (lAddress, NOTE not pvAddress). From here do we simply memcpy to where we want? – Constantin Oct 02 '12 at 15:56
  • Yes, you can read/write via linear address (not physical address, of course) from your application. But there are lots of small details. For example, to increase speed of read/write operations you should mark CMEM's region as "cacheable", but after that you'll need to manually flush/invalidate the region. And so on. – qehgt Oct 02 '12 at 18:36
  • I'm very new to embedded systems usually focusing on algorithms, where could I read up on "so on" and marking physical memory as cacheable, preliminary google searches aren't guiding me. A little context, I'm doing a GPGPU application and need 30 fps (or close to it) at 1920x1080 with 2bpp, isn't that a little much to store in a cache? – Constantin Oct 02 '12 at 18:52
  • Hmmm... I think, it's better to find better place to talk. E-mail me. – qehgt Oct 02 '12 at 20:39
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/17467/discussion-between-constantin-and-qehgt) – Constantin Oct 02 '12 at 21:42

When you call OpenGL functions, you're queuing commands into a command queue, and those commands are executed by the GPU asynchronously. When you call glReadPixels, the CPU must wait for the GPU to finish its rendering, so the call might be blocking on that draw. On most hardware (at least the hardware I work on), memory is shared between the CPU and the GPU, so the readback should not be that slow once rendering is done.

If you can wait for the result, or defer it to the next frame, you might not see that delay anymore.
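The deferral idea can be sketched as follows (pseudocode, assuming the consumer of the images can tolerate one frame of latency):

```
/* Frame N: issue the draw commands, but do NOT read back yet. */
render_frame(n);

/* Frame N+1: by now frame N has very likely finished on the GPU,
 * so reading its pixels no longer stalls the pipeline. */
read_pixels_for_frame(n);
render_frame(n + 1);
```

Note that, per qehgt's comment below, this did not explain the slowdown in his case: even with `glFinish` before every read, each `glReadPixels` call was equally slow.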

crazyjul
  • I called `glFinish` before `glReadPixels`, so I'm sure that's not the issue. Also, in this scenario: render image, call `glFinish`, then measure `glReadPixels` call after `glReadPixels` call, I got the same values every time. Every call of `glReadPixels` is slow – qehgt Feb 27 '12 at 21:20

Frame buffer objects are what you are looking for. They are supported in OpenGL ES and on PowerVR SGX.

EDIT: Keep in mind that GPU/CPU hardware is heavily optimized for moving data in one direction, from the CPU side to the GPU side. The path back from GPU to CPU is often much slower (it's just not a priority to spend hardware resources on it). So whatever technique you use (e.g. FBO/glGetTexImage), you're going to run into this limit.

Justicle
  • Ok, I can render into an FBO. But how do I get the rendered image from the FBO? The only way (as I see it) is to call `glReadPixels`. Am I wrong? – qehgt Feb 27 '12 at 21:51
  • glReadPixels reads from the display. You should use glGetTexImage to get the data of the texture attached as a color attachment to the FBO. – Mārtiņš Možeiko Feb 28 '12 at 06:46
  • @MārtiņšMožeiko Sadly enough OpenGL ES doesn't have glGetTexImage. – Christian Rau Feb 28 '12 at 17:47
  • @Justicle Since you yourself say that the GPU-to-CPU border cannot be crossed that easily, FBOs won't really buy him anything over `glReadPixels`. – Christian Rau Feb 28 '12 at 18:23
  • The OP asked for "off-screen rendering to texture". Copying to the CPU side is a separate part of that. FBOs help with the first part. – Justicle Feb 28 '12 at 19:15