
BACKGROUND:

I want to add a live filter feature based on the code of the Android camera app. However, the architecture of the Android camera app is built on OpenGL ES 1.x, and I need shaders to implement our custom filters. Updating the camera app to OpenGL ES 2.0 would be too difficult, so I had to find some other method to implement live filters instead of OpenGL. After some research, I decided to use RenderScript.

PROBLEM:

I have written a demo of a simple filter in RenderScript. It shows that the fps is much lower than an OpenGL implementation: about 5 fps vs 15 fps.

QUESTIONS:

  1. The official Android documentation says: "The RenderScript runtime will parallelize work across all processors available on a device, such as multi-core CPUs, GPUs, or DSPs, allowing you to focus on expressing algorithms rather than scheduling work or load balancing." Then why is the RenderScript implementation slower?

  2. If RenderScript cannot satisfy my requirement, is there a better way?

CODE DETAILS:

Hi, I am on the same team as the questioner. We want to write a RenderScript-based live-filter camera. In our test demo project, we use a simple filter: a YuvToRGB intrinsic script followed by an overlay-filter ScriptC script. In the OpenGL version, we upload the camera data as textures and do the image filtering in a shader, like this:

    GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
    GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureYHandle);
    GLES20.glUniform1i(shader.uniforms.get("uTextureY"), 0);
    GLES20.glTexSubImage2D(GLES20.GL_TEXTURE_2D, 0, 0, 0, mTextureWidth,
            mTextureHeight, GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE,
            mPixelsYBuffer.position(0));

In the RenderScript version, we copy the camera data into an Allocation and do the image filtering with script kernels, like this:

    // The following code is from onPreviewFrame(byte[] data, Camera camera), which delivers the camera frame data
    byte[] imageData = data;
    long timeBegin = System.currentTimeMillis();
    mYUVInAllocation.copyFrom(imageData);

    mYuv.setInput(mYUVInAllocation);
    mYuv.forEach(mRGBAAllocationA);
    // To make sure the process of YUVtoRGBA has finished!
    mRGBAAllocationA.copyTo(mOutBitmap);    
    Log.e(TAG, "RS time: YUV to RGBA : " + String.valueOf((System.currentTimeMillis() - timeBegin)));   

    mLayerScript.forEach_overlay(mRGBAAllocationA, mRGBAAllocationB);
    mRGBAAllocationB.copyTo(mOutBitmap);    
    Log.e(TAG, "RS time: overlay : " + String.valueOf((System.currentTimeMillis() - timeBegin)));

    mCameraSurPreview.refresh(mOutBitmap, mCameraDisplayOrientation, timeBegin);

The two problems are: (1) the RenderScript version seems slower than the OpenGL version; (2) according to our time log, the YUV-to-RGBA step, which uses the intrinsic script, is very quick (about 6 ms), but the overlay step, which uses ScriptC, is very slow (about 180 ms). Why does this happen?
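One thing worth checking first: in the code above, `timeBegin` is never reset, so the second `Log` line reports the cumulative time of both stages rather than the overlay step alone. A per-stage timing pattern can be sketched in plain Java (`busyWork` is a hypothetical stand-in; on a device you would time the actual `forEach`/`copyTo` calls):

```java
public class StageTimer {
    /** Runs one stage and returns its elapsed time in milliseconds. */
    public static long timeStage(Runnable stage) {
        long start = System.nanoTime();
        stage.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Hypothetical stand-in for a filter stage (e.g. forEach + copyTo). */
    public static void busyWork(int n) {
        double acc = 0;
        for (int i = 0; i < n; i++) acc += Math.sqrt(i);
        if (acc < 0) throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        // Each stage gets its own start timestamp, so the logs are not cumulative.
        long yuvMs = timeStage(() -> busyWork(50_000));     // stand-in for YUV-to-RGBA
        long overlayMs = timeStage(() -> busyWork(50_000)); // stand-in for overlay
        System.out.println("YUV to RGBA: " + yuvMs + " ms, overlay: " + overlayMs + " ms");
    }
}
```

With this pattern, the 180 ms figure would be attributable to the overlay stage alone rather than to the whole frame.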

Here is the rs kernel code of the ScriptC we use (mLayerScript):

    #pragma version(1)
    #pragma rs java_package_name(**.renderscript)
    #pragma stateFragment(parent)

    #include "rs_graphics.rsh"

    static rs_allocation layer;
    static uint32_t dimX;
    static uint32_t dimY;

    void setLayer(rs_allocation layer1) {
        layer = layer1;
    }

    void setBitmapDim(uint32_t dimX1, uint32_t dimY1) {
        dimX = dimX1;
        dimY = dimY1;
    }

    static float BlendOverlayf(float base, float blend) {
        return (base < 0.5 ? (2.0 * base * blend) : (1.0 - 2.0 * (1.0 - base) * (1.0 - blend)));
    }

    static float3 BlendOverlay(float3 base, float3 blend) {
        float3 blendOverLayPixel = {BlendOverlayf(base.r, blend.r), BlendOverlayf(base.g, blend.g), BlendOverlayf(base.b, blend.b)};
        return blendOverLayPixel;
    }

    uchar4 __attribute__((kernel)) overlay(uchar4 in, uint32_t x, uint32_t y) {
        float4 inPixel = rsUnpackColor8888(in);

        uint32_t layerDimX = rsAllocationGetDimX(layer);
        uint32_t layerDimY = rsAllocationGetDimY(layer);

        uint32_t layerX = x * layerDimX / dimX;
        uint32_t layerY = y * layerDimY / dimY;

        uchar4* p = (uchar4*)rsGetElementAt(layer, layerX, layerY);
        float4 layerPixel = rsUnpackColor8888(*p);

        float3 color = BlendOverlay(inPixel.rgb, layerPixel.rgb);

        float4 outf = {color.r, color.g, color.b, inPixel.a};
        uchar4 outc = rsPackColorTo8888(outf.r, outf.g, outf.b, outf.a);

        return outc;
    }
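For reference, the per-channel blend math in the kernel above can be checked on the CPU with a plain-Java port (a verification sketch only, not part of the app). A useful sanity check: a 50% grey layer should leave the base colour unchanged.

```java
public class OverlayBlend {
    /** Per-channel overlay blend, ported from the rs kernel for CPU-side checking. */
    public static float blendOverlay(float base, float blend) {
        return base < 0.5f
                ? 2.0f * base * blend
                : 1.0f - 2.0f * (1.0f - base) * (1.0f - blend);
    }

    public static void main(String[] args) {
        // Blending against 50% grey is the identity for the overlay mode.
        System.out.println(blendOverlay(0.25f, 0.5f)); // prints 0.25
        System.out.println(blendOverlay(0.75f, 0.5f)); // prints 0.75
    }
}
```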
James Zhao
  • Can you share how the code differs between the two versions? I suspect the issue is getting the data from the camera into RS. – R. Jason Sams Feb 15 '14 at 00:05
  • 1. Don't use rsAllocationGetDimX; pass those as globals (like dimX and dimY). 2. Don't forget the f suffix on your constants; you're using double precision right now. 3. Use rsGetElementAt_uchar4, not rsGetElementAt. 4. Don't include rs_graphics.rsh; it's unnecessary. 5. Consider caching layerDimX / dimX as a global (same with Y). 6. Try #pragma rs_fp_relaxed; it enables some additional optimizations if you don't care about strict IEEE-754 compliance (NEON and some GPUs require relaxed). Those are the highlights. – Tim Murray Feb 18 '14 at 01:45
  • Tim got most of the high points; you can also use convert_uchar4() and convert_float4() if you do not require the range rescale (0-255 vs. 0-1) that rsPackColorTo8888() does. – R. Jason Sams Feb 18 '14 at 22:34
  • Thanks, Tim and Jason. We will try to modify our code based on your points. Do you have any articles on RenderScript code optimization? It is a little difficult for us to find such articles by searching. – James Zhao Feb 21 '14 at 01:15
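Jason's convert_uchar4()/convert_float4() point can be sanity-checked on the CPU: the overlay blend can be computed directly in the 0-255 integer domain, skipping the 0-1 rescale that rsUnpackColor8888()/rsPackColorTo8888() perform. A plain-Java sketch (blendOverlay255 is a hypothetical helper, not an RS API):

```java
public class IntOverlay {
    /**
     * Overlay blend directly on 0-255 channel values, avoiding the
     * 0-1 rescale step (small rounding differences vs. the float path).
     */
    public static int blendOverlay255(int base, int blend) {
        return base < 128
                ? (2 * base * blend) / 255
                : 255 - (2 * (255 - base) * (255 - blend)) / 255;
    }

    public static void main(String[] args) {
        // Blending against 50% grey (128) stays close to the identity.
        System.out.println(blendOverlay255(64, 128));  // prints 64
        System.out.println(blendOverlay255(192, 128)); // prints 193 (integer rounding)
    }
}
```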

1 Answer


Renderscript does not use any GPU or DSP cores. That is a common misconception encouraged by Google's deliberately vague documentation. Renderscript used to have an interface to OpenGL ES, but that has been deprecated and was never used for much beyond animated wallpapers. Renderscript will use multiple CPU cores, if available, but I suspect Renderscript will be replaced by OpenCL.

Take a look at the Effects class and the Effects demo in the Android SDK. It shows how to use OpenGL ES 2.0 shaders to apply effects to images without writing OpenGL ES code.

http://software.intel.com/en-us/articles/porting-opengl-games-to-android-on-intel-atom-processors-part-1

UPDATE:

It's wonderful when I learn more answering a question than asking one, and that is the case here. You can see from the lack of answers that Renderscript is hardly used outside of Google because of its strange architecture, which ignores industry standards like OpenCL, and its almost non-existent documentation on how it actually works. Nonetheless, my answer did evoke a rare response from the Renderscript development team, which includes only one link that actually contains any useful information about Renderscript - this article by Alexandru Voica at IMG, the PowerVR GPU vendor:

http://withimagination.imgtec.com/index.php/powervr/running-renderscript-efficiently-with-powervr-gpus-on-android

That article has some good information which was new to me. There are comments posted there from other people who are having trouble getting Renderscript code to actually run on the GPU.

But I was incorrect to assume that Renderscript is no longer being developed at Google. Although my statement that "Renderscript does not use any GPU or DSP cores" was true until fairly recently, I have learned that this changed as of one of the Jelly Bean releases. It would have been great if one of the Renderscript developers could have explained that, or if they had a public webpage that explains it, lists which GPUs are actually supported, and tells you how to determine whether your code actually runs on a GPU.

My opinion is that Google will replace Renderscript with OpenCL eventually and I would not invest time developing with it.

ClayMontgomery
  • "no GPU support" is totally false; every Nexus device currently on the market (and lots of other devices) is shipping with RS GPU drivers. – Tim Murray Feb 14 '14 at 18:29
  • In order to do that, they would have to supply a GLSL shader compiler for every type and version of GPU used in the Nexus and even then the Renderscript code would not be portable to Android devices with different GPUs - which would break Google's guarantee of portability. – ClayMontgomery Feb 14 '14 at 22:06
  • That is completely incorrect. RS and GLSL have nothing to do with each other; they are entirely separate user-mode driver stacks. RS bitcode can be run on the CPU for devices without GPU support, or on the GPU when an appropriate GPU is present. A developer does not have to provide multiple source files or anything like that. (Source: I work on the RS runtime, driver model, and API.) – Tim Murray Feb 15 '14 at 00:03
  • You keep implying things I did not write. What I am telling you is that running code on any GPU requires a compiler for said GPU. Somebody (either chip vendors or Google) has to supply those compilers for a range of GPU types and versions. RS code is not going to run on GPUs where the compilers are not available. What is "an appropriate GPU"? One that Google provides a compiler for? You have the benefit of having access to documentation on RS that those of us outside of Google do not. How about a public link to which GPU types RS actually supports? That would be helpful. – ClayMontgomery Feb 15 '14 at 17:31
  • Links, easy. ARM: http://www.arm.com/products/multimedia/mali-graphics-plus-gpu-compute/mali-t604.php?tab=Specifications Qct: http://www.qualcomm.com/media/blog/2013/01/11/inside-snapdragon-800-series-processors-new-adreno-330-gpu Img: http://withimagination.imgtec.com/index.php/powervr/running-renderscript-efficiently-with-powervr-gpus-on-android I can keep going, but I am going to run out of characters quickly. – R. Jason Sams Feb 15 '14 at 18:24