It looks like the Android SDK's BitmapRegionDecoder uses Skia to decode part of the specified bitmap. Under the hood, it uses the appropriate codec (JPEG, PNG, etc.) to do so. I'm looking at ways to optimize this using RenderScript.

Is it possible to define a RenderScript kernel function that ignores certain data from the input allocation and saves the rest in the output allocation? I'm new to RenderScript, and most kernel functions I've seen work on the entire input data set.

Vairavan

1 Answer

Yes, use the LaunchOptions API to limit the rectangle that you launch over:

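// LaunchOptions must be instantiated before its setters are called.
Script.LaunchOptions lo = new Script.LaunchOptions();
lo.setX(10, 100);  // x range to launch over
lo.setY(5, 20);    // y range to launch over
// "script" is your generated ScriptC_* instance; the method is generated as forEach_<kernel name>.
script.forEach_kernel(in, out, lo);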

https://developer.android.com/reference/android/renderscript/Script.LaunchOptions.html
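
To illustrate what limiting the launch rectangle means, here is a minimal plain-Java sketch that emulates the behaviour on `int` arrays. It does not use the Android API: `LaunchRangeSketch` and `forEachInRange` are hypothetical names, and treating the end bound as exclusive is an assumption made for the illustration.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

/** Plain-Java emulation (not the Android API) of a kernel launched over a sub-rectangle. */
final class LaunchRangeSketch {

    /**
     * Applies the kernel to cells with xStart <= x < xEnd and yStart <= y < yEnd.
     * Both arrays are row-major and must hold width * height elements.
     */
    static void forEachInRange(int[] in, int[] out, int width, int height,
                               int xStart, int xEnd, int yStart, int yEnd,
                               IntUnaryOperator kernel) {
        if (in.length != width * height || out.length != in.length) {
            throw new IllegalArgumentException("dimension mismatch for input and output");
        }
        // Only the part of the rectangle that lies inside the data is visited.
        int x0 = Math.max(xStart, 0), x1 = Math.min(xEnd, width);
        int y0 = Math.max(yStart, 0), y1 = Math.min(yEnd, height);
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                int i = y * width + x;
                out[i] = kernel.applyAsInt(in[i]);
            }
        }
    }

    public static void main(String[] args) {
        int width = 4, height = 3;
        int[] in = new int[width * height];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
        }
        int[] out = new int[in.length];
        // Launch over x in [1, 3) and y in [1, 2) only.
        forEachInRange(in, out, width, height, 1, 3, 1, 2, v -> v + 100);
        System.out.println(Arrays.toString(out));
    }
}
```

Cells outside the rectangle keep whatever value they already had, and the output still has the full size of the input, so restricting the launch on its own does not produce a smaller, cropped output.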

sakridge
  • Thanks, I tried it, but it failed with "dimension mismatch for input and output allocation". I wanted an output allocation sized only for what was needed (after the crop). I found some tips about binding the input instead, and it worked like a charm. I'm blown away by the performance improvement :-) – Vairavan Jun 28 '17 at 03:13
  • Yes, input and output need to be the same size. For different sizes, do as you say: use an rs_allocation global for one of them and use the rsGetElementAt_* functions. – sakridge Jun 28 '17 at 16:27
  • Not just that, forEach over the desired crop region (output allocation) is much more efficient than forEach over the entire input allocation and selecting the desired values, isn't it? Very much the case for a tiny crop in a huge image. – Vairavan Jun 29 '17 at 01:34
  • Is there a way to get stats of the rs operation? I'm curious as to which processor was used (CPU, GPU or DSP). – Vairavan Jul 04 '17 at 03:49
  • Take a look at Qualcomm Trepn profiler which can show CPU/GPU usage while you are doing a run. This can indicate which processor is used. – sakridge Jul 11 '17 at 13:38
  • Thanks, great tip. In my case, the GPU isn't used at all, and yet rs helped on the Snapdragon 810. I'm assuming ARM's SIMD is helping out. As soon as I start rs, I notice four CPUs sit at a constant frequency of ~1.5 GHz while the remaining four drop to 0.5 GHz, and they stay there until the script completes. – Vairavan Jul 15 '17 at 15:44
  • Tried adding #pragma rs_fp_relaxed to your code? Also check that you are not using double type or rsDebug functions. – sakridge Jul 15 '17 at 16:32
  • It is still the same with rs_fp_relaxed (no GPU activity), and I'm only using uint and uchar, with no debug functions. Is there a pragma to force it onto the GPU, just to see the performance? – Vairavan Jul 15 '17 at 21:05
  • No, there isn't any way to force the GPU from the app. It could be that your kernel just runs better on the CPU. If you can share more details about your device and your kernel code, maybe I can make a better guess as to why. – sakridge Jul 19 '17 at 09:20
  • uchar4 __attribute__((kernel)) crop(uint32_t x, uint32_t y) { int inX = x + startX; int inY = y + startY; return rsGetElementAt_uchar4( inAllocation, inX, inY ); } – Vairavan Jul 19 '17 at 14:25
  • I tried it on a Nexus 6P. I basically bind the input allocation; startX and startY are the top-left coordinates of the desired crop region. – Vairavan Jul 19 '17 at 14:26
  • Your kernel is essentially a copy which is very memory bound. Since GPU and CPU have access to the same speed main memory I would expect the speed to be very similar between CPU and GPU in this case. – sakridge Jul 19 '17 at 16:09
  • Yes, but wouldn't engaging multiple GPU cores result in a quicker copy than using just 4 CPU cores? Even on the CPU, it seems only 4 cores are used and the remaining 4 just stay at 0.5 GHz. – Vairavan Jul 20 '17 at 04:07
  • Not really. Both CPU and GPU are fast enough these days to max out the memory bandwidth of the SoC, so extra GPU cores would just sit idle waiting for data to come back from the memory system. Also, how long does your kernel run for? Time it like: `time.start(); kernel_ForEach(); rs.finish(); time.end(); time = end - start;` – sakridge Jul 20 '17 at 12:32
  • Interesting. For a crop region of 9,290,772 pixels, rs took 28 milliseconds; it took 22 milliseconds for 2,840,178 pixels and 16 for 369,600 pixels. I guess this is more than I can ask for from rs. – Vairavan Jul 22 '17 at 16:41
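
The crop approach worked out in the comments (iterate over the output region and read the input at an offset) can be sketched in plain Java as follows. This is only an emulation on `int` arrays, not RenderScript or the Android API: `CropSketch` and `crop` are hypothetical names, and the timing uses `System.nanoTime()` around a synchronous loop, so there is no counterpart to `rs.finish()` here.

```java
import java.util.Arrays;

/** Plain-Java emulation (not RenderScript) of cropping by iterating over the output only. */
final class CropSketch {

    /**
     * Returns a cropWidth x cropHeight copy of the row-major input, starting at
     * (startX, startY). Each output cell (x, y) reads input cell (x + startX, y + startY),
     * mirroring the index mapping of the crop kernel quoted in the comments.
     */
    static int[] crop(int[] in, int inWidth, int inHeight,
                      int startX, int startY, int cropWidth, int cropHeight) {
        if (in.length != inWidth * inHeight) {
            throw new IllegalArgumentException("input size does not match its dimensions");
        }
        if (startX < 0 || startY < 0 || cropWidth < 0 || cropHeight < 0
                || startX + cropWidth > inWidth || startY + cropHeight > inHeight) {
            throw new IllegalArgumentException("crop region lies outside the input");
        }
        int[] out = new int[cropWidth * cropHeight];
        // The loop bounds are the output dimensions, so the work scales with the crop size.
        for (int y = 0; y < cropHeight; y++) {
            for (int x = 0; x < cropWidth; x++) {
                out[y * cropWidth + x] = in[(y + startY) * inWidth + (x + startX)];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int width = 4, height = 3;
        int[] in = new int[width * height];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
        }
        // Time the copy: take a timestamp, run the work to completion, take another.
        long start = System.nanoTime();
        int[] out = crop(in, width, height, 1, 1, 2, 2);
        long elapsedNanos = System.nanoTime() - start;
        System.out.println(Arrays.toString(out) + " in " + elapsedNanos + " ns");
    }
}
```

Because the loop runs over the output coordinates, a tiny crop of a huge image touches only the pixels it needs, which is the efficiency point raised in the comments.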