
I have an Android app published on Google Play that is implemented in large part in RenderScript (native, not using support library APIs). The app sometimes seems to crash in libCB.so. Crash rate is 1.40% as reported by the Google Play Console.

The crash seems to occur in all Android versions from 6.0 to 8.1 (API levels 23–27). I haven't received a report from older versions even though the app's minSdkVersion is 18 (Android 4.3). All kinds of devices from various manufacturers seem to be affected, both cheap no-name products and high-end devices.

The app uses the Camera 1 API to capture frames from a live preview video (setPreviewCallbackWithBuffer). The PreviewCallback sends the frame data through a series of RenderScripts that process that input. At two stages the processed data is then also sent to two different TextureViews. I can provide more details if necessary.
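For reference, here is a minimal sketch of how the preview buffers for `setPreviewCallbackWithBuffer` are usually sized. It assumes the preview format is left at the Camera 1 default, NV21, which `ImageFormat.getBitsPerPixel(ImageFormat.NV21)` reports as 12 bits per pixel. The class name `PreviewBuffers` and the pool size are hypothetical, and the code is plain Java so it can run off-device.

```java
// Hypothetical helper for sizing Camera 1 callback buffers.
// Assumes the preview format is NV21 (12 bits per pixel); YV12 has
// different row alignment rules and is not handled here.
final class PreviewBuffers {
    private static final int NV21_BITS_PER_PIXEL = 12;

    /** Bytes needed for one NV21 frame: width * height * bitsPerPixel / 8. */
    static int nv21BufferSize(int width, int height) {
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("invalid preview size " + width + "x" + height);
        }
        long bits = (long) width * height * NV21_BITS_PER_PIXEL;
        return Math.toIntExact(bits / 8); // throws if the size overflows an int
    }

    /** Allocates count independent buffers so one can be processed while others stay queued. */
    static byte[][] allocatePool(int width, int height, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        int size = nv21BufferSize(width, height);
        byte[][] pool = new byte[count][];
        for (int i = 0; i < count; i++) {
            pool[i] = new byte[size];
        }
        return pool;
    }
}
```

On a device, each buffer from the pool would be passed to `Camera.addCallbackBuffer(byte[])` before `setPreviewCallbackWithBuffer` is called, and handed back with `addCallbackBuffer` once `onPreviewFrame` has copied it into the RenderScript input (for example with `Allocation.copyFrom(byte[])`). If no buffer is queued when a frame arrives, the camera discards that frame.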

I'm not sure what causes the problem since I was not able to reproduce it locally on any of my own test devices.

Does anyone know what might be the cause of this issue or if there is any workaround?

Here's a typical backtrace:

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'Xiaomi/land/land:6.0.1/MMB29M/V9.2.2.0.MALMIEK:user/release-keys'
Revision: '0'
ABI: 'arm'
pid: 17840, tid: 17862, name: JNISurfaceTextu >>> com.app.my <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x38
r0 00000000 r1 00000000 r2 00000002 r3 00000000
r4 00000000 r5 ef8e0a38 r6 00000002 r7 00000000
r8 ab06e808 r9 00000000 sl 00000000 fp ef8e0c30
ip e044d8f8 sp ef8e09f8 lr e03ac9b3 pc e03ac89c cpsr 800f0030

backtrace:
#00 pc 0003189c /system/vendor/lib/libCB.so (cl_mem_non_local_event_cache_state_transition+15)
#01 pc 000319af /system/vendor/lib/libCB.so (cl_mem_grant_access_to_device_internal+58)
#02 pc 00031ae5 /system/vendor/lib/libCB.so (cb_grant_access_to_device+84)
#03 pc 0000eb61 /system/vendor/lib/librs_adreno.so
#04 pc 0000683b /system/vendor/lib/librs_adreno.so
#05 pc 000068a1 /system/vendor/lib/librs_adreno.so
#06 pc 00007f33 /system/vendor/lib/librs_adreno.so
#07 pc 00009707 /system/vendor/lib/librs_adreno.so
#08 pc 00009ee5 /system/vendor/lib/librs_adreno.so
#09 pc 000088f5 /system/vendor/lib/librs_adreno.so (rsdVendorScriptInvokeForEach3+236)
#10 pc 00019bf9 /system/vendor/lib/libRSDriver_adreno.so (_Z29rsdVendorInvokeForEachWrapperPKN7android12renderscript7ContextEPNS0_6ScriptEjPPKNS0_10AllocationEjPS6_PKvjPK12RsScriptCall+84)
#11 pc 00019cfb /system/vendor/lib/libRSDriver_adreno.so (_Z27rsdScriptInvokeForEachMultiPKN7android12renderscript7ContextEPNS0_6ScriptEjPPKNS0_10AllocationEjPS6_PKvjPK12RsScriptCall+38)
#12 pc 0002ebf3 /system/lib/libRS.so (_ZN7android12renderscript7ScriptC10runForEachEPNS0_7ContextEjPPKNS0_10AllocationEjPS4_PKvjPK12RsScriptCall+294)
#13 pc 00033e41 /system/lib/libRS.so (_ZN7android12renderscript22rsp_ScriptForEachMultiEPNS0_7ContextEPKvj+48)
#14 pc 000311ff /system/lib/libRS.so (_ZN7android12renderscript8ThreadIO16playCoreCommandsEPNS0_7ContextEi+338)
#15 pc 00023d27 /system/lib/libRS.so (_ZN7android12renderscript7Context10threadProcEPv+646)
#16 pc 0004185b /system/lib/libc.so (_ZL15__pthread_startPv+30)
#17 pc 000192a5 /system/lib/libc.so (__start_thread+6)
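Since the version split is one of the main clues here, it may help to tally Play Console reports by their `Build fingerprint` line. The Android CDD defines the fingerprint as `BRAND/PRODUCT/DEVICE:VERSION.RELEASE/ID/VERSION.INCREMENTAL:TYPE/TAGS`, so the release can be read from the second colon-separated section. This is a minimal sketch that assumes the fingerprints have already been extracted from the reports without the surrounding quotes. `Fingerprints` is a hypothetical helper name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for grouping crash reports by Android release.
final class Fingerprints {
    /** Returns VERSION.RELEASE (e.g. "6.0.1") from a Build.FINGERPRINT string. */
    static String release(String fingerprint) {
        String[] sections = fingerprint.split(":");
        if (sections.length != 3) {
            throw new IllegalArgumentException("unexpected fingerprint: " + fingerprint);
        }
        // The second section is VERSION.RELEASE/ID/VERSION.INCREMENTAL.
        String release = sections[1].split("/")[0];
        if (release.isEmpty()) {
            throw new IllegalArgumentException("missing release: " + fingerprint);
        }
        return release;
    }

    /** Counts fingerprints per major release ("6", "7", "8", ...); keys sort as strings. */
    static Map<String, Integer> countByMajorRelease(List<String> fingerprints) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String fp : fingerprints) {
            String major = release(fp).split("\\.")[0];
            counts.merge(major, 1, Integer::sum);
        }
        return counts;
    }
}
```

For the report above, `release(...)` returns `6.0.1` and the tally key is `6`. Because keys are compared as strings, a release such as `10` would sort before `6`, which is acceptable for a quick count.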
devconsole

2 Answers


Well, we had a similar issue with OpenCV C++ code compiled for Android.

Our mistake was setting timers that call JNI functions and then forgetting to purge them. Eventually the garbage collector could no longer reclaim the objects those timers still referenced, which caused a memory leak. As a result, our app would crash after running for about a whole day unless the user killed it.

Are you setting timers anywhere in your code?

Or are you keeping track of the threads (if there are any) that call JNI functions?

You might want to check this if you are interested: Timer::purge()
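A minimal sketch of the cleanup this answer describes, assuming the timers are `java.util.Timer` instances (the answer doesn't say which Timer class it means). `JniPoller` is a hypothetical name, and the JNI call itself is left to the caller's `Runnable`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical helper: owns one Timer and remembers every task it schedules,
// so no task is forgotten while still holding references.
final class JniPoller {
    private final Timer timer = new Timer("jni-poller", true); // daemon thread
    private final List<TimerTask> tasks = new ArrayList<>();

    /** Runs work every periodMs; work would wrap the JNI-backed call. */
    synchronized void schedule(Runnable work, long periodMs) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                work.run();
            }
        };
        // Throws IllegalStateException if the timer has already been cancelled.
        timer.scheduleAtFixedRate(task, periodMs, periodMs);
        tasks.add(task);
    }

    /** Cancels every task created here and purges them from the still-running timer's queue. */
    synchronized void cancelAll() {
        for (TimerTask task : tasks) {
            task.cancel();
        }
        tasks.clear();
        timer.purge(); // removes cancelled tasks so the timer no longer references them
    }

    /** Stops the timer thread for good; Timer.cancel() also discards anything still queued. */
    synchronized void shutdown() {
        cancelAll();
        timer.cancel();
    }

    synchronized int activeTaskCount() {
        return tasks.size();
    }
}
```

`Timer.cancel()` already discards every queued task, so `purge()` mainly matters for a timer that stays alive after some of its tasks have been cancelled, which is the case `cancelAll()` covers.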

Burak Day
  • Hmm, I'm not sure. I don't call any JNI functions directly; it's RenderScript code. I call both RenderScript.destroy() and HandlerThread.quit() when the job is done. But you may be on to something: I hadn't considered a possible out-of-memory issue, and maybe "grant_access_to_device" misled me. I will try to monitor memory consumption over a longer period of time and report my findings. – devconsole Apr 12 '18 at 17:39
  • Okay, I ran some tests and it does not look like an out-of-memory issue. I started and destroyed the Activity repeatedly while the process stayed alive. Native memory usage initially went up but plateaued at about 10 MB. The number of RenderScript instances is 1, and the number of Allocation instances is constant. It makes no difference whether I manually destroy() the Allocations or wait for the finalizer to do it. – devconsole Apr 13 '18 at 07:19
  • Yeah, well, then it shouldn't be about references and memory leaks. And you're saying the crashes only happen on versions between 6.0 and 8.1? No crashes below 6.0 at all? That's interesting. If the crashes happened on all versions, we could easily have said "there's a tiny unhandled case in the native code". – Burak Day Apr 13 '18 at 08:05
  • I mean, in the end it's a segmentation fault, which indicates that the native code is trying to access memory that doesn't exist or that it hasn't been granted access to. At least from my point of view, it looks like it's not your fault but the native code's. Any idea about the libCB versions on those devices? Maybe they upgraded the library after 5.0 or so, which would explain why you don't see crashes below 6.0. – Burak Day Apr 13 '18 at 08:13
  • Yes, there are hundreds of reports from 6.0 and later and not a single one from 4.4 to 5.1. As Miao Wang said, it could well be a bug in the Qualcomm driver. – devconsole Apr 13 '18 at 08:14
  • Yeah, to me it looks like an unhandled case in the library caused by version differences across devices. Hope you find a way to fix it, though. – Burak Day Apr 13 '18 at 08:16
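Since `RenderScript.destroy()` and `HandlerThread.quit()` come up in the comments above, here is one way to rule out a Java-side race between teardown and in-flight frames: run both the per-frame work and the teardown on the same single worker thread, so `destroy()` only runs after every frame queued before it. This is a sketch under stated assumptions. `FramePipeline` and `ConfinedPipeline` are hypothetical names, a single-thread `ExecutorService` stands in for the `HandlerThread`, and nothing in this thread confirms that such a race is behind the `libCB.so` crash, which may well be a driver bug as the other answer suggests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical interface: on Android this would wrap the ScriptC/Allocation
// objects, and destroy() would call RenderScript.destroy().
interface FramePipeline {
    void process(byte[] frame);
    void destroy();
}

// All pipeline calls run on one worker thread, so destroy() never overlaps process().
final class ConfinedPipeline {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final FramePipeline pipeline;
    private volatile boolean destroyed; // only accessed from worker tasks

    ConfinedPipeline(FramePipeline pipeline) {
        this.pipeline = pipeline;
    }

    /** Queues a frame; returns false if shutdown() has already stopped the worker. */
    boolean submit(byte[] frame) {
        try {
            worker.execute(() -> {
                if (!destroyed) {
                    pipeline.process(frame);
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            return false; // worker already shut down; drop the frame
        }
    }

    /** Runs destroy() after every frame queued before it, then stops the worker. */
    void shutdown() {
        try {
            worker.execute(() -> {
                if (!destroyed) {
                    destroyed = true;
                    pipeline.destroy();
                }
            });
        } catch (RejectedExecutionException e) {
            return; // already shut down
        }
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

On Android the same ordering would come from posting both the frame processing and the teardown to one `Handler` and then calling `HandlerThread.quitSafely()` (API 18+), which, unlike `quit()`, still delivers messages that are already due instead of discarding them.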

Are the crashes mostly related to libCB.so? If so, it could be a bug in the Qualcomm RenderScript driver. I suspect it only affects certain SoCs.

Please file a bug at https://buganizer.corp.google.com/issues?q=componentid:192772 and attach any potential steps to reproduce it.

Miao Wang
  • Yes, the crashes are all related to libCB.so, specifically in cb_grant_access_to_device, cl_mem_grant_access_to_device_internal and cl_mem_non_local_event_cache_state_transition, as shown above. I have no idea how to reproduce this, though. – devconsole Apr 13 '18 at 07:25
  • That buganizer page asks me to sign in with an @google.com account. – devconsole Apr 13 '18 at 07:28