2

Our application has a fairly standard layout - a main window with a toolbar, source list on the left hand side, and a tab control. The contents of the tabs is drawn by a third party component that is open source and very complex. It makes extensive use of threads, etc.

Some users can reliably reproduce a situation where the whole application 'freezes' - which is to say, the main window stops redrawing. The cursor does not beachball, and the user can still click around, but the window simply does not update.

There is no specific repro steps other than starting to use the application, open one or more tabs and within a few minutes the problem occurs.

They are able to open the application menus at the top of the screen. One of the menu items - "Check for updates" - starts a modal session (a sheet with a progress indicator) while checking for updates on a background thread.

When the application is in this 'frozen' state, choosing 'Check for updates' seems to unfreeze the application for the duration of the modal session. Clicks and button presses that were made while frozen are suddenly registered as the UI redraws.

Then, the 'Check for updates' finishes, the modal session ends and the application is frozen again.

It seems to me to be a problem with the run loop. It's an open-ended question but I honestly don't know where to look - what can cause these kind of symptoms? I am unable to see any correlation between the hardware/software of the users that experience this issue.

EDIT


After freezing, the only thing I see on the console that I can't explain is

12/18/2013 10:39:29.000 AM kernel[0]: Network delay is not specified! Defaulting to 0x384

But I think this is a red herring.

A sample of the process when it's frozen shows this for the UI thread:

+                                 2327 -[NSApplication run]  (in AppKit) + 727  [0x9392e35c]
+                                   2327 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]  (in AppKit) + 119  [0x9393bad0]
+                                     2327 _DPSNextEvent  (in AppKit) + 1602  [0x9393c5a9]
+                                       2327 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 92  [0x9b2ea98d]
+                                         2327 ReceiveNextEventCommon  (in HIToolbox) + 526  [0x9b2eabb2]
+                                           2327 RunCurrentEventLoopInMode  (in HIToolbox) + 259  [0x9b2eae2d]
+                                             2327 CFRunLoopRunInMode  (in CoreFoundation) + 123  [0x95467bbb]
+                                               2327 CFRunLoopRunSpecific  (in CoreFoundation) + 394  [0x95467d5a]
+                                                 2324 __CFRunLoopRun  (in CoreFoundation) + 1393  [0x95468541]
+                                                 ! 2324 __CFRunLoopServiceMachPort  (in CoreFoundation) + 169  [0x95468f69]
+                                                 !   2324 mach_msg  (in libsystem_kernel.dylib) + 68  [0x99c0616c]
+                                                 !     2324 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x99c06f7a]
+                                                 3 __CFRunLoopRun  (in CoreFoundation) + 1779  [0x954686c3]
+                                                   1 __CFRunLoopDoTimers  (in CoreFoundation) + 222  [0x9552b22e]
+                                                   : 1 CFArrayCreateMutable  (in CoreFoundation) + 138  [0x9540a02a]
+                                                   :   1 -[__NSPlaceholderArray initWithCapacity:]  (in CoreFoundation) + 136  [0x954005a8]
+                                                   :     1 +[__NSArrayM __new:::::]  (in CoreFoundation) + 798  [0x9540096e]
+                                                   1 __CFRunLoopDoTimers  (in CoreFoundation) + 237  [0x9552b23d]
+                                                   : 1 CFArrayAppendValue  (in CoreFoundation) + 157  [0x9541fa9d]
+                                                   :   1 -[__NSArrayM addObject:]  (in CoreFoundation) + 64  [0x954009e0]
+                                                   :     1 -[__NSArrayM insertObject:atIndex:]  (in CoreFoundation) + 400  [0x95400b90]
+                                                   :       1 _platform_bzero$VARIANT$sse42  (in libsystem_platform.dylib) + 51  [0x9660c233]
+                                                   1 __CFRunLoopDoTimers  (in CoreFoundation) + 349  [0x9552b2ad]
+                                                     1 __CFRunLoopDoTimer  (in CoreFoundation) + 1395  [0x954b0b43]
+                                                       1 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__  (in CoreFoundation) + 22  [0x954b1186]
+                                                         1 WebCore::timerFired(__CFRunLoopTimer*, void*)  (in WebCore) + 64  [0x9285bff0]
+                                                           1 WebCore::ThreadTimers::sharedTimerFired()  (in WebCore) + 84  [0x9285c064]
+                                                             1 WebCore::ThreadTimers::sharedTimerFiredInternal()  (in WebCore) + 330  [0x9285c1ca]
+                                                               1 WebCore::MainThreadSharedTimer::setFireInterval(double)  (in WebCore) + 21  [0x9285ba65]
+                                                                 1 WebCore::setSharedTimerFireInterval(double)  (in WebCore) + 173  [0x9285bb1d]
+                                                                   1 CFRunLoopAddTimer  (in CoreFoundation) + 453  [0x9546fed5]
+                                                                     1 CFSetApplyFunction  (in CoreFoundation) + 147  [0x954554d3]
+                                                                       1 CFBasicHashApply  (in CoreFoundation) + 122  [0x95437b4a]
+                                                                         1 __CFSetApplyFunction_block_invoke  (in CoreFoundation) + 25  [0x95455519]
+                                                                           1 __CFRunLoopAddItemToCommonModes  (in CoreFoundation) + 60  [0x9545556c]

But I am not able to interpret that information...

EDIT 2


An additional sample; clearly different to the first:

+                                 2352 -[NSApplication run]  (in AppKit) + 727  [0x9887035c]
+                                   2352 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]  (in AppKit) + 119  [0x9887dad0]
+                                     2352 _DPSNextEvent  (in AppKit) + 1602  [0x9887e5a9]
+                                       2352 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 92  [0x9428298d]
+                                         2352 ReceiveNextEventCommon  (in HIToolbox) + 526  [0x94282bb2]
+                                           2352 RunCurrentEventLoopInMode  (in HIToolbox) + 259  [0x94282e2d]
+                                             2352 CFRunLoopRunInMode  (in CoreFoundation) + 123  [0x930d6bbb]
+                                               2352 CFRunLoopRunSpecific  (in CoreFoundation) + 394  [0x930d6d5a]
+                                                 2332 __CFRunLoopRun  (in CoreFoundation) + 1393  [0x930d7541]
+                                                 ! 2332 __CFRunLoopServiceMachPort  (in CoreFoundation) + 169  [0x930d7f69]
+                                                 !   2332 mach_msg  (in libsystem_kernel.dylib) + 68  [0x91bc316c]
+                                                 !     2332 mach_msg_trap  (in libsystem_kernel.dylib) + 10  [0x91bc3f7a]
+                                                 17 __CFRunLoopRun  (in CoreFoundation) + 1779  [0x930d76c3]
+                                                 ! 16 __CFRunLoopDoTimers  (in CoreFoundation) + 349  [0x9319a2ad]
+                                                 ! : 10 __CFRunLoopDoTimer  (in CoreFoundation) + 1395  [0x9311fb43]
+                                                 ! : | 10 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__  (in CoreFoundation) + 22  [0x93120186]
+                                                 ! : |   4 __NSFireTimer  (in Foundation) + 117  [0x96cac78b]
+                                                 ! : |   + 4 -[ThirdPartyView nsTimerCallBack:]  (in ThirdPartyLib) + 61  [0x24958bd]
+                                                 ! : |   +   1 -[NSPasteboard changeCount]  (in AppKit) + 19  [0x98a44fc6]
+                                                 ! : |   +   ! 1 CFPasteboardGetGenerationCount  (in CoreFoundation) + 0  [0x930e3470]
+                                                 ! : |   +   1 -[NSPasteboard changeCount]  (in AppKit) + 0  [0x98a44fb3]
+                                                 ! : |   +   1 DYLD-STUB$$CFPasteboardGetGenerationCount  (in AppKit) + 0  [0x992f994c]
+                                                 ! : |   +   1 objc_msgSend  (in libobjc.A.dylib) + 26  [0x9b1094aa]
+                                                 ! : |   3 __NSFireTimer  (in Foundation) + 164  [0x96cac7ba]
+                                                 ! : |   + 3 _CFAutoreleasePoolPop  (in CoreFoundation) + 47  [0x9309bcdf]
+                                                 ! : |   +   3 (anonymous namespace)::AutoreleasePoolPage::pop(void*)  (in libobjc.A.dylib) + 559  [0x9b10b4b7]
+                                                 ! : |   2 __NSFireTimer  (in Foundation) + 98  [0x96cac778]
+                                                 ! : |   + 2 ???  (in <unknown binary>)  [0x3589ca0]
+                                                 ! : |   +   2 ???  (in <unknown binary>)  [0x99575bc]
+                                                 ! : |   +     2 ???  (in <unknown binary>)  [0x9957668]
+                                                 ! : |   +       1 _objc_rootRetainCount  (in libobjc.A.dylib) + 88  [0x9b10bdd0]
+                                                 ! : |   +       ! 1 objc::DenseMapBase<objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*> >, objc_object*, unsigned long, objc::DenseMapInfo<objc_object*>, true>::find(objc_object* const&)  (in libobjc.A.dylib) + 32  [0x9b11f820]
+                                                 ! : |   +       !   1 bool objc::DenseMapBase<objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*> >, objc_object*, unsigned long, objc::DenseMapInfo<objc_object*>, true>::LookupBucketFor<objc_object*>(objc_object* const&, std::__1::pair<objc_object*, unsigned long> const*&) const  (in libobjc.A.dylib) + 73  [0x9b11f9ff]
+                                                 ! : |   +       1 objc_msgSend  (in libobjc.A.dylib) + 64  [0x9b1094d0]
+                                                 ! : |   1 __NSFireTimer  (in Foundation) + 53  [0x96cac74b]
+                                                 ! : |     1 NSPushAutoreleasePool  (in Foundation) + 0  [0x96c4a188]
+                                                 ! : 4 __CFRunLoopDoTimer  (in CoreFoundation) + 557  [0x9311f7fd]
+                                                 ! : | 3 __CFArmNextTimerInMode  (in CoreFoundation) + 773  [0x930df535]
+                                                 ! : | + 3 mk_timer_arm  (in libsystem_kernel.dylib) + 10  [0x91bc40ca]
+                                                 ! : | 1 __CFArmNextTimerInMode  (in CoreFoundation) + 554  [0x930df45a]
+                                                 ! : |   1 __CFTSRToNanoseconds  (in CoreFoundation) + 25  [0x931bedb9]
+                                                 ! : 1 __CFRunLoopDoTimer  (in CoreFoundation) + 1556  [0x9311fbe4]
+                                                 ! : | 1 pthread_mutex_lock  (in libsystem_pthread.dylib) + 16  [0x911ca7ac]
+                                                 ! : |   1 _pthread_mutex_lock  (in libsystem_pthread.dylib) + 181  [0x911ca866]
+                                                 ! : 1 __CFRunLoopDoTimer  (in CoreFoundation) + 2093  [0x9311fdfd]
+                                                 ! :   1 __CFRunLoopFindMode  (in CoreFoundation) + 174  [0x930b1b5e]
+                                                 ! :     1 _CFRuntimeSetInstanceTypeIDAndIsa  (in CoreFoundation) + 33  [0x93065711]
+                                                 ! :       1 _CFRuntimeSetInstanceTypeID  (in CoreFoundation) + 0  [0x93063f50]
+                                                 ! 1 __CFRunLoopDoTimers  (in CoreFoundation) + 237  [0x9319a23d]
+                                                 !   1 CFArrayAppendValue  (in CoreFoundation) + 157  [0x9308ea9d]
+                                                 !     1 -[__NSArrayM addObject:]  (in CoreFoundation) + 64  [0x9306f9e0]
+                                                 !       1 -[__NSArrayM insertObject:atIndex:]  (in CoreFoundation) + 400  [0x9306fb90]
+                                                 !         1 _platform_bzero$VARIANT$sse42  (in libsystem_platform.dylib) + 0  [0x94d2d200]
+                                                 2 __CFRunLoopRun  (in CoreFoundation) + 1601  [0x930d7611]
+                                                 ! 1 __CFRunLoopDoObservers  (in CoreFoundation) + 81  [0x930e63b1]
+                                                 ! : 1 CFArrayGetCount  (in CoreFoundation) + 20  [0x930797f4]
+                                                 ! 1 __CFRunLoopDoObservers  (in CoreFoundation) + 194  [0x930e6422]
+                                                 !   1 CFArrayGetValueAtIndex  (in CoreFoundation) + 0  [0x9308ef80]
+                                                 1 __CFRunLoopRun  (in CoreFoundation) + 1550  [0x930d75de]
+                                                   1 mach_port_extract_member  (in libsystem_kernel.dylib) + 31  [0x91bc7363]
+                                                     1 _kernelrpc_mach_port_extract_member_trap  (in libsystem_kernel.dylib) + 10  [0x91bc3f26]
TheNextman
  • 12,428
  • 2
  • 36
  • 75
  • 1
    Can _you_ reproduce it? What kind of info does sampling in Instruments provide? – jscs Dec 17 '13 at 20:46
  • I'm unable to reproduce it locally. I could potentially ask a user to run Instruments against a debug build but I'm not sure what to sample that could be useful – TheNextman Dec 17 '13 at 20:47
  • 1
    Have you gotten system details from the users who are experiencing the problem? They might all be using SIMBL or something funky like that. – jscs Dec 17 '13 at 20:49
  • I have an .spx file from one user and I'm trying to get others to see if I can draw a correlation. It's interesting that the one user is also the same seeing the problem in my other thread here: http://stackoverflow.com/questions/20332171/cocoa-app-tries-to-launch-lldb-on-crash – TheNextman Dec 17 '13 at 20:54
  • Sadly looking through the installed software, processes and extensions on their machine is fairly fruitless as I have no way of distinguishing perfectly normal processes from those that could cause these problems – TheNextman Dec 17 '13 at 20:55
  • 1
    If you use threads, do you come back to the main thread during UI updates...? – Stephen J Dec 17 '13 at 21:45
  • As far as I'm aware we do; but not all the code is mine and the third-party code base is large. Is there an easy way to audit for this? I should also note that the third-party code is mainly C and using pthreads; not NSThread – TheNextman Dec 17 '13 at 22:45
  • 1
    Have the user check the console log to see if any exceptions were logged at the time the app went unresponsive. Have the user use Activity Monitor to sample the unresponsive app process and send you the sample report. That will show where your threads are. Ideally, they would do that with an unstripped build, if you can provide them one. – Ken Thomases Dec 18 '13 at 01:10
  • @KenThomases I added a sample to my question, but it doesn't mean much to me – TheNextman Dec 18 '13 at 16:26
  • That sample report is clearly not showing everything. In particular, the `-[NSApplication run]` frame seems to be much deeper than normal. Is it nested within an outer invocation of that same method? Also, that is on the main thread, right? – Ken Thomases Dec 18 '13 at 17:38
  • @KenThomases This is the main thread. The app is using mono and there was a bunch of mono_jit calls on top of the stack that I edited out - but there are no nested invocations of the same method. I posted another sample which is clearly different to the first, but captured under the same circumstances. – TheNextman Dec 18 '13 at 19:20
  • Both sample reports seem to indicate an app that's idle, ready, and waiting to process input. That is, they look normal. The differences are incidental. Going back to the description of the symptoms, the difference when running the modal dialog is the mode in which the run loop is running, `NSModalPanelRunLoopMode` vs. `NSDefaultRunLoopMode`. It's possible the third-party component is scheduling a timer or other run-loop source in the wrong mode, so that it doesn't get service except when a modal dialog is run. For example, it might use `-[NSRunLoop currentMode]` during setup when it shouldn't – Ken Thomases Dec 18 '13 at 19:57
  • That was my line of thinking, based on the behaviour (and the apparent unblocking of the UI when a modal session is running). However grepping the third party code shows only one instance of NSTimer; and it's created on the main run loop in default mode :S – TheNextman Dec 18 '13 at 20:10
  • Did you also grep for Core Foundation run-loop stuff? – Ken Thomases Dec 18 '13 at 21:08
  • Nothing Core Foundation at all... – TheNextman Dec 19 '13 at 14:20

0 Answers0