Some Notes on Crashes and How to Analyze Them

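This document summarizes some of the approaches I use in day-to-day work to analyze crash reports. To start with, the official documentation covers the basics of what a crash report contains.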

A complete (symbolicated) crash report contains several sections. The two we care about most are the header and the thread backtraces.

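Together they tell us what triggered the crash and what each thread was doing at that moment, which is enough to start working on a fix. Let's first look at a few kinds of crashes I have fixed at work.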

Uncaught exceptions (EXC_CRASH)

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY

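The header tells us the process was terminated abnormally: either an Objective-C/C/C++ exception was thrown and never caught, or abort() was called. This is the most common kind of crash in production and in automated testing, and it takes careful analysis to find the cause. For example: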

Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace ASSERTIOND, Code 0xbada5e47
Triggered by Thread: 0

Filtered syslog:
None found

Thread 0 name: Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0 libsystem_kernel.dylib 0x000000018377c138 __psynch_mutexwait + 8
1 libsystem_pthread.dylib 0x0000000183897960 _pthread_mutex_lock_wait + 96
2 libsystem_pthread.dylib 0x00000001838978a4 _pthread_mutex_lock_slow$VARIANT$armv81 + 244
3 UIKit 0x000000018d1971f8 _addBackgroundTask + 540
4 UIKit 0x000000018d3c97f0 -[UIApplication _beginBackgroundTaskWithExpirationHandler:] + 24
5 UIKit 0x000000018d19f844 -[UIApplication beginBackgroundTaskWithExpirationHandler:] + 72
6 MTHawkeyeInjection 0x0000000107f9c610 __66+[MTHBackgroundTaskObserver injectIntoUIAPPlicationBackgroundTask]_block_invoke_2 + 34320 (MTHBackgroundTaskObserver.m:43)
7 XXXX 0x0000000104822c90 -[ALSAuxiliaryGainEffect setDoing] + 152
8 CoreFoundation 0x0000000183bfc33c __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 20
9 CoreFoundation 0x0000000183bfb8dc _CFXRegistrationPost + 420
10 CoreFoundation 0x0000000183bfb640 ___CFXNotificationPost_block_invoke + 60
11 CoreFoundation 0x0000000183c79024 -[_CFXNotificationRegistrar find:object:observer:enumerator:] + 1408
12 CoreFoundation 0x0000000183b31f60 _CFXNotificationPost + 380
13 Foundation 0x000000018455f348 -[NSNotificationCenter postNotificationName:object:userInfo:] + 68
14 UIKit 0x000000018d3b5b30 __47-[UIApplication _applicationDidEnterBackground]_block_invoke + 268
15 UIKit 0x000000018d4c1ba8 +[UIViewController _performWithoutDeferringTransitions:] + 128
16 UIKit 0x000000018d3b5988 -[UIApplication _applicationDidEnterBackground] + 112
17 UIKit 0x000000018d64d20c -[__UICanvasLifecycleMonitor_Compatability deactivateEventsOnly:withContext:forceExit:completion:] + 996
18 UIKit 0x000000018ddd1848 __82-[_UIApplicationCanvas _transitionLifecycleStateWithTransitionContext:completion:]_block_invoke + 380
19 UIKit 0x000000018ddd1674 -[_UIApplicationCanvas _transitionLifecycleStateWithTransitionContext:completion:] + 448
20 UIKit 0x000000018db3f2dc __125-[_UICanvasLifecycleSettingsDiffAction performActionsForCanvas:withUpdatedScene:settingsDiff:fromSettings:transitionContext:]_block_invoke + 220
21 UIKit 0x000000018dcd83dc _performActionsWithDelayForTransitionContext + 112
22 UIKit 0x000000018db3f18c -[_UICanvasLifecycleSettingsDiffAction performActionsForCanvas:withUpdatedScene:settingsDiff:fromSettings:transitionContext:] + 252
23 UIKit 0x000000018d92378c -[_UICanvas scene:didUpdateWithDiff:transitionContext:completion:] + 364
24 UIKit 0x000000018d7c52d4 -[UIApplicationSceneClientAgent scene:handleEvent:withCompletion:] + 468
25 FrontBoardServices 0x000000018632eca4 __80-[FBSSceneImpl updater:didUpdateSettings:withDiff:transitionContext:completion:]_block_invoke.362 + 212
26 libdispatch.dylib 0x00000001835e6a14 _dispatch_client_callout + 16
27 libdispatch.dylib 0x00000001836229c4 _dispatch_block_invoke_direct$VARIANT$armv81 + 280
28 FrontBoardServices 0x00000001863627f8 __FBSSERIALQUEUE_IS_CALLING_OUT_TO_A_BLOCK__ + 36
29 FrontBoardServices 0x000000018636249c -[FBSSerialQueue _performNext] + 404
30 FrontBoardServices 0x0000000186362a38 -[FBSSerialQueue _performNextFromRunLoopSource] + 56
31 CoreFoundation 0x0000000183c1297c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
32 CoreFoundation 0x0000000183c128fc __CFRunLoopDoSource0 + 88
33 CoreFoundation 0x0000000183c12184 __CFRunLoopDoSources0 + 204
34 CoreFoundation 0x0000000183c0fd5c __CFRunLoopRun + 1048
35 CoreFoundation 0x0000000183b2fe58 CFRunLoopRunSpecific + 436
36 GraphicsServices 0x00000001859dcf84 GSEventRunModal + 100
37 UIKit 0x000000018d1af67c UIApplicationMain + 236
38 XXXXX 0x0000000102919690 main + 71312 (main.m:16)
39 libdyld.dylib 0x000000018364c56c start + 4

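This is an interesting crash. The header line Termination Reason: Namespace ASSERTIOND, Code 0xbada5e47 already points to the cause: the app had requested too many background tasks. Frames 3 to 5 of the crashed thread are consistent with this, since the main thread is blocked inside -[UIApplication beginBackgroundTaskWithExpirationHandler:]. A crash like this can only be avoided by changing the business code involved, and the general direction is to reduce the number of background tasks it requests.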


Exception Type:  EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: Namespace SPRINGBOARD, Code 0x8badf00d
Termination Description: SPRINGBOARD, scene-update watchdog.....

Thread 0 name: Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0 libsystem_kernel.dylib 0x0000000183a69be4 __ulock_wait + 8
1 libdispatch.dylib 0x00000001838d741c _dispatch_unfair_lock_wait + 48
2 libdispatch.dylib 0x00000001838d7614 _dispatch_gate_wait_slow$VARIANT$mp + 132
3 libdispatch.dylib 0x00000001838d8334 dispatch_once_f$VARIANT$mp + 132
4 XXXX 0x0000000104ea8204 +[XXXXX sharedManager] + 1262084 (once.h:84)

........

Thread 9 name: Dispatch queue: com.apple.root.default-qos
Thread 9:
0 libsystem_kernel.dylib 0x0000000183a69be4 __ulock_wait + 8
1 libdispatch.dylib 0x00000001838d7198 _dispatch_ulock_wait + 48
2 libdispatch.dylib 0x00000001838d72f8 _dispatch_thread_event_wait_slow$VARIANT$mp + 44
3 libdispatch.dylib 0x00000001838e2bbc _dispatch_sync_wait + 452
4 CoreData 0x000000018697c3b4 _perform + 268
5 CoreData 0x000000018698dd7c -[NSManagedObjectContext+ 953724 (_NestedContextSupport) managedObjectContextDidRegisterObjectsWithIDs:generation:] + 72
6 CoreData 0x00000001868d4018 _PFFaultHandlerLookupRow + 564
7 CoreData 0x00000001868d39f4 _PF_FulfillDeferredFault + 244
8 CoreData 0x00000001868d3804 _sharedIMPL_pvfk_core + 116
9 XXXX
10 XXXX
11 XXXX
12 XXXX
13 libdispatch.dylib 0x00000001838d4ae4 _dispatch_client_callout + 16
14 libdispatch.dylib 0x00000001838d82ec dispatch_once_f$VARIANT$mp + 60
15 XXXX 0x0000000104ea8204 +[XXXXX sharedManager] + 1262084 (once.h:84)

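This crash was clearly triggered by a watchdog timeout (Code 0x8badf00d, scene-update watchdog). The usual cause of this kind of problem is a deadlock involving the main thread. In this report, Thread 9 is still running the dispatch_once initializer of +[XXXXX sharedManager] and is blocked in a synchronous Core Data call (_dispatch_sync_wait), presumably waiting for the main queue, while the main thread is waiting inside dispatch_once for that same initializer to finish. Neither side can make progress. dispatch_once blocks every other caller until the initializer returns, so an initializer that depends, directly or indirectly, on a thread that may call the same dispatch_once is unsafe.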


Exception Type:  00000020
Exception Codes: 0x000000008badf00d
Exception Note: SIMULATED (this is NOT a crash)
Highlighted by Thread: 0

Application Specific Information: XXXX failed to scene-update after 10.00s
Elapsed total CPU time (seconds): 6.430 (user 6.430, system 0.000), 32% CPU
Elapsed application CPU time (seconds): 4.644, 23% CPU

Thread 0 name: Dispatch queue: com.apple.main-thread
Thread 0:
0 libsystem_kernel.dylib 0x0000000182ad502c semaphore_timedwait_trap + 8
1 libdispatch.dylib 0x00000001829b2394 _dispatch_semaphore_wait_slow + 160
2 QuartzCore 0x0000000185b86b40 -[CAMetalLayer nextDrawable] + 984
3 MetalKit 0x0000000192d6e688 -[MTKView currentDrawable] + 80
4 XXXX 0x00000001003891d8 -[xxxxx drawInMTKView:] + 3232216 (xxxx.m:0)
5 MetalKit 0x0000000192d6f67c -[MTKView draw] + 208
6 XXXX 0x0000000100389004 -[xxxxxxx showEffect] + 3231748 (xxxx.m:323)
7 XXXX 0x0000000100387270 -[xxxxx layoutSubviews] + 3224176 (xxxx.m:96)
8 UIKit 0x00000001880b01e4 -[UIView+ 66020 (CALayerDelegate) layoutSublayersOfLayer:] + 656
9 QuartzCore 0x0000000185a4298c -[CALayer layoutSublayers] + 148
10 QuartzCore 0x0000000185a3d5c8 CA::Layer::layout_if_needed+ 38344 (CA::Transaction*) + 292
11 QuartzCore 0x0000000185a3d488 CA::Layer::layout_and_display_if_needed+ 38024 (CA::Transaction*) + 32
12 QuartzCore 0x0000000185a3cab8 CA::Context::commit_transaction+ 35512 (CA::Transaction*) + 252
13 QuartzCore 0x0000000185a3c818 CA::Transaction::commit+ 34840 () + 500
14 QuartzCore 0x0000000185a35ddc CA::Transaction::observer_callback+ 7644 (__CFRunLoopObserver*, unsigned long, void*) + 80
15 CoreFoundation 0x0000000182f0c728 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 32
16 CoreFoundation 0x0000000182f0a4cc __CFRunLoopDoObservers + 372
17 CoreFoundation 0x0000000182f0a8fc __CFRunLoopRun + 928
18 CoreFoundation 0x0000000182e34c50 CFRunLoopRunSpecific + 384
19 GraphicsServices 0x000000018471c088 GSEventRunModal + 180
20 UIKit 0x000000018811e088 UIApplicationMain + 204
21 XXXXX 0x000000010007b6e0 main + 30432 (main.m:18)
22 libdyld.dylib 0x00000001829d28b8 start + 4

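This is another interesting one. The Exception Note reads SIMULATED (this is NOT a crash), and unlike the previous report the main thread is not deadlocked. After analyzing the crash context and reproducing it under the Xcode debugger, we found a logic loop on the main thread: the parent view draws its child view inside layoutSubviews, and the child's drawing in turn triggers the parent's layoutSubviews again. The main thread never leaves this cycle, so the UI freezes and the system eventually kills the app ("failed to scene-update after 10.00s").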


Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Triggered by Thread: 25

Application Specific Information:
abort() called

Thread 25 Crashed:
0 libsystem_kernel.dylib 0x00000001caf7f0e4 __pthread_kill + 8
1 libsystem_c.dylib 0x00000001caed7074 __abort + 156
2 libsystem_c.dylib 0x00000001caed6fd8 __abort + 0
3 libsystem_malloc.dylib 0x00000001cafd3c7c _malloc_put + 0
4 libsystem_malloc.dylib 0x00000001cafd3ec0 malloc_zone_error + 108
5 libsystem_malloc.dylib 0x00000001cafd3758 free_list_checksum_botch + 40
6 libsystem_malloc.dylib 0x00000001cafd27f0 tiny_malloc_from_free_list + 1316
7 libsystem_malloc.dylib 0x00000001cafd0728 tiny_malloc_should_clear + 252
8 libsystem_malloc.dylib 0x00000001cafb959c szone_malloc_should_clear + 80
9 libsystem_malloc.dylib 0x00000001cafc10bc nanov2_calloc + 168
10 libsystem_malloc.dylib 0x00000001cafc6f14 default_zone_calloc + 84
11 libsystem_malloc.dylib 0x00000001cafc4d88 malloc_zone_calloc + 148
12 libsystem_malloc.dylib 0x00000001cafc5694 calloc + 44
13 libobjc.A.dylib 0x00000001ca5fa4f4 class_createInstance + 76
14 libobjc.A.dylib 0x00000001ca60512c _objc_rootAlloc + 56
15 Photos 0x00000001da417ad4 -[PHPhotoLibrary fetchPHObjectsForOIDs:propertyHint:includeTrash:] + 476
16 Photos 0x00000001da4c62ac -[PHBatchFetchingArray _phObjectsForOIDs:] + 240
17 Photos 0x00000001da4c6774 -[PHBatchFetchingArray __batchHelper:] + 344
18 Photos 0x00000001da4c698c __41-[PHBatchFetchingArray _phObjectAtIndex:]_block_invoke + 40
19 AssetsLibraryServices 0x00000001d90a2b3c __pl_dispatch_sync_block_invoke + 44
20 libdispatch.dylib 0x00000001cadf830c _dispatch_client_callout + 20
21 libdispatch.dylib 0x00000001cae04cb4 _dispatch_lane_barrier_sync_invoke_and_complete + 60
22 AssetsLibraryServices 0x00000001d90a2b04 pl_dispatch_sync + 84
23 Photos 0x00000001da4c68f4 -[PHBatchFetchingArray _phObjectAtIndex:] + 168
24 Photos 0x00000001da4c5a1c -[PHBatchFetchingArray objectAtIndex:] + 48
25 Photos 0x00000001da4e7b00 -[PHFetchResult lastObject] + 52

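The stack of this crash looks odd: abort() was called while the system was allocating memory with malloc. The business code on this path contains nothing that could trigger such a failure, so we can reasonably guess that the heap had already been corrupted (heap corruption). Frames such as free_list_checksum_botch suggest that the allocator detected damaged metadata. The key point about heap corruption is that the damage may be done through a bad pointer (for example a dangling pointer) long before the damaged memory is touched again and the process crashes. This odd-looking report is therefore not the real scene of the crime, and other memory debugging tools are needed to track the problem down.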
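
To make the "damage now, crash later" pattern concrete, here is a minimal sketch in C of a toy allocator whose free blocks carry a checksum, loosely inspired by the free_list_checksum_botch frame in the backtrace above. Everything in it (toy_block, toy_alloc, toy_buggy_write, the checksum formula) is hypothetical and has nothing to do with how libmalloc is actually implemented. The only point is that the out-of-bounds write succeeds silently, and the failure is reported by a later, innocent allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy allocator for illustration only; NOT libmalloc. */
#define TOY_BLOCKS 4u
#define TOY_PAYLOAD 16u

typedef struct {
    uint32_t next;                      /* index of the next free block; TOY_BLOCKS ends the list */
    uint32_t checksum;                  /* guards `next` while the block is on the free list */
    unsigned char payload[TOY_PAYLOAD]; /* memory handed to the caller */
} toy_block;

static toy_block toy_heap[TOY_BLOCKS];
static uint32_t toy_free_head;

static uint32_t toy_sum(uint32_t next) { return next ^ 0xA5A5A5A5u; }

/* Put every block on the free list with a valid checksum. */
void toy_init(void) {
    for (uint32_t i = 0; i < TOY_BLOCKS; i++) {
        toy_heap[i].next = i + 1;
        toy_heap[i].checksum = toy_sum(i + 1);
    }
    toy_free_head = 0;
}

/* Returns a block index, -1 when the heap is exhausted, or -2 when the
   free-list metadata fails its checksum (a real allocator would abort here). */
int toy_alloc(void) {
    if (toy_free_head >= TOY_BLOCKS) return -1;
    toy_block *block = &toy_heap[toy_free_head];
    if (block->checksum != toy_sum(block->next)) return -2;
    int index = (int)toy_free_head;
    toy_free_head = block->next;
    return index;
}

/* The buggy caller: writes `len` bytes starting at the payload of block `index`
   without checking the payload size. Writes stay inside toy_heap, so the demo
   itself has defined behavior. */
void toy_buggy_write(int index, size_t len) {
    assert(index >= 0 && (uint32_t)index < TOY_BLOCKS);
    unsigned char *bytes = (unsigned char *)toy_heap;
    size_t start = (size_t)index * sizeof(toy_block) + offsetof(toy_block, payload);
    for (size_t i = 0; i < len && start + i < sizeof(toy_heap); i++) {
        bytes[start + i] = 0xFF;
    }
}
```

Writing TOY_PAYLOAD bytes into block 0 is harmless. Writing four bytes more overwrites the `next` field of the neighboring free block, and nothing happens at that moment. The next toy_alloc() call, which may come from completely unrelated code, is the one that reports the corruption.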

Memory access errors (EXC_BAD_ACCESS)

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000000

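A memory access error is raised when the process tries to access invalid memory (for example through a dangling pointer to a released object, or a pointer of the wrong type), or tries to access memory in a way its protection level does not allow (for example writing to read-only memory). Such errors are often the result of heap corruption, and the crashing stack may not be the direct cause of the error: it just happens to be the code that touched the invalid memory.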

When this happens we may need tools such as the Zombies instrument, Xcode's Zombie Objects option, Address Sanitizer (memory error detection), and Thread Sanitizer (data race detection) to track the problem down.

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000010
VM Region Info: 0x10 is not in any region. Bytes before following region: 4307648496
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 0000000100c18000-0000000105820000 [ 76.0M] r-x/r-x SM=COW

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]
Triggered by Thread: 29

....

Thread 29 name: Dispatch queue: com.meitu.hawkeye.status_flush
Thread 29 Crashed:
0 libobjc.A.dylib 0x000000018170216c cache_t::find+ 24940 (unsigned long, objc_object*) + 20
1 libobjc.A.dylib 0x00000001817022f0 cache_fill + 288
2 libobjc.A.dylib 0x000000018170db24 lookUpImpOrForward + 496
3 libobjc.A.dylib 0x0000000181718c38 _objc_msgSend_uncached + 56
4 CoreFoundation 0x000000018254acd0 __exceptionPreprocess + 40
5 libobjc.A.dylib 0x00000001817045ec objc_exception_throw + 56
6 Foundation 0x0000000182f76d7c _NSErrnoMessage + 0
7 CoreFoundation 0x0000000182512a70 __CFStringChangeSizeMultiple + 3044
8 CoreFoundation 0x000000018250ef00 __CFStringAppendBytes + 640
9 CoreFoundation 0x000000018250a0d4 __CFStringAppendFormatCore + 14052
10 CoreFoundation 0x000000018250a740 _CFStringCreateWithFormatAndArgumentsAux2 + 132
11 Foundation 0x0000000182ea3f5c -[NSString initWithFormat:locale:] + 36
12 Foundation 0x0000000182ea3ed4 -[NSNumber descriptionWithLocale:] + 1044
13 CoreFoundation 0x000000018245ed84 -[__NSCFNumber descriptionWithLocale:] + 72
14 Foundation 0x0000000182ed2168 _NSDescriptionWithLocaleFunc + 76
15 CoreFoundation 0x0000000182508fd8 __CFStringAppendFormatCore + 9704
16 CoreFoundation 0x000000018250a740 _CFStringCreateWithFormatAndArgumentsAux2 + 132
17 Foundation 0x0000000182e84b88 +[NSString stringWithFormat:] + 68
18 MTHawkeyeInjection 0x0000000108db63c8 -[PodMTHawkeyeInjection_MTHawkeyeClient doBuildInFlushStatusTasks] + 132
19 MTHawkeyeInjection 0x0000000108db62f4 -[PodMTHawkeyeInjection_MTHawkeyeClient statusFlushTimerFired] + 28
20 MTHawkeyeInjection 0x0000000108db6228 __62-[PodMTHawkeyeInjection_MTHawkeyeClient startStatusFlushTimer]_block_invoke + 40
21 libdispatch.dylib 0x0000000181e3cae4 _dispatch_client_callout + 16
22 libdispatch.dylib 0x0000000181e797a8 _dispatch_continuation_pop$VARIANT$armv81 + 416
23 libdispatch.dylib 0x0000000181e82c20 _dispatch_source_invoke$VARIANT$armv81 + 1248
24 libdispatch.dylib 0x0000000181e7b074 _dispatch_queue_serial_drain$VARIANT$armv81 + 248
25 libdispatch.dylib 0x0000000181e7bad8 _dispatch_queue_invoke$VARIANT$armv81 + 328
26 libdispatch.dylib 0x0000000181e7c47c _dispatch_root_queue_drain_deferred_wlh$VARIANT$armv81 + 332
27 libdispatch.dylib 0x0000000181e8444c _dispatch_workloop_worker_thread$VARIANT$armv81 + 612
28 libsystem_pthread.dylib 0x000000018216fe70 _pthread_wqthread + 860
29 libsystem_pthread.dylib 0x000000018216fb08 start_wqthread + 4

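The Exception Subtype of this crash (KERN_INVALID_ADDRESS at 0x0000000000000010) tells us that it was caused by an access through an invalid pointer. Now look at the stack of the crashed thread. The only business code involved is a call to [NSString stringWithFormat:], whose result is a local object managed by ARC, so in theory this call is not what produced the bad pointer. To fix this crash we have to use Address Sanitizer or Zombie Objects to find the object behind the dangling pointer.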
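
One more hint can be read from the fault address itself. A very small address such as 0x10 often means that code read a field at a small offset from a NULL (or zeroed) base pointer, rather than following a random pointer. The sketch below only demonstrates the arithmetic. The struct fake_class and its three fields are an assumed layout chosen for illustration, not the real Objective-C runtime layout, and the report alone does not prove that this is what happened here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for illustration only: three pointer-sized fields. */
typedef struct {
    void *isa;
    void *superclass;
    void *cache;
} fake_class;

/* Address that a load of `cache` would touch if the struct lived at `base`.
   Pure arithmetic: nothing is dereferenced. */
uintptr_t fault_address_for_cache(uintptr_t base) {
    assert(base <= UINTPTR_MAX - offsetof(fake_class, cache));
    return base + offsetof(fake_class, cache);
}
```

With a NULL base the result is two pointer sizes, which is 0x10 on a 64-bit device. When the address in a report matches a field offset like this, it is worth asking which pointer could have been NULL or zeroed at that point.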


Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Subtype: KERN_INVALID_ADDRESS at 0x0000000000000000
VM Region Info: 0 is not in any region. Bytes before following region: 4301012992
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 00000001005c4000-0000000103b3c000 [ 53.5M] r-x/r-x SM=COW ...pp/XXXXX

Termination Signal: Segmentation fault: 11
Termination Reason: Namespace SIGNAL, Code 0xb
Terminating Process: exc handler [0]
Triggered by Thread: 0

Thread 0 name: Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0 libobjc.A.dylib 0x000000018016d6bc lookUpImpOrForward + 92
1 libobjc.A.dylib 0x000000018016d6b4 lookUpImpOrForward + 84
2 libobjc.A.dylib 0x0000000180178758 _objc_msgSend_uncached + 56
3 StoreKit 0x000000019438b980 __NotifyObserverAboutRestoreTransactionsFailure + 48
4 CoreFoundation 0x0000000180de37a0 CFArrayApplyFunction + 80
5 StoreKit 0x000000019438b934 -[SKPaymentQueue _notifyObserversRestoreTransactionsFailedWithError:] + 144
6 StoreKit 0x000000019438a4d4 -[SKPaymentQueue _completeRestoreWithMessage:] + 188
7 StoreKit 0x000000019438b328 __44-[SKPaymentQueue _handleMessage:connection:]_block_invoke + 168
8 libdispatch.dylib 0x000000018089aa54 _dispatch_call_block_and_release + 24
9 libdispatch.dylib 0x000000018089aa14 _dispatch_client_callout + 16
10 libdispatch.dylib 0x00000001808dbc80 _dispatch_main_queue_callback_4CF$VARIANT$armv81 + 968
11 CoreFoundation 0x0000000180ec6544 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 12
12 CoreFoundation 0x0000000180ec4120 __CFRunLoopRun + 2012
13 CoreFoundation 0x0000000180de3e58 CFRunLoopRunSpecific + 436
14 GraphicsServices 0x0000000182c90f84 GSEventRunModal + 100
15 UIKit 0x000000018a46367c UIApplicationMain + 236
16 XXXXX 0x0000000100a302b4 main + 4637364 (main.m:16)
17 libdyld.dylib 0x000000018090056c start + 4

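This is a rather strange crash. It showed up, with low probability, when the user started an Apple in-app purchase (IAP) and then tapped Cancel. The stack shows StoreKit notifying its transaction observers, which suggests that an observer had already been released but was still being sent a message, causing the memory access crash. My guess is that an observer was released abnormally somewhere on the StoreKit side (which involves a separate system process), and the crash only surfaced on a later notification rather than at the moment of release.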

This touches on how our IAP code was written. Originally a singleton observed StoreKit globally, roughly like this:

+ (instancetype)shared {
    static dispatch_once_t onceToken;
    static id shared;
    dispatch_once(&onceToken, ^{
        shared = [[self alloc] init];
    });
    return shared;
}

- (instancetype)init {
    self = [super init];
    if (self) {
        // Registered once and never removed: the singleton observes the
        // payment queue for the rest of the process lifetime.
        [[SKPaymentQueue defaultQueue] addTransactionObserver:self];
    }
    return self;
}

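In other words, the singleton keeps observing StoreKit for the entire lifetime of the app process, and this is most likely where the problem comes from. In later versions we changed the purchase and restore flows so that each operation adds the observer when it starts and removes it when it finishes.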
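
The add-then-remove discipline can be modeled in plain C. This is only a sketch of the idea, not StoreKit: observer_queue, queue_add, queue_remove, queue_notify, and count_event are hypothetical names, and the queue is assumed to store raw, unowned pointers, which is exactly the situation where a stale registration becomes a dangling pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBSERVERS 8

typedef void (*observer_fn)(void *context, int event);

typedef struct {
    observer_fn fn;
    void *context;              /* raw pointer: the queue does not own it */
} observer_slot;

typedef struct {
    observer_slot slots[MAX_OBSERVERS];
    size_t count;
} observer_queue;

/* Register an observer; returns 0 on success, -1 if the queue is full. */
int queue_add(observer_queue *q, observer_fn fn, void *context) {
    assert(q != NULL && fn != NULL);
    if (q->count == MAX_OBSERVERS) return -1;
    q->slots[q->count].fn = fn;
    q->slots[q->count].context = context;
    q->count++;
    return 0;
}

/* Unregister by context; returns 0 if found, -1 otherwise. */
int queue_remove(observer_queue *q, const void *context) {
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (q->slots[i].context == context) {
            q->slots[i] = q->slots[q->count - 1];  /* fill the hole with the last slot */
            q->count--;
            return 0;
        }
    }
    return -1;
}

/* Deliver an event to every registered observer; returns how many were called. */
size_t queue_notify(const observer_queue *q, int event) {
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        q->slots[i].fn(q->slots[i].context, event);
    }
    return q->count;
}

/* Example observer: counts the events it receives. */
void count_event(void *context, int event) {
    (void)event;
    (*(int *)context)++;
}
```

An operation registers its context with queue_add, receives its events, and calls queue_remove before the context goes away. After that, a late queue_notify reaches nobody instead of calling into freed memory.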

Some thoughts on fixing crashes

The core of fixing a crash boils down to roughly two things:

  1. Live debugging and reproducing the scenario
  2. Logical analysis and reasonable inference

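If a crash has a clear reproduction path, fixing it is mostly a matter of launching Xcode, debugging the relevant business code, and working through the logic :). More often, though, the crash reports we get come from production and are hard to reproduce, with no clear reproduction path at all (after all, coding, self-testing, and automated testing have already filtered out the easy ones).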

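So far, all the crash reports I have dealt with fall into two categories: EXC_CRASH and EXC_BAD_ACCESS. I have not yet come across other types such as EXC_BREAKPOINT or EXC_BAD_INSTRUCTION. What follows is a summary of how I approach these two categories.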

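The most important thing when fixing crashes is to make good use of existing tools. For example, when debugging in Xcode we can turn on the memory and thread debugging options under Code Diagnostics.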

For uncaught exceptions (EXC_CRASH)

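When you attach Xcode and start debugging, always remember to add a global exception breakpoint (Breakpoint navigator -> Create a breakpoint -> Exception Breakpoint). Then walk through the relevant business flow and try to reproduce the problem.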

For example, if we write some code that crashes on an out-of-bounds array access, Xcode immediately stops at the offending line.

NSArray *array = [NSArray new];
id object = [array objectAtIndex:0];

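In practice, of course, hunting for a reproduction path under the debugger is very inefficient when the crash only happens on a low-probability path. What can we do instead? We can go straight to the code around the crash context, analyze its logic, and form a reasonable hypothesis about the cause. Then, without introducing side effects, we try to reproduce the problem with an automated scenario.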

For memory problems (EXC_BAD_ACCESS)

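These are the most painful production crashes. The underlying reason is the same as before: when a memory access problem crashes the process, the reported stack may not belong to the code that actually created the dangling pointer. The crash site is only a hint, and in a large project with many contributors it can be very hard to trace it back to the responsible module.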

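That said, high-level Objective-C and Swift code under ARC rarely performs memory operations that touch a dangling pointer (short of forcing an illegal access to stack memory). It is when we call malloc() and other low-level heap allocation functions directly that a small slip can easily leave a dangling pointer behind.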
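
When manual malloc()/free() cannot be avoided, one common defensive habit is to keep the pointer inside an owning struct and clear it at the moment it is freed. The sketch below assumes a hypothetical byte_buffer type with buffer_init and buffer_destroy helpers. It is not a complete answer, because any other copy of the pointer still dangles, but it makes an accidental second destroy harmless and turns later misuse through the owner into an obvious NULL access.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of one heap allocation. */
typedef struct {
    char *bytes;
    size_t length;
} byte_buffer;

/* Allocate `length` zeroed bytes; returns 0 on success, -1 on failure. */
int buffer_init(byte_buffer *buf, size_t length) {
    assert(buf != NULL && length > 0);
    buf->bytes = calloc(length, 1);
    buf->length = buf->bytes ? length : 0;
    return buf->bytes ? 0 : -1;
}

/* Release the allocation and clear the owner, so calling this twice is safe. */
void buffer_destroy(byte_buffer *buf) {
    assert(buf != NULL);
    free(buf->bytes);       /* free(NULL) is defined to do nothing */
    buf->bytes = NULL;
    buf->length = 0;
}
```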

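During development and debugging we should therefore make a conscious effort to use tools to head off potential memory problems. Address Sanitizer is currently the most effective tool for this in Xcode (it covers more cases than Zombie Objects).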

According to the documentation, this tool helps us find the following kinds of problems:

  • Use of deallocated memory

    __unsafe_unretained MyClass *unsafePointer;
    @autoreleasepool {
        MyClass *object = [MyClass new];
        unsafePointer = object;
    }
    NSLog(@"%d", unsafePointer->instanceVariable);
    // Error: unsafePointer is deallocated in autorelease pool
  • Deallocation of deallocated memory (double free)

    int *pointer = malloc(sizeof(int));
    free(pointer);
    free(pointer); // Error: free called twice with the same memory address
  • Deallocation of nonallocated memory

    int value = 42;
    free(&value); // Error: free called on stack allocated variable
  • Use of stack memory after function return

    int *integer_pointer_returning_function() {
        int value = 42;
        return &value;
    }
    int *integer_pointer = integer_pointer_returning_function();
    *integer_pointer = 43; // Error: invalid access of returned stack memory
  • Use of out-of-scope stack memory

    int *pointer = NULL;
    if (bool_returning_function()) {
        int value = integer_returning_function();
        pointer = &value;
    }
    *pointer = 42; // Error: invalid access of stack memory out of declaration scope
  • Overflow and underflow of buffers

    int global_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    void foo() {
        int idx = 10;
        global_array[idx] = 42; // Error: out of bounds access of global variable
        char *heap_buffer = malloc(10);
        heap_buffer[idx] = 'x'; // Error: out of bounds access of heap allocated variable
        char stack_buffer[10];
        stack_buffer[idx] = 'x'; // Error: out of bounds access of stack allocated variable
    }
  • Overflow of C++ containers

    std::vector<int> vector;
    vector.push_back(0);
    vector.push_back(1);
    vector.push_back(2);
    auto *pointer = &vector[0];
    return pointer[3]; // Error: out of bounds access for vector

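Of these, the two we are most likely to run into are use of deallocated memory and buffer overflows. For example, if we construct a buffer overflow scenario with Address Sanitizer enabled, the tool captures the runtime context at the faulty access.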
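
A scenario of this kind might look like the following sketch. The helper copy_prefix and the DEMO_OVERFLOW macro are made up for this example. The default build is correct. Defining DEMO_OVERFLOW allocates one byte too few, so the write of the terminator lands one byte past the end of the heap block. That bug usually goes unnoticed in a normal run, while a build with Address Sanitizer enabled (the Xcode scheme diagnostics option, or clang's -fsanitize=address flag) should stop at the faulty write and report a heap-buffer-overflow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the first `count` bytes of `src`
   as a NUL-terminated string. `src` must hold at least `count` bytes.
   The caller frees the result. */
char *copy_prefix(const char *src, size_t count) {
    assert(src != NULL);
#ifdef DEMO_OVERFLOW
    char *dst = malloc(count);      /* BUG (demo only): no room for the terminator */
#else
    char *dst = malloc(count + 1);  /* correct: payload plus terminator */
#endif
    if (dst == NULL) return NULL;
    memcpy(dst, src, count);
    dst[count] = '\0';              /* one byte past the end in the DEMO_OVERFLOW build */
    return dst;
}
```

Calling copy_prefix("crash report", 5) in the default build returns the string "crash". The same call in the DEMO_OVERFLOW build is the overflow scenario to run under the sanitizer.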

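In addition, two other memory problems occur in practice: memory leaks and OOM (out of memory). Both can also kill the app, and no corresponding crash report is collected for them. They need attention from the development stage onward, using Xcode's Debug Memory Graph or the Leaks template in Instruments.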

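All in all, there is no need to worry that a crash cannot be fixed. What matters is making good use of the debugging tools Apple provides. From Xcode and Instruments to LLDB, there is always one that can help you find some clues at the crash site, infer the cause from them, and fix it.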

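References

WWDC 2018 Session 414, Understanding Crashes and Crash Logs
WWDC 2017 Session 414, Engineering for Testability
Code Diagnostics
Pitfalls of iOS app background tasks (iOS App 后台任务的坑)
UIApplication Background Task Notes
Singleton abuse: dispatch_once deadlocks (滥用单例之dispatch_once死锁)
iOS memory debugging tips (iOS 内存调试技巧)
A brief look at Zombie Objects (浅谈 Zombie Objects)
How to solve multithreading problems with Xcode 8 (如何用Xcode8解决多线程问题)
[Migrated] AddressSanitizer, a new feature in Xcode 7 (【迁移】Xcode7新特性AddressSanitizer)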