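ANR (Application Not Responding) means that an application has stopped responding. The Android system requires certain events to complete within a fixed time window; if an event gets no effective response within that window, or the response takes too long, an ANR is raised. This article covers the types of ANR, how ANR works, four ANR detection approaches, and how to analyze and resolve common ANR problems.
I. Types of ANR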
1. InputDispatching Timeout
Timeout: 5s by default on Google's platform, 8s on MTK platforms
Cause: no response to an input event (such as a key press or a screen touch)
2. Broadcast Timeout
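Timeout: 10s for foreground broadcasts, 60s for background broadcasts
Cause: the broadcast is not fully processed within the allotted time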
3. Service Timeout
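Timeout: 20s for foreground services, 200s for background services (in AOSP, SERVICE_BACKGROUND_TIMEOUT is SERVICE_TIMEOUT * 10)
Cause: a relatively rare type; the Service does not finish processing within the allotted time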
4. ContentProvider Timeout
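Timeout: 10s
Cause: publishing (starting) the Provider takes longer than 10s
II. How ANR Works
The InputEvent ANR differs slightly from the figure above: it is monitored in the native layer, but it likewise comes down to the main thread's message queue being blocked.
Triggering an ANR can be broken into two steps:
- Plant the bomb
- Defuse the bomb, or detonate it
Broadcast, ContentProvider and Service share a similar ANR timeout mechanism, so the Service case is used as the example below. A Service Timeout is triggered when ActivityManagerService.MainHandler, which runs on the ActivityManager thread, receives a SERVICE_TIMEOUT_MSG message.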
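The plant / defuse / detonate pattern can be modeled outside the framework. The following is a minimal plain-Java sketch, not framework code: the class TimeoutBomb and its method names are hypothetical, and a ScheduledExecutorService stands in for the delayed Handler message.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the plant / defuse / detonate pattern; not framework code. */
final class TimeoutBomb {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final CountDownLatch detonated = new CountDownLatch(1);
    private ScheduledFuture<?> pending;

    /** Plant: schedule the "ANR" callback to fire after timeoutMs. */
    synchronized void plant(long timeoutMs) {
        pending = timer.schedule(() -> detonated.countDown(), timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Defuse: the work finished in time, so cancel the pending callback. */
    synchronized void defuse() {
        if (pending != null) {
            pending.cancel(false);
        }
    }

    /** Waits up to waitMs and reports whether the bomb went off. */
    boolean awaitDetonation(long waitMs) {
        try {
            return detonated.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

In this model, calling plant() corresponds to sending the delayed SERVICE_TIMEOUT_MSG, defuse() to removing it once the work completes, and the scheduled callback to the timeout handler running.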
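1. Service Timeout ANR trigger mechanism
1) Planting the bomb
While the Service's process attaches to the system_server process, realStartServiceLocked() in ActiveServices.java is called, and this is where the bomb is planted.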
/**
 * The abridged code below is based on Android 6.0.
 */
private final void realStartServiceLocked(ServiceRecord r,
        ProcessRecord app, boolean execInFg) throws RemoteException {
    // Send the delayed message (SERVICE_TIMEOUT_MSG), i.e. plant the bomb
    bumpServiceExecutingLocked(r, execInFg, "create");
    try {
        // Eventually runs the service's onCreate() method
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);
    } catch (DeadObjectException e) {
    } finally {
    }
}

private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    if (r.executeNesting == 0) {
        if (r.app != null) {
            r.app.executingServices.add(r);
            r.app.execServicesFg |= fg;
            if (r.app.executingServices.size() == 1) {
                scheduleServiceTimeoutLocked(r.app);
            }
        }
    } else if (r.app != null && fg && !r.app.execServicesFg) {
        r.app.execServicesFg = true;
        scheduleServiceTimeoutLocked(r.app);
    }
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // If SERVICE_TIMEOUT_MSG has still not been removed when the timeout expires,
    // the service timeout flow runs, i.e. the bomb detonates
    mAm.mHandler.sendMessageAtTime(msg,
            proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
}
2) Defusing the bomb
The bomb is defused in ActivityThread's handleCreateService method, that is, once the service has been created.
private void handleCreateService(ActivityThread.CreateServiceData data) {
    // Abridged: packageInfo and service are obtained before this point in the full source
    try {
        ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
        context.setOuterContext(service);
        Application app = packageInfo.makeApplication(false, mInstrumentation);
        service.attach(context, this, data.info.name, data.token, app,
                ActivityManagerNative.getDefault());
        service.onCreate();
        mServices.put(data.token, service);
        try {
            // Remove the timeout message, i.e. defuse the bomb; this eventually
            // reaches serviceDoneExecutingLocked in ActiveServices
            ActivityManagerNative.getDefault().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (RemoteException e) {
            // nothing to do.
        }
    } catch (Exception e) {
    }
}

/**
 * serviceDoneExecutingLocked in ActiveServices
 */
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            r.app.execServicesFg = false;
            r.app.executingServices.remove(r);
            if (r.app.executingServices.size() == 0) {
                // No service is executing in this process any more:
                // remove the timeout message, i.e. defuse the bomb
                mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
            }
        }
    }
}
3) How the ANR timeout bomb detonates
The system_server process has a Handler thread named ActivityManager. When the countdown ends, a SERVICE_TIMEOUT_MSG message is delivered to that Handler thread; this is the message scheduled by scheduleServiceTimeoutLocked in the ActiveServices class above.
final class MainHandler extends Handler {
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
        }
    }
}
The mServices.serviceTimeout call above resolves to the serviceTimeout method of ActiveServices.
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime = now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, " ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        } else {
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    if (anrMessage != null) {
        // A timed-out service exists, so run appNotResponding
        mAm.appNotResponding(proc, null, null, false, anrMessage);
    }
}
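The scan at the heart of serviceTimeout can be isolated as a small pure function. The sketch below is a simplified plain-Java model under stated assumptions: ServiceTimeoutScan, Result and the use of a list of start times are hypothetical stand-ins for the framework's ServiceRecord bookkeeping.

```java
import java.util.List;

/** Simplified, hypothetical model of the scan in serviceTimeout(); not the framework code. */
final class ServiceTimeoutScan {
    /** Index of a timed-out entry (or -1), and when to re-check if none timed out. */
    static final class Result {
        final int timedOutIndex;
        final long nextCheckAt;

        Result(int timedOutIndex, long nextCheckAt) {
            this.timedOutIndex = timedOutIndex;
            this.nextCheckAt = nextCheckAt;
        }
    }

    static Result scan(List<Long> executingStarts, long now, long timeoutMs) {
        long maxTime = now - timeoutMs; // anything that started before this has timed out
        long latestStart = 0;
        for (int i = executingStarts.size() - 1; i >= 0; i--) {
            long start = executingStarts.get(i);
            if (start < maxTime) {
                return new Result(i, -1); // detonate: this entry exceeded the timeout
            }
            latestStart = Math.max(latestStart, start);
        }
        // Nothing timed out yet: re-arm the timer relative to the latest start time.
        return new Result(-1, latestStart + timeoutMs);
    }
}
```

When no entry has timed out, the model re-arms the check at the latest start time plus the timeout, mirroring the else branch that re-sends SERVICE_TIMEOUT_MSG.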
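The Broadcast and ContentProvider ANR mechanisms are not repeated here. For details, see "理解Android ANR的触发原理" (Understanding how Android ANR is triggered).
2. InputDispatching Timeout ANR trigger mechanism
1) Planting the bomb
InputReader.cpp reads events and InputDispatcher.cpp dispatches them. The abridged core logic is shown below; see InputDispatcher.cpp for the full code.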
void InputDispatcher::dispatchOnceInnerLocked(nsecs_t* nextWakeupTime) {
    nsecs_t currentTime = now();

    // Ready to start a new event.
    // If we don't already have a pending event, go grab one.
    if (! mPendingEvent) {
        // Get ready to dispatch the event.
        // Plant the bomb
        resetANRTimeoutsLocked();
    }

    switch (mPendingEvent->type) {
    case EventEntry::TYPE_KEY: {
        done = dispatchKeyLocked(currentTime, typedEntry, &dropReason, nextWakeupTime);
        break;
    }
    case EventEntry::TYPE_MOTION: {
        done = dispatchMotionLocked(currentTime, typedEntry,
                &dropReason, nextWakeupTime);
        break;
    }
    }

    // Event dispatch finished: defuse the bomb
    if (done) {
        if (dropReason != DROP_REASON_NOT_DROPPED) {
            dropInboundEventLocked(mPendingEvent, dropReason);
        }
        mLastDropReason = dropReason;
        releasePendingEventLocked();
        *nextWakeupTime = LONG_LONG_MIN;  // force next poll to wake up immediately
    }
}
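2) dispatchKeyLocked branch
First look at the dispatchKeyLocked branch. dispatchKeyLocked ends up calling findFocusedWindowTargetsLocked, whose logic is as follows:
- If there is no focused window but there is a focused application, wait for the application to finish starting; if that wait times out, an ANR occurs.
- If the window is paused, its connection is not registered, or its connection has died, keep waiting until the window is ready; if the wait times out, an ANR occurs.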
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
        const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    String8 reason;

    // If there is no currently focused window and no focused application
    // then drop the event.
    if (mFocusedWindowHandle == NULL) {
        if (mFocusedApplicationHandle != NULL) {
            injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                    mFocusedApplicationHandle, NULL, nextWakeupTime,
                    "Waiting because no window has focus but there is a "
                    "focused application that may eventually add a window "
                    "when it finishes starting up.");
            goto Unresponsive;
        }

        ALOGI("Dropping event because there is no focused window or focused application.");
        injectionResult = INPUT_EVENT_INJECTION_FAILED;
        goto Failed;
    }

    // Check permissions.
    if (! checkInjectionPermission(mFocusedWindowHandle, entry->injectionState)) {
        injectionResult = INPUT_EVENT_INJECTION_PERMISSION_DENIED;
        goto Failed;
    }

    // Check whether the window is ready for more input.
    reason = checkWindowReadyForMoreInputLocked(currentTime,
            mFocusedWindowHandle, entry, "focused");
    if (!reason.isEmpty()) {
        injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                mFocusedApplicationHandle, mFocusedWindowHandle, nextWakeupTime,
                reason.string());
        goto Unresponsive;
    }

    // Success! Output targets.
    injectionResult = INPUT_EVENT_INJECTION_SUCCEEDED;
    addWindowTargetLocked(mFocusedWindowHandle,
            InputTarget::FLAG_FOREGROUND | InputTarget::FLAG_DISPATCH_AS_IS, BitSet32(0),
            inputTargets);

    // Done.
Failed:
Unresponsive:
    nsecs_t timeSpentWaitingForApplication =
            getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
#if DEBUG_FOCUS
    ALOGD("findFocusedWindow finished: injectionResult=%d, "
            "timeSpentWaitingForApplication=%0.1fms",
            injectionResult, timeSpentWaitingForApplication / 1000000.0);
#endif
    return injectionResult;
}
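3) dispatchMotionLocked branch
The abridged function below (see InputDispatcher.cpp for the full code) reaches Unresponsive from only one place: it checks whether the window is ready for more input (checkWindowReadyForMoreInputLocked), waits if it is not, and triggers an ANR when the wait times out.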
int32_t InputDispatcher::findTouchedWindowTargetsLocked(nsecs_t currentTime,
        const MotionEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime,
        bool* outConflictingPointerActions) {
    // Ensure all touched foreground windows are ready for new input.
    for (size_t i = 0; i < mTempTouchState.windows.size(); i++) {
        const TouchedWindow& touchedWindow = mTempTouchState.windows[i];
        if (touchedWindow.targetFlags & InputTarget::FLAG_FOREGROUND) {
            // Check whether the window is ready for more input.
            String8 reason = checkWindowReadyForMoreInputLocked(currentTime,
                    touchedWindow.windowHandle, entry, "touched");
            if (!reason.isEmpty()) {
                injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                        NULL, touchedWindow.windowHandle, nextWakeupTime, reason.string());
                goto Unresponsive;
            }
        }
    }

Unresponsive:
    // Reset temporary touch state to ensure we release unnecessary references to input channels.
    mTempTouchState.reset();

    nsecs_t timeSpentWaitingForApplication =
            getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
    return injectionResult;
}
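III. ANR Detection Approaches
Android applications are message-driven, and in a sense Android as a whole is a message-driven system: UI, events and lifecycle callbacks are all tied to the message-handling mechanism. ANR detection is no different, and most approaches build on Android's message mechanism.
Popular ANR detection approaches today include the open-source BlockCanary, ANR-WatchDog and SafeLooper, plus FileObserver, which monitors through a native system interface. The four are analyzed and compared by scenario below.
1. BlockCanary
BlockCanary is a non-intrusive, lightweight performance-monitoring component written by the Chinese developer markzhai; it now lives in the AndroidPerformanceMonitor project.
It cleverly exploits a logging hook in Android's Looper.loop: the loop function calls logging.println() immediately before and after dispatching each message, so if the main thread is stuck, it is stuck inside dispatchMessage.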
public static void loop() {
    final Looper me = myLooper();
    if (me == null) {
        throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
    }
    final MessageQueue queue = me.mQueue;
    for (;;) {
        Message msg = queue.next(); // might block
        if (msg == null) {
            // No message indicates that the message queue is quitting.
            return;
        }
        // This must be in a local variable, in case a UI event sets the logger
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }
        try {
            msg.target.dispatchMessage(msg);
        } catch (Exception exception) {
        } finally {
        }
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
    }
}
See the implementation in BlockCanary.java: once the logging object is set, the monitor receives the log output of every message dispatch.
/**
 * Start monitoring.
 */
public void start() {
    if (!mMonitorStarted) {
        mMonitorStarted = true;
        Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
    }
}

/**
 * Stop monitoring.
 */
public void stop() {
    if (mMonitorStarted) {
        mMonitorStarted = false;
        Looper.getMainLooper().setMessageLogging(null);
        mBlockCanaryCore.stackSampler.stop();
        mBlockCanaryCore.cpuSampler.stop();
    }
}
After that, the interval between the two log calls is used to decide whether the main thread was blocked; see LooperMonitor.java for details.
@Override
public void println(String x) {
    if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
        return;
    }
    if (!mPrintingStarted) {
        mStartTimestamp = System.currentTimeMillis();
        mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
        mPrintingStarted = true;
        startDump();
    } else {
        final long endTime = System.currentTimeMillis();
        mPrintingStarted = false;
        if (isBlock(endTime)) {
            notifyBlockEvent(endTime);
        }
        stopDump();
    }
}

private boolean isBlock(long endTime) {
    return endTime - mStartTimestamp > mBlockThresholdMillis;
}
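The begin/end pairing idea does not depend on Android and can be tried in isolation. Below is a minimal plain-Java sketch under stated assumptions: DispatchTimer is a hypothetical name, and the clock is injected so the logic can be exercised without a real Looper or any sleeping.

```java
import java.util.function.LongSupplier;

/** Plain-Java model of pairing the "before" and "after" log calls; hypothetical names. */
final class DispatchTimer {
    private final long thresholdMs;
    private final LongSupplier clock; // injected so the logic can be tested without sleeping
    private boolean dispatching;
    private long startMs;
    private int blockCount;

    DispatchTimer(long thresholdMs, LongSupplier clock) {
        this.thresholdMs = thresholdMs;
        this.clock = clock;
    }

    /** Called once before and once after each message, like the Looper's Printer. */
    void println(String line) {
        if (!dispatching) {
            // First call of the pair: remember when dispatch started.
            startMs = clock.getAsLong();
            dispatching = true;
        } else {
            // Second call of the pair: compare the elapsed time with the threshold.
            dispatching = false;
            if (clock.getAsLong() - startMs > thresholdMs) {
                blockCount++;
            }
        }
    }

    int blockCount() {
        return blockCount;
    }
}
```

On Android the clock could be something like System::currentTimeMillis, and an object of this kind would be adapted to the Printer interface passed to setMessageLogging; that wiring is left out here to keep the sketch runnable on a plain JVM.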
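Advantages:
- Flexible configuration: it can monitor general app performance and also serve as ANR monitoring in some scenarios, and it can pinpoint the ANR and the slow call stack.
Disadvantages:
- Google's own comment reads "This must be in a local variable, in case a UI event sets the logger", so the logger object can be replaced. Developers have reported that WebView sets the logger to null, which disables BlockCanary; the workaround is to call BlockCanary's start only after WebView has been initialized.
- If dispatchMessage runs for a very long time, BlockCanary's logic is never triggered, because the closing println is reached only after dispatchMessage returns.
- Looper carries another comment noting that queue.next() may block. This is the InputEvent scenario mentioned earlier; blocking here also leads to an ANR, and BlockCanary cannot detect it either.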
For more details, see the BlockCanary article listed in the references at the end of this post.
2. ANR-WatchDog
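ANR-WatchDog is modeled on Android's WatchDog mechanism: a separate thread posts a "variable + 1" operation to the main thread, sleeps for the configured ANR threshold, and after waking checks whether the increment has been applied; if it has not, it raises an alert.
The main logic is shown below; see ANRWatchDog.java for details.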
while (!isInterrupted()) {
    boolean needPost = _tick == 0;
    _tick += interval;
    if (needPost) {
        _uiHandler.post(_ticker);
    }

    try {
        Thread.sleep(interval);
    } catch (InterruptedException e) {
        _interruptionListener.onInterrupted(e);
        return ;
    }

    // If the main thread has not handled _ticker, it is blocked. ANR.
    if (_tick != 0 && !_reported) {
        //noinspection ConstantConditions
        if (!_ignoreDebugger && (Debug.isDebuggerConnected() || Debug.waitingForDebugger())) {
            Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
            _reported = true;
            continue ;
        }

        interval = _anrInterceptor.intercept(_tick);
        if (interval > 0) {
            continue;
        }

        final ANRError error;
        if (_namePrefix != null) {
            error = ANRError.New(_tick, _namePrefix, _logThreadsWithoutStackTrace);
        } else {
            error = ANRError.NewMainOnly(_tick);
        }
        _anrListener.onAppNotResponding(error);
        interval = _timeoutInterval;
        _reported = true;
    }
}
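Advantages:
- Good compatibility: it works across devices and OS versions
- No changes to the app's logic are required; it is non-intrusive
- Simple logic with little performance impact
Disadvantages:
- It cannot guarantee catching every ANR; the chosen threshold directly affects the capture rate.
With a 5s monitoring threshold, every block longer than 10s is caught. A block lasting between 5s and 10s may be missed, depending on where it starts relative to the watchdog's polling cycle.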
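The 5s versus 10s claim can be checked with a little arithmetic. The sketch below is a simplified model, not ANR-WatchDog itself: it assumes the watchdog posts a tick exactly at times 0, T, 2T, ... and checks one interval later, and the class name WatchdogModel is hypothetical.

```java
/** Simplified timing model of a polling watchdog; an illustration, not library code. */
final class WatchdogModel {
    /** Returns true if a watchdog polling every intervalMs would flag this block. */
    static boolean detects(long blockStartMs, long blockDurationMs, long intervalMs) {
        long blockEndMs = blockStartMs + blockDurationMs;
        // First tick posted at or after the block begins (ticks go out at 0, T, 2T, ...).
        long postMs = ((blockStartMs + intervalMs - 1) / intervalMs) * intervalMs;
        // The watchdog checks one interval after posting. The tick is still unhandled
        // only if the main thread stays blocked for that entire interval.
        return postMs + intervalMs < blockEndMs;
    }
}
```

In this model a 6s block that starts exactly on a tick is caught, while the same 6s block starting 1ms after a tick is missed, because the next tick is posted almost 5s into the block and is handled before the following check. Any block longer than two intervals is always caught.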
3. SafeLooper
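SafeLooper takes a rather novel approach: it is itself a blocking message that processes messages inside its own loop, using reflection to take over the job of the main thread's Looper.
The main logic is shown below; see SafeLooper.java for details.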
Method next;
Field target;
try {
    Method m = MessageQueue.class.getDeclaredMethod("next");
    m.setAccessible(true);
    next = m;
    Field f = Message.class.getDeclaredField("target");
    f.setAccessible(true);
    target = f;
} catch (Exception e) {
    return;
}

RUNNINGS.set(this);
MessageQueue queue = Looper.myQueue();
Binder.clearCallingIdentity();
final long ident = Binder.clearCallingIdentity();

while (true) {
    try {
        Message msg = (Message) next.invoke(queue);
        if (msg == null || msg.obj == EXIT) {
            break;
        }

        Handler h = (Handler) target.get(msg);
        h.dispatchMessage(msg);
        final long newIdent = Binder.clearCallingIdentity();
        if (newIdent != ident) {
        }
        msg.recycle();
    } catch (Exception e) {
        Thread.UncaughtExceptionHandler h = uncaughtExceptionHandler;
        Throwable ex = e;
        if (e instanceof InvocationTargetException) {
            ex = ((InvocationTargetException) e).getCause();
            if (ex == null) {
                ex = e;
            }
        }
        // e.printStackTrace(System.err);
        if (h != null) {
            h.uncaughtException(Thread.currentThread(), ex);
        }
        new Handler().post(this);
        break;
    }
}

RUNNINGS.set(null);
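Managing messages through reflection costs significant performance, but the approach is freely customizable, and its AOP-style thinking is worth borrowing.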
4. FileObserver
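From the ANR flow we know that a change in the /data/anr directory signals that an ANR has occurred; AMS gives some hints in its dumpStackTraces method.
Following this idea, an app can detect an ANR by watching for writes to the ANR trace file. Note that the callback fires when any application hits an ANR, so some filtering is needed, for example by package name or process ID.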
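The filtering step can be sketched as a small text check on the trace contents. This is a hedged illustration only: TraceFilter is a hypothetical helper, and it assumes the trace begins with a header of the form "----- pid <pid> at <timestamp> -----" followed by "Cmd line: <process name>", with the process that hit the ANR dumped first. Verify the layout on the target Android version before relying on it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether a trace dump belongs to our own process. */
final class TraceFilter {
    // Assumed header layout: "----- pid <pid> at <timestamp> -----", then "Cmd line: <process>".
    private static final Pattern HEADER =
            Pattern.compile("----- pid (\\d+) at .* -----\\r?\\nCmd line: (\\S+)");

    /** True if the first process block in the trace matches the given pid and package. */
    static boolean isOwnAnr(String traceText, int myPid, String myPackage) {
        Matcher m = HEADER.matcher(traceText);
        if (!m.find()) {
            return false; // unknown layout: do not claim the ANR as ours
        }
        int pid = Integer.parseInt(m.group(1));
        String process = m.group(2);
        // Accept the main process or one of its ":suffix" child processes.
        boolean sameApp = process.equals(myPackage) || process.startsWith(myPackage + ":");
        return pid == myPid && sameApp;
    }
}
```

The file-watching part itself would use Android's FileObserver, which is not shown here so that the sketch stays runnable on a plain JVM.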
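Advantages:
- Built on a native interface, so the timing and content are accurate
- No performance cost, and simple to implement
Disadvantages:
- The biggest difficulty is compatibility. The approach is constrained by Android's SELinux policy; since 5.0, low-privilege apps generally can no longer watch the trace file. During development and internal testing, however, you can root the device and modify the app's .te file to grant the needed permission.
Not many approaches are known at present. Even the ACRA library, with a 2.68% usage rate on Google Play, only recommends the WatchDog approach. Combining FileObserver with WatchDog is recommended and covers the vast majority of devices and ANR cases.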
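IV. Analyzing and Resolving ANR Problems
For an ANR problem, first use the system ANR log to pin down the exact time, then check whether CPU and IO in the log are abnormally high. Next, find the trace file matching that time, analyze the ANR stack, and filter by the app's package name to find the call paths of the app that hit the ANR.
Common ANR types are as follows:
1. Time-consuming operations on the main thread
Network access, database access, file reads and writes, and frequent heavy computation are all common ANR causes. They can usually be read directly off the call stack in the ANR trace file, so they are not elaborated here.
These ANRs are generally fixed by going asynchronous, but not by simply creating a new thread; the choice should depend on the business scenario and call frequency. Common Android options include AsyncTask, IntentService and thread pools (the four standard Executors types or a custom one). Prefer managing worker threads with a single thread pool over creating a new thread each time. Related reading: "Android 子线程更新UI详解".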
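As a rough illustration of the "one shared pool" advice, here is a minimal plain-Java sketch. BackgroundWork, the pool size of 4 and the thread name are assumptions chosen for the example, not recommendations from the original text.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper: one shared pool instead of a new Thread per task. */
final class BackgroundWork {
    // Daemon threads so the pool never keeps the process alive on its own.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "bg-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on the shared pool and returns a future for its result. */
    static <T> CompletableFuture<T> run(Supplier<T> task) {
        return CompletableFuture.supplyAsync(task, POOL);
    }
}
```

A caller would chain a continuation such as thenAccept on the returned future instead of blocking on it. In an Android app the continuation would typically hand the result back to the main thread, for example through a Handler bound to the main Looper.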
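2. Excessive CPU usage
In the ANR log below, for example, CPU usage is abnormally high at 186%. Possible causes include frequent IO reads and writes and frequent calls to slow JNI interfaces.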
ActivityManager: ANR in com.autonavi.amapauto (com.autonavi.amapauto/.MainMapActivity)
ActivityManager: PID: 7321
ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 5. Wait queue head age: 5765.8ms.)
ActivityManager: Load: 16.72 / 11.58 / 8.91
ActivityManager: CPU usage from 0ms to 13484ms later:
ActivityManager: 186% 7321/com.autonavi.amapauto: 105% user + 80% kernel / faults: 2474 minor 3 major
ActivityManager: 49% 620/system_server: 22% user + 26% kernel / faults: 2787 minor
ActivityManager: 30% 1172/com.tencent.wecarspeech: 23% user + 7.2% kernel / faults: 3955 minor 2 major
ActivityManager: 11% 1536/com.android.bluetooth: 7.1% user + 4.2% kernel / faults: 2405 minor 2 major
ActivityManager: 4.7% 215/logd: 4.2% user + 0.4% kernel / faults: 13 minor
ActivityManager: 4% 239/debuggerd: 0.6% user + 3.4% kernel / faults: 4584 minor 1 major
ActivityManager: 6.8% 1296/com.android.phone: 4.1% user + 2.6% kernel / faults: 1230 minor
ActivityManager: 6.4% 925/com.nforetek.bt: 2.6% user + 3.7% kernel / faults: 1097 minor 1 major
ActivityManager: 5.8% 7537/com.tencent.wecarnavi:wecarbase: 2.9% user + 2.8% kernel / faults: 1150 minor
ActivityManager: 5.7% 7509/com.tencent.wecarnavi: 3% user + 2.6% kernel / faults: 951 minor
ActivityManager: 4.6% 866/sdcard: 0% user + 4.5% kernel / faults: 1 minor
ActivityManager: 4.3% 242/mediaserver: 3.2% user + 1.1% kernel
ActivityManager: 3.7% 858/com.android.launcher: 2.7% user + 0.9% kernel / faults: 741 minor 1 major
ActivityManager: 3.3% 792/gaei.cluster.service: 1.9% user + 1.4% kernel / faults: 908 minor
ActivityManager: 2.8% 1125/com.tencent.wecarnews: 1.4% user + 1.4% kernel / faults: 477 minor
ActivityManager: 2.7% 881/com.android.systemui: 1.5% user + 1.1% kernel / faults: 963 minor 2 major
ActivityManager: 2.6% 8825/top: 0.8% user + 1.8% kernel
ActivityManager: 2.3% 798/com.gaei.bt: 0.9% user + 1.4% kernel / faults: 846 minor
ActivityManager: 2.1% 1101/com.tencent.wecarmusicp: 0.7% user + 1.4% kernel / faults: 42 minor
ActivityManager: 2.1% 7744/com.autonavi.amapauto:push: 1.1% user + 1% kernel / faults: 30916 minor
ActivityManager: 1.1% 1269/com.gaei.settings: 0.5% user + 0.5% kernel / faults: 955 minor
ActivityManager: 1.5% 110/mmcqd/3: 0% user + 1.5% kernel
ActivityManager: 0.8% 833/gaei.reverse: 0.4% user + 0.3% kernel / faults: 826 minor
ActivityManager: 0.7% 5949/logcat: 0.4% user + 0.3% kernel
ActivityManager: 1.2% 803/gaei.thirdparty.media.adapter: 0.7% user + 0.5% kernel / faults: 654 minor
ActivityManager: 0.6% 998/gaei.cluster: 0.3% user + 0.3% kernel / faults: 759 minor
ActivityManager: 0.6% 1279/com.gaei.gaeihvsmsettings: 0.3% user + 0.3% kernel / faults: 868 minor
ActivityManager: 1.1% 233/surfaceflinger: 0.5% user + 0.5% kernel
ActivityManager: 0.5% 838/gaei.lockscreen: 0.3% user + 0.2% kernel / faults: 750 minor
ActivityManager: 0.5% 964/gaei.ecallbcall: 0.2% user + 0.2% kernel / faults: 719 minor
ActivityManager: 0.5% 1256/com.thundersoft.update: 0.3% user + 0.1% kernel / faults: 745 minor
ActivityManager: 0.5% 1274/gaei.bluetooth: 0.2% user + 0.2% kernel / faults: 732 minor
ActivityManager: 0.5% 1285/com.gaei.vehichesetting: 0.3% user + 0.1% kernel / faults: 786 minor
ActivityManager: 0.5% 8916/kworker/0:2: 0% user + 0.5% kernel
ActivityManager: 0.5% 2439/cn.gaei.appstore: 0.4% user + 0% kernel / faults: 60 minor
ActivityManager: 0.3% 232/servicemanager: 0.1% user + 0.2% kernel
ActivityManager: 0.2% 252/rild: 0% user + 0.2% kernel
ActivityManager: 0.2% 2131/com.thundersoft.connectivity: 0.2% user + 0% kernel / faults: 133 minor
ActivityManager: 0.2% 2153/com.excelfore.hmiagent: 0% user + 0.2% kernel / faults: 1779 minor
ActivityManager: 0.1% 7383/com.autonavi.amapauto:locationservice: 0.1% user + 0% kernel / faults: 266 minor
ActivityManager: 0.2% 7/rcu_preempt: 0% user + 0.2% kernel
ActivityManager: 0% 7360/com.autonavi.amapauto:adiu: 0% user + 0% kernel / faults: 223 minor
ActivityManager: 0.1% 8892/kworker/u8:0: 0% user + 0.1% kernel
ActivityManager: 0% 1//init: 0% user + 0% kernel
ActivityManager: 0% 3/ksoftirqd/0: 0% user + 0% kernel
ActivityManager: 0% 107/debounce_task: 0% user + 0% kernel
ActivityManager: 0% 108/irq/43-mm-irq-t: 0% user + 0% kernel
ActivityManager: 0% 210/jbd2/mmcblk3p4-: 0% user + 0% kernel
ActivityManager: 0% 237/netd: 0% user + 0% kernel / faults: 15 minor
ActivityManager: 0% 244/smcd: 0% user + 0% kernel
ActivityManager: 0% 248/system_setting_service: 0% user + 0% kernel
ActivityManager: 0% 1087/com.trumpchi.assistant.app: 0% user + 0% kernel / faults: 2 minor
ActivityManager: 0% 2009/com.thunderst.update: 0% user + 0% kernel / faults: 17 minor
ActivityManager: 93% TOTAL: 51% user + 41% kernel + 0.1% iowait + 0.3% softirq
ActivityManager: CPU usage from 11778ms to 12377ms later:
ActivityManager: 46% 620/system_server: 23% user + 23% kernel
ActivityManager: rename tracefile/data/anr/traces.txt to /data/anr/traces_com.autonavi.amapauto_20190906_110548.txt
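When a log like this runs to dozens of processes, it can help to pull the per-process figures out programmatically. The following is a small plain-Java sketch; CpuLine is a hypothetical helper, and the pattern is inferred only from the lines shown in this log, so other Android versions may format them differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for per-process lines such as "186% 7321/com.example: 105% user + ...". */
final class CpuLine {
    final double percent;
    final int pid;
    final String process;

    private CpuLine(double percent, int pid, String process) {
        this.percent = percent;
        this.pid = pid;
        this.process = process;
    }

    // "<percent>% <pid>/<process>: <n>% user"; the process name may itself contain ':' or '/'.
    private static final Pattern LINE =
            Pattern.compile("(\\d+(?:\\.\\d+)?)% (\\d+)/(.+?): \\d+(?:\\.\\d+)?% user");

    /** Returns null if the line is not a per-process CPU line (for example the TOTAL line). */
    static CpuLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new CpuLine(Double.parseDouble(m.group(1)),
                Integer.parseInt(m.group(2)), m.group(3));
    }
}
```

Sorting the parsed entries by percent then makes the heaviest processes stand out immediately, as with the 186% entry for pid 7321 here.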
When CPU usage is abnormally high, the ANR trace file recorded by the system contains no call stack from the app's own logic; for example, it may hold only the "main" thread information below.
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x74030258 self=0xb4df6500
  | sysTid=1336 nice=0 cgrp=default sched=0/0 handle=0xb6f90b34
  | state=S schedstat=( 14858997090 10346952287 41860 ) utm=903 stm=582 core=1 HZ=100
  | stack=0xbe7b2000-0xbe7b4000 stackSize=8MB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/1336/stack)
  native: #00 pc 00040d04 /system/lib/libc.so (__epoll_pwait+20)
  native: #01 pc 0001a17f /system/lib/libc.so (epoll_pwait+26)
  native: #02 pc 0001a18d /system/lib/libc.so (epoll_wait+6)
  native: #03 pc 00012cfb /system/lib/libutils.so (android::Looper::pollInner(int)+102)
  native: #04 pc 00012f77 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+130)
  native: #05 pc 000807ad /system/lib/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, _jobject*, int)+22)
  native: #06 pc 00008ebd /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+96)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:323)
  at android.os.Looper.loop(Looper.java:135)
  at android.app.ActivityThread.main(ActivityThread.java:5487)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
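In this case, analyze the app's own log around the ANR time: look back 10s at the app's call flow and identify which module is logging too frequently. There is one special case: if the problem is that the app writes log output to file too often, trim the logging and drop unnecessary entries. Overly frequent log output consumes IO bandwidth and competes for CPU, and on devices with low IO bandwidth it easily degrades responsiveness.
If the ANR comes from frequent calls to a slow JNI interface, reduce or avoid those JNI calls where the scenario allows. On one low-performance device, even a JNI-layer setInt call assigning a single parameter was observed to take 50ms.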
3. Stuck in IO reads and writes
This is usually caused by file operations, as in the log below:
ANRManager: 100% TOTAL: 2% user + 2.1% kernel + 95% iowait + 0.1% softirq
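iowait accounts for 95%. Analyze the ANR trace file, filtering the call stacks by the app's package name, or look back 10s in the app log from the ANR time to see which file operations were running. This too is generally solved by moving the work off the main thread.
4. Deadlock or lock waiting
For this kind of problem, a common fix is to switch to a timed lock, such as Lock.tryLock with a timeout: the caller gives up waiting once the timeout expires, which avoids blocking indefinitely on a lock that is never released.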
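A timed lock acquisition can look like the sketch below. It uses java.util.concurrent.locks.ReentrantLock and its tryLock(long, TimeUnit) method; the wrapper class TimedLockDemo and its method name are hypothetical and exist only for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical wrapper showing a timed lock acquisition with ReentrantLock.tryLock. */
final class TimedLockDemo {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the task under the lock, or gives up after waitMs. Returns whether it ran. */
    boolean runWithTimeout(Runnable task, long waitMs) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                task.run();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            if (acquired) {
                lock.unlock(); // release only if this call actually took the lock
            }
        }
    }
}
```

The caller must decide what to do when the method returns false, for example retrying later or reporting the contention, rather than silently dropping the work.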
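5. Main-thread Binder call timeout
When the main thread issues a Binder request and the remote side is slow to return, this problem occurs easily. It is generally solved by making the call asynchronously.
6. Binder thread pool exhausted
The system allocates at most 15 Binder threads per process. If another process sends too many repeated Binder requests, the receiver's Binder threads are all occupied and it cannot handle any other Binder request.
To tell whether the Binder threads are used up, search the trace for the keyword "binder_f". A hit means they are exhausted; then analyze the log to see who keeps consuming Binder threads or whether a deadlock has occurred.
The fix is to reduce bursts of Binder requests within a very short time, for example by adding a time-difference filter in the function that sends the Binder request so that it runs at most once every 500ms.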
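The time-difference filter can be as simple as the sketch below. Throttle is a hypothetical class, and the clock is injected so the behavior can be checked deterministically; on Android the clock would typically be something like SystemClock::uptimeMillis.

```java
import java.util.function.LongSupplier;

/** Hypothetical time-difference filter: lets an action run at most once per interval. */
final class Throttle {
    private final long minIntervalMs;
    private final LongSupplier clock; // injected clock, in milliseconds
    private long lastRunMs;
    private boolean hasRun;

    Throttle(long minIntervalMs, LongSupplier clock) {
        this.minIntervalMs = minIntervalMs;
        this.clock = clock;
    }

    /** Runs the action only if minIntervalMs has passed since the last accepted call. */
    synchronized boolean tryRun(Runnable action) {
        long now = clock.getAsLong();
        if (hasRun && now - lastRunMs < minIntervalMs) {
            return false; // dropped: too soon after the previous request
        }
        hasRun = true;
        lastRunMs = now;
        action.run();
        return true;
    }
}
```

With new Throttle(500, clock), a call at 0ms runs, a call at 499ms is dropped, and a call at 500ms runs again. Dropped requests are simply discarded in this sketch; whether to discard or coalesce them depends on the business logic.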
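7. ANR caused by a JE (Java exception) or NE (native exception)
In one case, frequent NEs appeared before the ANR, and the process with the NE interacted with the process that hit the ANR; once the NE was fixed, the ANR disappeared as well.
When a JE or NE precedes the ANR, fix the JE or NE first. A JE/NE dumps a large amount of exception information, which itself adds CPU load. After the exception is fixed, check whether the ANR still occurs. If it does, examine the trace stack. If it does not, the JE or NE can be taken as the likely cause.
V. References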
BlockCanary — 轻松找出Android App界面卡顿元凶
Further reading:
Android Home键之后后台启动Activity延迟5秒
Please credit the source when reposting: 陈文管的博客 – Android ANR详解