How lag occurs
Time-consuming operations on the main thread cause lag, and once the lag exceeds a threshold an ANR is triggered. When the application process starts, the process forked from Zygote calls the main method of ActivityThread through reflection, and main starts the Looper loop.
ActivityThread (api29):
public static void main(String[] args) {
    Looper.prepareMainLooper();
    ...
    Looper.loop();
    throw new RuntimeException("Main thread loop unexpectedly exited");
}
Looper's loop method:
// Run the message queue in this thread. Be sure to call quit() to end the loop.
public static void loop() {
    ...
    for (;;) {
        // 1. Get the message
        Message msg = queue.next(); // might block
        ...
        // This must be in a local variable, in case a UI event sets the logger
        // 2. Callback before message processing
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " + msg.callback + ": " + msg.what);
        }
        ...
        // 3. Message processing begins
        msg.target.dispatchMessage(msg);
        ...
        // 4. Callback after message processing
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
    }
}
Because loop() contains an infinite for loop, the main thread keeps running for the lifetime of the process. To run a task on the main thread, post it to the message queue through a Handler; the loop takes each msg out of the queue and hands it to msg's target (a Handler) for processing.
There are two places where the loop can get stuck:
- Comment 1: queue.next() (MessageQueue#next)
- Comment 3: dispatchMessage takes too long
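The effect of a slow dispatch can be sketched in plain Java. The class below is a simplified stand-in, not Android's Looper or MessageQueue: `MiniLooper`, `post`, and `loopUntilEmpty` are hypothetical names, and unlike the real loop it returns when the queue is empty instead of blocking. It only illustrates that every task queued behind a slow one has to wait.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Simplified stand-in for a message loop; NOT the Android Looper.
class MiniLooper {
    private static final class Task {
        final Runnable body;
        final long enqueuedAtNanos = System.nanoTime();
        Task(Runnable body) { this.body = body; }
    }

    private final Queue<Task> queue = new ArrayDeque<>();

    // Rough analogue of Handler#post: append a task to the queue.
    void post(Runnable body) {
        queue.add(new Task(body));
    }

    // Drains the queue on the calling thread and returns how long each
    // task waited (in ms) between being posted and being dispatched.
    List<Long> loopUntilEmpty() {
        List<Long> waitsMs = new ArrayList<>();
        Task task;
        while ((task = queue.poll()) != null) {   // comment 1: get the message
            waitsMs.add((System.nanoTime() - task.enqueuedAtNanos) / 1_000_000);
            task.body.run();                      // comment 3: dispatch; a slow body delays everything behind it
        }
        return waitsMs;
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Posting a task that sleeps for 100 ms followed by an empty task makes the second task's recorded wait roughly 100 ms, even though it does no work itself.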
MessageQueue#next code (api29)
@UnsupportedAppUsage
Message next() {
    ...
    for (;;) {
        // 1. Blocks if nextPollTimeoutMillis is not 0
        nativePollOnce(ptr, nextPollTimeoutMillis);
        ...
        // 2. First check whether the head message is a synchronization barrier message.
        if (msg != null && msg.target == null) {
            // 3. On a synchronization barrier, skip ahead to the next asynchronous message.
            //    Synchronous messages are held back by the barrier.
            // Stalled by a barrier.  Find the next asynchronous message in the queue.
            do {
                prevMsg = msg;
                msg = msg.next;
            } while (msg != null && !msg.isAsynchronous());
        }
        // 4. Normal message handling: check whether the message is delayed
        if (msg != null) {
            if (now < msg.when) {
                // Next message is not ready.  Set a timeout to wake up when it is ready.
                nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
            } else {
                // Got a message.
                mBlocked = false;
                if (prevMsg != null) {
                    prevMsg.next = msg.next;
                } else {
                    mMessages = msg.next;
                }
                msg.next = null;
                if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                msg.markInUse();
                return msg;
            }
        } else {
            // 5. No message was found, so on the next iteration nativePollOnce at
            //    comment 1 is called with -1 and blocks indefinitely
            // No more messages.
            nextPollTimeoutMillis = -1;
        }
        ...
    }
}
- MessageQueue is a linked list. next() first checks whether the head of the queue (the first message) is a synchronization barrier message, that is, a message whose target is null. A barrier holds back synchronous messages so that only asynchronous messages are processed;
- When a synchronization barrier is encountered, the synchronous messages in the MessageQueue are skipped and only asynchronous messages are retrieved. If there is no asynchronous message, comment 5 sets nextPollTimeoutMillis to -1, and the nativePollOnce call at comment 1 blocks on the next iteration;
- If a message is retrieved normally, the flow is the same for synchronous and asynchronous messages. Comment 4 checks whether the message is delayed. If it is, nextPollTimeoutMillis is set and the next nativePollOnce call at comment 1 blocks for that long. If it is not, msg is returned directly and handed to its Handler.
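The barrier rule described in these points can be modelled in a few lines of plain Java. This is a simplified sketch, not the Android MessageQueue: `BarrierQueue`, `Msg`, and `removeBarrier` are hypothetical names, delays (`when`) are ignored, and `next()` returns null where the real code would block in nativePollOnce.

```java
// Simplified model of barrier handling in a message queue; NOT the Android classes.
class BarrierQueue {
    static final class Msg {
        final String name;
        final boolean hasTarget;     // false models target == null, i.e. a sync barrier
        final boolean asynchronous;
        Msg next;

        Msg(String name, boolean hasTarget, boolean asynchronous) {
            this.name = name;
            this.hasTarget = hasTarget;
            this.asynchronous = asynchronous;
        }
    }

    private Msg head;

    // Appends a message at the tail of the linked list.
    void enqueue(Msg m) {
        if (head == null) {
            head = m;
            return;
        }
        Msg tail = head;
        while (tail.next != null) tail = tail.next;
        tail.next = m;
    }

    // Returns the next dispatchable message, or null where the real queue would block.
    Msg next() {
        Msg prev = null;
        Msg msg = head;
        if (msg != null && !msg.hasTarget) {
            // Stalled by a barrier: skip synchronous messages, look for an asynchronous one.
            do {
                prev = msg;
                msg = msg.next;
            } while (msg != null && !msg.asynchronous);
        }
        if (msg == null) return null;
        if (prev != null) prev.next = msg.next; else head = msg.next;
        msg.next = null;
        return msg;
    }

    // Removes the first barrier in the queue, if any.
    void removeBarrier() {
        Msg prev = null;
        Msg msg = head;
        while (msg != null && msg.hasTarget) {
            prev = msg;
            msg = msg.next;
        }
        if (msg == null) return;
        if (prev != null) prev.next = msg.next; else head = msg.next;
    }
}
```

With a barrier at the head, followed by one synchronous and one asynchronous message, `next()` returns the asynchronous message first, then null; after `removeBarrier()` the synchronous message becomes available.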
The next method keeps retrieving messages from the MessageQueue: it returns a message when one is available and blocks in nativePollOnce when there is none. Underneath, nativePollOnce relies on Linux's epoll mechanism, which is a form of Linux IO multiplexing.
Linux IO multiplexing schemes include select, poll, and epoll. Of these, epoll generally performs best and supports the largest number of concurrent descriptors.
- select: a system call provided by the operating system. The caller passes an array of file descriptors to the kernel, the kernel traverses them to determine which descriptors are readable or writable, and then tells the caller which ones to process.
- poll: its main difference from select is that it removes the limit of 1024 file descriptors that select can listen to.
- epoll: improves on three aspects of select that can be optimized:
  1. The kernel keeps the collection of file descriptors, so the user does not need to pass the whole set in again on every call and only tells the kernel what has changed.
  2. The kernel no longer finds ready file descriptors by polling; it is woken up by asynchronous IO events.
  3. The kernel returns only the file descriptors that have IO events, so the user does not need to traverse the entire collection.
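The same "register once, block until something is ready" pattern can be tried from plain Java with the JDK's Selector, which is the standard library's IO multiplexing API (on Linux it is typically backed by epoll). This is only an analogy to what nativePollOnce does natively, not the Android implementation; `MultiplexDemo` and `idleThenReady` are hypothetical names, and the Pipe here merely plays the role of a wake-up descriptor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class MultiplexDemo {
    // Returns {ready count while idle, ready count after one byte is written}.
    static int[] idleThenReady() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                // Register the read end once; the selector keeps the interest set.
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);

                // Nothing has been written yet, so no channel is ready.
                int idle = selector.selectNow();

                // Writing to the pipe plays the role of "waking" the waiting thread.
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

                // Blocks for up to one second and reports only the ready channels.
                int ready = selector.select(1000);
                return new int[] {idle, ready};
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The first count is 0 because nothing is readable yet; after the write, the selector reports exactly one ready channel without the caller scanning anything.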
Synchronization barrier messages
An Android app cannot insert a synchronization barrier directly, because postSyncBarrier is not part of the public SDK. MessageQueue (api29) code:
@TestApi
public int postSyncBarrier() {
    return postSyncBarrier(SystemClock.uptimeMillis());
}

private int postSyncBarrier(long when) {
    ...
}
High-priority system operations use synchronization barrier messages. For example, when a View is drawn, the scheduleTraversals method of ViewRootImpl inserts a synchronization barrier message, and the barrier is removed after drawing. ViewRootImpl (api29):
@UnsupportedAppUsage
void scheduleTraversals() {
    if (!mTraversalScheduled) {
        mTraversalScheduled = true;
        mTraversalBarrier = mHandler.getLooper().getQueue().postSyncBarrier();
        mChoreographer.postCallback(
                Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
        if (!mUnbufferedInputDispatch) {
            scheduleConsumeBatchedInput();
        }
        notifyRendererOfFramePending();
        pokeDrawLockIfNeeded();
    }
}

void unscheduleTraversals() {
    if (mTraversalScheduled) {
        mTraversalScheduled = false;
        mHandler.getLooper().getQueue().removeSyncBarrier(mTraversalBarrier);
        mChoreographer.removeCallbacks(
                Choreographer.CALLBACK_TRAVERSAL, mTraversalRunnable, null);
    }
}
To keep View drawing from being delayed by other tasks on the main thread, a synchronization barrier message is inserted into the MessageQueue before drawing, and then a listener for the Vsync signal is registered. Choreographer$FrameDisplayEventReceiver receives the Vsync callback.
private final class FrameDisplayEventReceiver extends DisplayEventReceiver implements Runnable {
    @Override
    public void onVsync(long timestampNanos, long physicalDisplayId, int frame) {
        Message msg = Message.obtain(mHandler, this);
        // 1. Send an asynchronous message
        msg.setAsynchronous(true);
        mHandler.sendMessageAtTime(msg, timestampNanos / TimeUtils.NANOS_PER_MS);
    }

    @Override
    public void run() {
        // 2. doFrame is executed first
        doFrame(mTimestampNanos, mFrame);
    }
}
When the Vsync callback arrives, comment 1 posts an asynchronous message to the main thread's MessageQueue, which ensures that doFrame at comment 2 is executed ahead of the blocked synchronous messages.
doFrame is where View drawing really starts. It leads to ViewRootImpl's doTraversal and performTraversals, and performTraversals calls the View's onMeasure, onLayout, and onDraw.
Although the app cannot send synchronization barrier messages, using asynchronous messages is allowed.
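Asynchronous messages
The SDK restricts how an app can mark a message as asynchronous: the flags field of the Message class is package-private, so an app cannot set it directly and has to go through Message#setAsynchronous. Message class: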
@UnsupportedAppUsage /*package*/ int flags;
Use asynchronous messages with caution. If they are used improperly, the main thread may appear to freeze.
Handler#dispatchMessage
/**
 * Handle system messages here.
 */
public void dispatchMessage(@NonNull Message msg) {
    if (msg.callback != null) {
        handleCallback(msg);
    } else {
        if (mCallback != null) {
            if (mCallback.handleMessage(msg)) {
                return;
            }
        }
        handleMessage(msg);
    }
}
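dispatchMessage hands the message to one of three places, checked in this order:
- The Runnable passed to Handler#post(Runnable r), which is stored in msg.callback
- The Callback passed to the Handler constructor
- The handleMessage method overridden by the Handler subclass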
Application lag is usually caused by a Handler taking too long to process a message (because of the method itself, algorithm efficiency, CPU preemption, insufficient memory, IPC timeouts, and so on).
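The order in which dispatchMessage picks a receiver can be checked with a small plain-Java model. `MiniHandler` and `MiniMessage` are hypothetical stand-ins for Handler and Message; the branching mirrors the dispatchMessage listing above, and the `trace` list exists only so the path taken can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the dispatch order; NOT android.os.Handler.
class MiniHandler {
    interface Callback {
        boolean handleMessage(MiniMessage msg);
    }

    static final class MiniMessage {
        Runnable callback;                       // set by a post(Runnable)-style call
        final List<String> trace = new ArrayList<>();
    }

    private final Callback mCallback;

    MiniHandler(Callback callback) {
        mCallback = callback;
    }

    // Subclasses would override this, like Handler#handleMessage.
    void handleMessage(MiniMessage msg) {
        msg.trace.add("handleMessage");
    }

    void dispatchMessage(MiniMessage msg) {
        if (msg.callback != null) {
            msg.callback.run();                  // 1. the posted Runnable wins
            return;
        }
        if (mCallback != null && mCallback.handleMessage(msg)) {
            return;                              // 2. the constructor Callback consumed it
        }
        handleMessage(msg);                      // 3. fall back to the overridden method
    }
}
```

A Callback that returns false lets the message fall through to handleMessage, so in that case both the Callback and handleMessage run.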
Lag monitoring
Lag monitoring solution 1: Looper#loop
// Run the message queue in this thread. Be sure to call quit() to end the loop.
public static void loop() {
    ...
    for (;;) {
        // 1. Get the message
        Message msg = queue.next(); // might block
        ...
        // This must be in a local variable, in case a UI event sets the logger
        // 2. Callback before message processing
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " + msg.callback + ": " + msg.what);
        }
        ...
        // 3. Message processing begins
        msg.target.dispatchMessage(msg);
        ...
        // 4. Callback after message processing
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
    }
}
Comments 2 and 4 are hooks that make it possible to measure how long a Handler takes to process a message. Set a Printer through Looper.getMainLooper().setMessageLogging(printer) and record the time before and after each message. The limitation is that by the time the lag is detected, dispatchMessage has already returned, so a stack captured at that moment no longer contains the code that caused the lag.
A workable fix is to sample the main thread's stack periodically and save the samples in a map, with the timestamp as key and the stack as value; when a lag occurs, look up the stacks captured during the lag period. This is suitable for offline use. It has two drawbacks:
- The log strings are built by concatenation for every message, so the frequent calls create a large number of objects and cause memory churn.
- Capturing the main thread's stack frequently from a background thread affects performance, because capturing the stack pauses the main thread.
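The timing part of this approach can be sketched without any Android dependency. The class below is a hypothetical `DispatchTimer`; on Android an `android.util.Printer` passed to `Looper.getMainLooper().setMessageLogging(...)` would forward each line to `println`. The clock is injected as an assumption of this sketch so the logic can be exercised deterministically, and the prefixes match the strings in the Looper#loop listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Plain-Java sketch of a Printer-style lag timer; names are hypothetical.
class DispatchTimer {
    private final long thresholdMs;
    private final LongSupplier clockMs;
    private long startMs = -1;
    private String currentLine;

    // Messages whose dispatch took longer than the threshold.
    final List<String> slowMessages = new ArrayList<>();

    DispatchTimer(long thresholdMs, LongSupplier clockMs) {
        this.thresholdMs = thresholdMs;
        this.clockMs = clockMs;
    }

    // Called with each line the loop logs before and after a dispatch.
    void println(String line) {
        if (line.startsWith(">>>>> Dispatching")) {
            startMs = clockMs.getAsLong();
            currentLine = line;
        } else if (line.startsWith("<<<<< Finished") && startMs >= 0) {
            long costMs = clockMs.getAsLong() - startMs;
            if (costMs > thresholdMs) {
                slowMessages.add(costMs + "ms: " + currentLine);
            }
            startMs = -1;
        }
    }
}
```

In production the clock would be something like `SystemClock.uptimeMillis()`, and the slow branch would trigger a lookup of the stack samples collected during that interval.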
Lag monitoring solution 2
Online lag monitoring requires bytecode instrumentation.
Using a Gradle Plugin plus ASM, code is inserted at the start and end of every method at compile time to measure how long the method takes. The lag monitoring in WeChat Matrix, for example, uses this approach. Points to note:
- Avoid increasing the method count: assign each method a unique ID and pass it as a parameter
- Filter out simple functions: add a blacklist to skip functions that do not need to be measured
WeChat Matrix has made many optimizations: the package size increases by 1% to 2%, the frame rate drops by no more than 2 frames, and it is used in gray-release builds.
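What the inserted calls might look like at runtime can be sketched as follows. This is an illustration of the ID-as-parameter idea only, not Matrix's implementation: `MethodTracer`, `enter`, and `exit` are hypothetical names, the sketch is single-threaded, and the clock is injected so the bookkeeping can be checked deterministically.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical runtime side of method instrumentation; single-threaded sketch.
class MethodTracer {
    private final LongSupplier clockMs;
    private final Deque<long[]> stack = new ArrayDeque<>();   // each entry: {methodId, startMs}
    private final Map<Integer, Long> totals = new HashMap<>();

    MethodTracer(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    // The instrumentation would insert enter(ID) as the first statement of a method.
    void enter(int methodId) {
        stack.push(new long[] {methodId, clockMs.getAsLong()});
    }

    // ...and exit(ID) before every return of that method.
    void exit(int methodId) {
        long[] top = stack.peek();
        if (top == null || top[0] != methodId) {
            return;   // unbalanced call; ignored in this sketch
        }
        stack.pop();
        totals.merge(methodId, clockMs.getAsLong() - top[1], Long::sum);
    }

    // Total inclusive time recorded for a method ID, in ms.
    long totalMs(int methodId) {
        return totals.getOrDefault(methodId, 0L);
    }
}
```

Because both hooks take only an int, every instrumented method can call the same two static-style entry points, which is why passing an ID avoids generating extra methods per call site.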
ANR principle
- Service Timeout: a foreground service has not finished executing within 20s; for a background service the limit is 200s.
- BroadcastQueue Timeout: a foreground broadcast has not finished within 10s; for a background broadcast the limit is 60s.
- ContentProvider Timeout: publishing the provider takes longer than 10s.
- InputDispatching Timeout: dispatching an input event (key press or touch) takes longer than 5s.
ActivityManagerService api29
// How long we allow a receiver to run before giving up on it. static final int BROADCAST_FG_TIMEOUT = 10*1000; static final int BROADCAST_BG_TIMEOUT = 60*1000;
ANR trigger process
Planting the bomb
Starting a background service eventually reaches ActiveServices#realStartServiceLocked:
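private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app, boolean execInFg) throws RemoteException {
    ...
    // 1. Send the delayed message (SERVICE_TIMEOUT_MSG)
    bumpServiceExecutingLocked(r, execInFg, "create");
    ...
    try {
        // 2. Notify the app process to create the Service
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
                app.getReportedProcState());
        ...
    }
    ...
}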
Comment 1 internally calls scheduleServiceTimeoutLocked:
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // Send the delayed message: 20s for a foreground service, 200s for a background service.
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
Before comment 2 notifies the app process to create the Service, comment 1 sends a delayed Handler message. If processing has not completed within 20 seconds (foreground service), ActiveServices#serviceTimeout is called.
Defusing the bomb
Starting a Service goes through AMS first; AMS then notifies the application of the Service lifecycle, and the handleCreateService method of ActivityThread is called.
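@UnsupportedAppUsage
private void handleCreateService(CreateServiceData data) {
    ...
    try {
        ...
        Application app = packageInfo.makeApplication(false, mInstrumentation);
        service.attach(context, this, data.info.name, data.token, app,
                ActivityManager.getService());
        // 1. Service onCreate is called
        service.onCreate();
        mServices.put(data.token, service);
        try {
            // 2. Defuse the bomb
            ActivityManager.getService().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (RemoteException e) {
            throw e.rethrowFromSystemServer();
        }
    }
    ...
}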
At comment 1, the Service's onCreate method is called. At comment 2, the call eventually reaches AMS's serviceDoneExecuting method, which ends up in ActiveServices#serviceDoneExecutingLocked:
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying, boolean finishing) {
    ...
    // Remove the delayed message
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    ...
}
After onCreate returns, the delayed message is removed and the bomb is defused.
Detonating the bomb
If the Service's onCreate runs longer than the timeout (20 seconds for a foreground service), the bomb detonates, that is, the ActiveServices#serviceTimeout method is called. api29
void serviceTimeout(ProcessRecord proc) {
    ...
    if (anrMessage != null) {
        proc.appNotResponding(null, null, null, null, false, anrMessage);
    }
}
All ANRs eventually end up in the appNotResponding method of ProcessRecord. api29
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
        String parentShortComponentName, WindowProcessController parentProcess,
        boolean aboveSystem, String annotation) {
    ...
    // 1. Write the event log
    // Log the ANR to the event log.
    EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags, annotation);

    // 2. Collect the required logs (ANR, CPU, etc.) into a StringBuilder.
    // Log the ANR to the main log.
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    info.append("ANR in ").append(processName);
    if (activityShortComponentName != null) {
        info.append(" (").append(activityShortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(pid).append("\n");
    if (annotation != null) {
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parentShortComponentName != null
            && parentShortComponentName.equals(activityShortComponentName)) {
        info.append("Parent: ").append(parentShortComponentName).append("\n");
    }

    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

    // 3. Dump stack information, including Java and native stacks, and save it to a file
    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
            (isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
            nativePids);

    String cpuInfo = null;
    ...
    // 4. Output the ANR log
    Slog.e(TAG, info.toString());
    if (tracesFile == null) {
        // 5. If no tracesFile was produced, send a SIGNAL_QUIT signal
        // There is no trace file, so dump (only) the alleged culprit's threads to the log
        Process.sendSignal(pid, Process.SIGNAL_QUIT);
    }

    // 6. Output to dropbox
    mService.addErrorToDropBox("anr", this, processName, activityShortComponentName,
            parentShortComponentName, parentPr, annotation, cpuInfo, tracesFile, null);
    ...
    synchronized (mService) {
        // 7. Background ANR: kill the process directly
        if (isSilentAnr() && !isDebugging()) {
            kill("bg anr", true);
            return;
        }

        // 8. Error report
        // Set the app's notResponding state, and look up the errorReportReceiver
        makeAppNotRespondingLocked(activityShortComponentName,
                annotation != null ? "ANR " + annotation : "ANR", info.toString());

        // 9. Show the ANR dialog; handleShowAnrUi will be called
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);

        mService.mUiHandler.sendMessage(msg);
    }
}
- Write event log
- Write to main log
- Generate tracesFile
- Output ANR logcat (can be seen on the console)
- If no tracesFile was obtained, send a SIGNAL_QUIT signal, which triggers the collection of thread stack information and writes it to the trace file
- Output to dropbox
- For a background ANR, kill the process directly
- Error Report
- Pop up the ANR dialog and call the AppErrors#handleShowAnrUi method.
To summarize the ANR trigger process (planting the bomb --> defusing the bomb): when a Service is started, a delayed message (20s for a foreground service) is sent through a Handler before onCreate is called. If the Service's onCreate method finishes in time, the delayed message is removed. If onCreate takes longer than the timeout, the delayed message is handled normally, which triggers the ANR: CPU and stack information are collected and the ANR dialog pops up.
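The plant / defuse / detonate pattern is not specific to Android and can be tried in plain Java. In this sketch a ScheduledExecutorService stands in for AMS's Handler and its delayed message; `TimeoutBomb` and `runWithTimeout` are hypothetical names, and the timeouts are shortened to milliseconds so the example finishes quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-Java analogy of the delayed-message timeout; NOT the AMS implementation.
class TimeoutBomb {
    // Runs work under a timeout; returns true if the timeout fired before work finished.
    static boolean runWithTimeout(Runnable work, long timeoutMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean detonated = new AtomicBoolean(false);
        try {
            // Plant the bomb: like sendMessageDelayed(SERVICE_TIMEOUT_MSG, timeout).
            ScheduledFuture<?> bomb =
                    timer.schedule(() -> detonated.set(true), timeoutMs, TimeUnit.MILLISECONDS);

            work.run();   // like the Service's onCreate

            // Defuse the bomb: like removeMessages(SERVICE_TIMEOUT_MSG).
            bomb.cancel(false);
            return detonated.get();
        } finally {
            timer.shutdownNow();
        }
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Work that returns immediately defuses the bomb, while work that outlasts the timeout finds that it has already gone off. In the real system the detonation path is where the ANR information is collected.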
One way to monitor ANRs is to read the trace files under the system's data/anr/ directory, but newer system versions require root permission to read this directory.
ANRWatchDog /SalomonBrys…
An open source library that detects ANRs automatically.
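The general watchdog idea behind this kind of detection can be sketched in plain Java: a background thread posts a small "tick" task to the watched thread and checks whether it ran within a deadline. This is an illustration under stated assumptions, not code from the library: `MiniWatchdog` and `isResponsive` are hypothetical names, and a single-thread executor stands in for the main thread's Handler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Plain-Java sketch of a watchdog check; the Executor stands in for the main thread.
class MiniWatchdog {
    // Posts a tick to the watched thread and reports whether it ran within timeoutMs.
    static boolean isResponsive(Executor watched, long timeoutMs) {
        CountDownLatch ticked = new CountDownLatch(1);
        watched.execute(ticked::countDown);   // the tick only runs once earlier tasks have finished
        try {
            return ticked.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

On Android the tick would be posted with a Handler bound to the main Looper, and a failed check would be the moment to capture the main thread's stack, since the blocking code is still running.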