Android ANR non-response analysis solution

1.What is ANR

ANR:Application Not Responding, that is, the application is unresponsive

Types and solutions to avoid each type

There are generally three types of ANR:

1) KeyDispatchTimeout(5 seconds) KeyDispatchTimeout(5 seconds) No response in a specific time

2) BroadcastTimeout(10 seconds) BroadcastReceiver cannot complete processing within a specific time

3) ServiceTimeout(20 seconds or 200 seconds) Front-end service: the timeout time is 20s; the timeout time is 200s; the service cannot complete the process within a specific time

2.1 KeyDispatchTimeout

Akey or touch event was not dispatched within the specified time

The specific timeout time is defined under framework

//How long we wait until we timeout on key dispatching.
staticfinal int KEY_DISPATCHING_TIMEOUT = 5*1000

Why does the timeout exceed?The count of timeout time generally starts from the distribution of keys to the app. There are generally two reasons for timeout: 1) The current event has no chance to be processed (that is, the UI thread is processing the previous event and has not completed it in time or the looper is blocked by some reason) 2) The current event is being processed, but has not completed it in time.

How to avoid KeyDispatchTimeout?1) UI threads try to do only UI-related work 2) Time-consuming work (such as database operations, I/O, network connection or other operations that may hinder UI threads) Put it into a separate thread for processing 3) Try to use Handler to handle the interaction between UIthread and other threads

As for Service and BroadcastReceiver, the same as the above analysis, I won’t say much here.

3. How to analyze ANR logs?

Let's see a LOG first:

04-01 13:12:11.572  I/InputDispatcher( 220): Application is not responding :Window{/=false}.  5009.8ms since event, 5009.5ms since waitstarted
04-0113:12:11.572 I/WindowManager( 220): Input event dispatching timedout sending /
04-01  13:12:14.123 I/Process(  220): Sending signal. PID: 21404 SIG: 3--- occurANR Time and generation  Time
04-01 13:12:14.123 I/dalvikvm(21404):threadid=4: reacting to signal 3 
……
04-0113:12:15.872 E/ActivityManager(  220): ANR in (/.)
04-0113:12:15.872 E/ActivityManager(  220): Reason:keyDispatchingTimedOut
04-0113:12:15.872 E/ActivityManager(  220): Load: 8.68 / 8.37 / 8.53
04-0113:12:15.872 E/ActivityManager(  220):  CPUusage from 4361ms to 699ms ago   ---- CPU exist ANR occur前的使用情况
04-0113:12:15.872 E/ActivityManager(  220):   5.5%21404/: 1.3% user + 4.1% kernel / faults: 10 minor
04-0113:12:15.872 E/ActivityManager(  220):   4.3%220/system_server: 2.7% user + 1.5% kernel / faults: 11 minor 2 major
04-0113:12:15.872 E/ActivityManager(  220):   0.9%52/spi_qsd.0: 0% user + 0.9% kernel
04-0113:12:15.872 E/ActivityManager(  220):   0.5%65/irq/170-cyttsp-: 0% user + 0.5% kernel
04-0113:12:15.872 E/ActivityManager(  220):   0.5%296/: 0.5% user + 0% kernel
04-0113:12:15.872 E/ActivityManager(  220):  100%TOTAL: 4.8% user + 7.6% kernel + 87% iowait
04-0113:12:15.872 E/ActivityManager(  220):  CPUusage from 3697ms to 4223ms later :-- ANR back CPU The amount of use
04-0113:12:15.872 E/ActivityManager(  220):   25%21404/: 25% user + 0% kernel / faults: 191 minor
04-0113:12:15.872 E/ActivityManager(  220):    16% 21603/__eas(: 16% user + 0% kernel
04-0113:12:15.872 E/ActivityManager(  220):    7.2% 21406/GC: 7.2% user + 0% kernel
04-0113:12:15.872 E/ActivityManager(  220):    1.8% 21409/Compiler: 1.8% user + 0% kernel
04-0113:12:15.872 E/ActivityManager(  220):   5.5%220/system_server: 0% user + 5.5% kernel / faults: 1 minor
04-0113:12:15.872 E/ActivityManager(  220):    5.5% 263/InputDispatcher: 0% user + 5.5% kernel
04-0113:12:15.872 E/ActivityManager(  220):  32%TOTAL: 28% user + 3.7% kernel

From the ANR log, we seeANR in Keywords, and then next we seeReason:keyDispatchingTimedOut, is the timeout in the event response, but the specific location cannot be seen here, but you can see a message here: the reason for ANR is that the CPU occupancy rate is high, the task cannot be executed in time slices, or is it because IO is dense, which leads to ANR. This is very important, and we can provide direction for analyzing the specific trace logs in the future.

In addition to looking at LOG, you also need files to solve ANR. How to get them? You can use the following command to obtain

$chmod 777 /data/anr
$rm /data/anr/
$ps
$kill -3  PID
adb pull data/anr/ ./

From the file, the most information is as follows:

-----pid 21404 at 2011-04-01  13:12:14  -----   
Cmdline:  
DALVIK THREADS: 
(mutexes: tll=0tsl=0 tscl=0 ghl=0 hwl=0 hwll=0) 
"main" prio=5 tid=1 NATIVE 
  | group="main" sCount=1 dsCount=0obj=0x2aad2248 self=0xcf70 
  | sysTid=21404 nice=0 sched=0/0cgrp=[fopen-error:2] handle=1876218976 
   (Native Method)   (:119)   (:110 ) 
 at (:3688) 
 at (Native Method) 
  (:507) 
  $(:866)
 at (:624) 
 at (Native Method)

From here we see that the main thread is stucknativePollOnceHere, from the previous analysis of handler and MQ source codes, we can see that this message queue is empty, waiting for the next message to join the queue, and the main thread wakes up, which means that the main thread is dead, waiting for the timeout.

Summary: How to investigate and resolve ANR?

1) First analyze the log and look at the general reason. It is best to directly locate it, but generally it cannot be directly locate it here.

2) Call stack from file view.

3) Compare each thread log and look at your own code

4) During the process, tightly intersperse a main line and carefully check the causes of ANR (iowait?block?memoryleak?)

4. Case analysis

4.1 IO wait for example

Process: 
Activity:/. Subject:
keyDispatchingTimedOut CPU usage from 2550ms to -2814ms ago: 5%187
/system_server: 3.5% user + 1.4% kernel 
/ faults: 86 minor 20major 4.4% 1134/: 0.7% user + 3.7% kernel 
/faults: 38 minor 19 major 4% 372/: 0.7%user + 3.3% kernel 
/ faults: 6 minor 1.1% 272/:0.9% user + 0.1% kernel 
/ faults: 33 minor 0.9%252/: 0.9% user + 0% kernel 0%409/: 0% user + 0% kernel 
/faults: 2 minor 0.1% 632/: 0.1% user + 0%kernel 
100%TOTAL: 6.9% user + 8.2% kernel + 84%iowait 
-----pid 1134 at 2010-12-17 17:46:51 -----
 Cmd line: 
DALVIK THREADS: (mutexes: tll=0 tsl=0tscl=0 ghl=0 hwl=0 hwll=0) 
"main" prio=5 tid=1 WAIT |group="main" sCount=1 dsCount=0 obj=0x2aaca180self=0xcf20 | sysTid=1134 nice=0 sched=0/0 cgrp=[fopen-error:2]handle=1876218976 at (Native Method) -waiting on &lt;0x2aaca218&gt; (a ) 
(:1424) 
(:48) 
(:337) 
(:157) 
(:808) 
(:841) 
(:1171) 
$(:200) 
(:261) 
(:378) 
.&lt;init&gt;(:222) 
(:53) 
(:1356) 
(:1235) 
(:1189) 
(:1271) 
(:1098) 
$(:187) 
.  (:268) 
$(:648) 
(:658) 
(:700) 
$2500(:98) 
$LoadBodyTask .onPostExecute(:1290) 
$(:1255) (:417) 
$300(:127) 
at  $ (:429) 
(:99) 
(:123) 
(:3652) 
(Native Method) 
(:507)

We see keywords from the log84%iowait, and for CPU, this log occupies very little, which shows that the reason is probably because of IO-intensive. From the following logs, there is a View, ContentProvider, Sqlite, and locks. Let's first guess whether it is caused by database operations in the main thread?

Looking at the log specifically, there should be places in our code where ContentResolver is called inside the view. Search for it quickly. Whether there is such a call in our code. Sure enough, after searching, there is indeed:

         final Message message = Message . restoreMessageWithId (mProviderContext , messageId );
         if ( message == null ) {
            return ;
         }
         Account account = Account . restoreAccountWithId ( mProviderContext ,message . mAccountKey );
         if ( account == null ) {
            return ; //isMessagingController returns false for null, but let's make itclear.
         }
         if ( isMessagingController ( account )) {
            new Thread () {
                @Override
                public void run () {
                   mLegacyController . processPendingActions ( message .mAccountKey );
                }
            }. start ();
         }

Have you found the problem? This is called in the main threadAccount . restoreAccountWithId ( mProviderContext ,message . mAccountKey );Query the data. If the normal system resources are not tight, this call will not have too many problems. However, if the data is large or the system is currently busy with IO, will the code be executed very slowly, causing the main thread event processing to time out? So simply change the code to the following, there is no problem with running.

new Thread() {
         final Message message = Message . restoreMessageWithId (mProviderContext , messageId );
         if ( message == null ) {
            return ;
         }
         Account account = Account . restoreAccountWithId ( mProviderContext ,message . mAccountKey );
         if ( account == null ) {
            return ; //isMessagingController returns false for null, but let's make itclear.
         }
         if(isMessagingController(account)) {
                  ();
            }
}.start();

4.2Memoryleak/Thread leak

11-1621:41:42.560 I/ActivityManager( 1190): 
ANR in process: (last in ) 11-1621:41:42.560 
I/ActivityManager( 1190): Annotation:keyDispatchingTimedOut 11-16 21:41:42.560 I/ActivityManager(1190): CPU usage: 11-16 21:41:42.560 
I/ActivityManager( 1190):Load: 11.5 / 11.1 / 11.09 11-16 21:41:42.560 
I/ActivityManager(1190): CPU usage from 9046ms to 4018ms ago: 11-16 21:41:42.560
I/ActivityManager( 1190): :98% = 97% user + 0% kernel / faults: 1134 minor 11-16 21:41:42.560
I/ActivityManager( 1190): system_server: 0% = 0% user + 0% kernel /faults: 1 minor 11-16 21:41:42.560 
I/ActivityManager( 1190): adbd:0% = 0% user + 0% kernel 11-16 21:41:42.560 
I/ActivityManager(1190): logcat: 0% = 0% user + 0% kernel 11-16 
21:41:42.560
I/ActivityManager( 1190):  TOTAL:100% = 98% user + 1% kernel
Cmdline:  
DALVIK THREADS: "main"prio=5 tid=3  VMWAIT |group="main" sCount=1 dsCount=0 s=N obj=0x40026240self=0xbda8 | sysTid=1815 nice=0 sched=0/0 cgrp=unknownhandle=-1344001376 
 (NativeMethod )
(Native Method) 
 (:468) 
(:6324) 
(:6178) 
(:1541)
$(:1830) (:1349) 
(:1114) 
(:1633) 
(:99) 
(:123) 
(:4370) 
(Native Method) 
(:521) 
$(:868) 
(:626) 
(Native Method)
"Thread-408"prio=5 tid=329 WAIT |group="main" sCount=1 dsCount=0 s=N obj=0x46910d40self=0xcd0548 | sysTid=10602 nice=0 sched=0/0 cgrp=unknownhandle=15470792 
at (Native Method) -waiting on <0x468cd420> (a ) 
(:288) 
$UiUpdaterExecutor$(:289) 
(:1096)

We saw that in the log, ANR occurs in VMRuntime, with keywords Bitmap, createBitmap, nativeCreate, ViewRoot, ActivityThread#main. Let's boldly guess whether it was loading large images in the main thread view drawing function, and applying for bitmap not to be released. Looking at the log carefully, we found that it was because of the result that the log was not released.at . (NativeMethod ) , when bitmap is not enough when applying for memory, it is blocked at this time.

The solution is very simple. At this time, based on the subsequent threads, processes, and stack details, we will reverse guess and find the relevant code to see if there is a possible memory leak.

The above is the detailed content of the Android ANR unresponsive analysis solution. For more information about Android ANR unresponsiveness, please follow my other related articles!