Android teaches you how to discover the implementation of APP lag

Recently, the department plans to optimize the lag situation of APP on low-end machines. Since you want to optimize, you must obtain the lag situation. Then how to obtain the lag situation is the purpose of this article.

Generally, too much UI drawing, a large number of IO operations or a large number of computing operations take up the CPU, causing the App interface to be stuck. As long as we can capture the stack information of the main thread and the system's resource usage information when the lag occurs, we can accurately analyze what function the lag occurs and how the resource usage is. So the question is how to effectively detect the lag of the Android main thread?

Use the adb system tool to observe the app's stuttering data and try to redirect the scene to locate the problem.

The commonly used method is to use the adb SurfaceFlinger service and the adb gfxinfo function. In the process of automating the operation of the app, use adb to obtain data to monitor the smoothness of the app, find the time period of lag, and find the scenes and operations that lag.

Method 1: adb shell dumpsysSurfaceFlinger

Use the ‘adb shell dumpsysSurfaceFlinger’ command to obtain the data of the last 127 frames. The frame rate FPS is calculated by executing the adb command regularly to obtain the number of frames.

Method 2: adb shell dumpsys gfxinfo

Use the ‘adb shell dumpsys gfxinfo’ command to obtain the latest 128 frames of drawing information, including the time spent in each frame of Draw, Process, and Execute. If the sum of these three times exceeds 16.6ms, it is considered that a lag has occurred.

Two existing solutions are more suitable for measuring the repair effect of regression lag problems and determining whether there are lags in certain specific scenarios. However, this method has several obvious shortcomings:

It is generally difficult to construct an environment where actual user lags to reproduce;
This method is more troublesome to operate, requires writing automated use cases, and cannot cover a large number of suspicious scenarios, and testing and reproduction is time-consuming and labor-intensive;
Unable to measure the stuttering of static pages;
When there is a stutter, the app cannot obtain the operating status and information in time, and it is difficult to develop and position.

With in-depth research on Android source code, there are also two other convenient methods, and these two methods are less invasive and have low memory occupancy, which can be better used in actual scenarios:

Log matching using Looper printed by UI threads;
use

Log matching with Looper printed by UI threads

Android main thread updates UI. If the interface is refreshed less than 60 times in 1 second, that is, the FPS is less than 60, the user will feel stuck. Simply put, Android uses the message mechanism to update the UI. The UI thread has a Looper. In its loop method, the message will be continuously taken out and the bound Handler will be called to execute on the UI thread. If there is a time-consuming operation in the handler dispatchMesaage method, a lag will occur.

Let's take a look at the source code of ( )

public static void loop() {
    final Looper me = myLooper();
    if (me == null) {
      throw new RuntimeException("No Looper; () wasn't called on this thread.");
    }
    final MessageQueue queue = ;

    // Make sure the identity of this thread is that of the local process,
    // and keep track of what that identity token actually is.
    ();
    final long ident = ();

    // Allow overriding a threshold with a system prop. .
    // adb shell 'setprop . 1 && stop && start'
    final int thresholdOverride =
        ("."
            + () + "."
            + ().getName()
            + ".slow", 0);

    boolean slowDeliveryDetected = false;

    for (;;) {
      Message msg = (); // might block
      if (msg == null) {
        // No message indicates that the message queue is quitting.
        return;
      }

      // This must be in a local variable, in case a UI event sets the logger
      final Printer logging = ;
      if (logging != null) {
        (">>>>> Dispatching to " +  + " " +
             + ": " + );
      }
      // Make sure the observer won't change while processing a transaction.
      final Observer observer = sObserver;

      final long traceTag = ;
      long slowDispatchThresholdMs = ;
      long slowDeliveryThresholdMs = ;
      if (thresholdOverride > 0) {
        slowDispatchThresholdMs = thresholdOverride;
        slowDeliveryThresholdMs = thresholdOverride;
      }
      final boolean logSlowDelivery = (slowDeliveryThresholdMs > 0) && ( > 0);
      final boolean logSlowDispatch = (slowDispatchThresholdMs > 0);

      final boolean needStartTime = logSlowDelivery || logSlowDispatch;
      final boolean needEndTime = logSlowDispatch;

      if (traceTag != 0 && (traceTag)) {
        (traceTag, (msg));
      }

      final long dispatchStart = needStartTime ? () : 0;
      final long dispatchEnd;
      Object token = null;
      if (observer != null) {
        token = ();
      }
      long origWorkSource = ();
      try {
        (msg);
        if (observer != null) {
          (token, msg);
        }
        dispatchEnd = needEndTime ? () : 0;
      } catch (Exception exception) {
        if (observer != null) {
          (token, msg, exception);
        }
        throw exception;
      } finally {
        (origWorkSource);
        if (traceTag != 0) {
          (traceTag);
        }
      }
      if (logSlowDelivery) {
        if (slowDeliveryDetected) {
          if ((dispatchStart - ) <= 10) {
            (TAG, "Drained");
            slowDeliveryDetected = false;
          }
        } else {
          if (showSlowLog(slowDeliveryThresholdMs, , dispatchStart, "delivery",
              msg)) {
            // Once we write a slow delivery log, suppress until the queue drains.
            slowDeliveryDetected = true;
          }
        }
      }
      if (logSlowDispatch) {
        showSlowLog(slowDispatchThresholdMs, dispatchStart, dispatchEnd, "dispatch", msg);
      }

      if (logging != null) {
        ("<<<<< Finished to " +  + " " + );
      }

      // Make sure that during the course of dispatching the
      // identity of the thread wasn't corrupted.
      final long newIdent = ();
      if (ident != newIdent) {
        (TAG, "Thread identity changed from 0x"
            + (ident) + " to 0x"
            + (newIdent) + " while dispatching to "
            + ().getName() + " "
            +  + " what=" + );
      }

      ();
    }
  }

The two red places in the code are the logs printed before and after execution of (msg). By measuring the processing time, it can be detected whether some UI threads have time-consuming operations. Note that there are two functions before and after the code execution line. If logging is set, the logs such as ">>>>>> Dispatching to " and "<<< Finished to " will be printed out respectively. In this way, we can calculate the execution time of dispatchMessage through the time difference between the two logs, so as to set the threshold to determine whether there has been a stutter.

So how to set logging?

Let's look at the following code:

/**
   * Control logging of messages as they are processed by this Looper. If
   * enabled, a log message will be written to <var>printer</var>
   * at the beginning and ending of each message dispatch, identifying the
   * target Handler and message contents.
   *
   * @param printer A Printer object that will receive log messages, or
   * null to disable message logging.
   */
public final class Looper { 
  private Printer mLogging; 
  public void setMessageLogging(@Nullable Printer printer) { 
    mLogging = printer; 
  } 
} 

public interface Printer { 
  void println(String x); 
}

Looper's mLogging is private and provides the setMessageLogging(@Nullable Printer printer) method, so we can implement a Printer ourselves, and pass it in through the setMessageLogging() method. The code is as follows:

public class BlockDetectByPrinter {
  
  public static void start() {
    ().setMessageLogging(new Printer() {
      private static final String START = ">>>>> Dispatching";
      private static final String END = "<<<<< Finished";

      @Override
      public void println(String x) {
        if ((START)) {
          ().startMonitor();
        }
        if ((END)) {
          ().removeMonitor();
        }
      }
    });
  }
}

After logging is set, the loop method will callback print out the time log of each message execution: ">>>>>> Dispatching to " and "<<< Finished to ". The use of BlockDetectByPrinter is called () in the onCreate method of Application.

We can simply implement a LogMonitor to record the stack information of the main thread when it is stuck. When matching >>>>>> Dispatching, executing startMonitor will execute the task after 200ms (set stutter threshold). This task is responsible for printing the stack information of the UI thread in the child thread (non-UI thread). If the message is completed within 200ms, the <<<<< Finished log can be matched. If the removalMonitor is executed before the printing stack task is started, there is no lag. If the message is completed after more than 200ms, the lag is considered to have occurred and the stack information of the UI thread is printed.

How to implement LogMonitor?

public class LogMonitor {
  private static final String TAG = "LogMonitor";
  private static LogMonitor sInstance = new LogMonitor();
  private HandlerThread mLogThread = new HandlerThread("log");
  private Handler mIoHandler;
  private static final long TIME_BLOCK = 200L;

  private LogMonitor() {
    ();
    mIoHandler = new Handler(());
  }

  private static Runnable mLogRunnable = new Runnable() {
    @Override
    public void run() {
      StringBuilder sb = new StringBuilder();
      StackTraceElement[] stackTrace = ().getThread().getStackTrace();
      for (StackTraceElement s : stackTrace) {
        (() + "\n");
      }
      (TAG, ());
    }
  };

  public static LogMonitor getInstance() {
    return sInstance;
  }

  public boolean isMonitor() {
    return (mLogRunnable);
  }

  public void startMonitor() {
    (mLogRunnable, TIME_BLOCK);
  }

  public void removeMonitor() {
    (mLogRunnable);
  }
}

Here we use HandlerThread to construct a Handler. HandlerThread inherits from Thread, and is actually just a Thread, but there is one more Looper than ordinary Thread, providing the getLooper method of this Looper object, and then passing the looper object in HandlerThread when creating the Handler. In this way, our mIoHandler object is bound to the non-UI thread HandlerThread, which will not block the UI when handling time-consuming operations. If the UI thread blocks for more than 200ms, mLogRunnable will be executed in the child thread, printing out the current stack information of the UI thread. If the processing message does not exceed 1000ms, the mLogRunnable task will be removed in real time.

The general content of printing out the stack information when a stutter occurs is as follows: Development can locate time-consuming places through logs.

2020-10-30 14:26:13.823 30359-30415/ E/LogMonitor: (Native Method)
    (:443)
    (:359)
    $(:22)
    (:900)
    (:103)
    (:219)
    (:8347)
    (Native Method)
    $(:513)
    (:1055)

Advantages: Users can monitor the lag from the app level during the use of the app or testing process. Once the lag occurs, they can record the app status and information. As long as the dispatchMesaage is executed too much, they will be recorded. There will be no more problems and shortcomings faced by the previous two adb methods.

Disadvantages: Need to open another child thread to obtain stack information, which will consume a small amount of system resources.

In actual implementation, different mobile phones and Android systems and even different ROM versions, the Loop function may not be printed out ">>>>> Dispatching to " and "<<< Finished to " logs such as " are unable to perform this method.

Optimization strategy: We know that the start and end of the Loop function will be executed to print the log, so the optimization version will change the judgment of lag to, when Loop outputs the first log, it is regarded as a startMonitor when outputting the next log, and when outputting the next log, it is regarded as an end time to solve this problem.

In fact, there is an Observer interface in Looper that can complete this task well, but we cannot use it because it is marked hide, but we can know.

The Observer interface provides three methods, namely, listen to the task start, end, and error callback.

  /** {@hide} */
  public interface Observer {
    /**
     * Called right before a message is dispatched.
     *
     * <p> The token type is not specified to allow the implementation to specify its own type.
     *
     * @return a token used for collecting telemetry when dispatching a single message.
     *     The token token must be passed back exactly once to either
     *     {@link Observer#messageDispatched} or {@link Observer#dispatchingThrewException}
     *     and must not be reused again.
     *
     */
    Object messageDispatchStarting();

    /**
     * Called when a message was processed by a Handler.
     *
     * @param token Token obtained by previously calling
     *       {@link Observer#messageDispatchStarting} on the same Observer instance.
     * @param msg The message that was dispatched.
     */
    void messageDispatched(Object token, Message msg);

    /**
     * Called when an exception was thrown while processing a message.
     *
     * @param token Token obtained by previously calling
     *       {@link Observer#messageDispatchStarting} on the same Observer instance.
     * @param msg The message that was dispatched and caused an exception.
     * @param exception The exception that was thrown.
     */
    void dispatchingThrewException(Object token, Message msg, Exception exception);
  }

Use monitoring to stutter

Official documentation link (/reference/android/view/）

We know that the Android system sends a VSYNC signal every 16ms to notify the interface to redraw and render. The period of each synchronization is 16.6ms, representing the refresh frequency of one frame. The SDK contains a related class and related callbacks. Theoretically, the time period of two callbacks should be 16ms. If it exceeds 16ms, we think that a lag has occurred. Use the time period between the two callbacks to determine whether lag has occurred (this solution is supported only by Android 4.1 API 16 or above).

The principle of this solution is mainly to set its FrameCallback function through the Choreographer class. When each frame is rendered, the callback FrameCallback will be triggered, and the FrameCallback callback void doFrame (long frameTimeNanos) function will be triggered. The doFrame method will be called back once the interface rendering. If the interval between the two doFrames is greater than 16.6ms, it means that a lag has occurred.

public class FPSFrameCallback implements  {

  private static final String TAG = "FPS_TEST";
  private long mLastFrameTimeNanos = 0;
  private long mFrameIntervalNanos;

  public FPSFrameCallback(long lastFrameTimeNanos) {
    mLastFrameTimeNanos = lastFrameTimeNanos;
    // 1s 60 frames    mFrameIntervalNanos = (long) (1000000000 / 60.0);
  }

  @Override
  public void doFrame(long frameTimeNanos) {

    //Initialization time    if (mLastFrameTimeNanos == 0) {
      mLastFrameTimeNanos = frameTimeNanos;
    }
    final long jitterNanos = frameTimeNanos - mLastFrameTimeNanos;
    if (jitterNanos &gt;= mFrameIntervalNanos) {
      final long skippedFrames = jitterNanos / mFrameIntervalNanos;
      if (skippedFrames &gt; 30) {
        (TAG, "Skipped " + skippedFrames + " frames! "
            + "The application may be doing too much work on its main thread.");
      }
    }
    mLastFrameTimeNanos = frameTimeNanos;
    //Register the next frame callback    ().postFrameCallback(this);
  }
}

There is not much difference between the essence and log, but this one is more general and will not cause unavailability due to model system reasons.

Example

Let’s go to the actual combat and see how the code level is implemented.

The MainActivity code is as follows:

public class MainActivity extends AppCompatActivity {
  Handler handler = new Handler(());

  private final Runnable runnable = new Runnable() {
    @Override
    public void run() {
      try {
        (600);
        (runnable, 500);
      } catch (InterruptedException e) {
        ();
      }
    }
  };

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    (savedInstanceState);
    setContentView(.activity_main);
    ().postFrameCallback(new FPSFrameCallback(()));
    ();
  }

  @Override
  protected void onResume() {
    ();
    (runnable, 500);
  }

}

The collected stack information is as follows:

2020-10-30 14:26:13.823 30359-30415/ E/LogMonitor: (Native Method)
    (:443)
    (:359)
    $(:22)
    (:900)
    (:103)
    (:219)
    (:8347)
    (Native Method)
    $(:513)
    (:1055)

For FPS log, you can see the following information:

I/Choreographer: Skipped 64 frames! The application may be doing too much work on its main thread.
I/FPS_TEST: Skipped 65 frames! The application may be doing too much work on its main thread.

If you want to use the above method into your APP, then there are still many operations to read the content of the reference.

Reference article

Guangyan Android lag monitoring system

This is the article about Android teaching you how to discover the implementation of APP lag. For more related content on Android APP lag, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!