This post walks through Counters, an Instruments tool to profile low-level chip events on Apple devices. With the right configuration, Counters can help you quickly and reliably find performance improvements in apps.

  1. Profile Apps With Xcode Instruments Counters
  2. Create Formulas Using Counters
  3. Profile CPU Branch Misprediction
    a. Instructions Per Cycle (IPC) Formula
  4. Profile CPU L2 Cache Misses
  5. Find iOS Performance Improvements Using Xcode Instruments

Profile Apps With Xcode Instruments Counters

Counters is an Instruments tool used to profile low-level chip events in iOS and macOS apps. For example, the event INST_BRANCH can be added to the Counters tool to count the number of branches executed by the CPU.

Unlike other Instruments tools, Counters requires some configuration to provide valuable insights. Further, the events Counters profiles are hardware-specific. This means the chip events available on an A10 chip inside an iPad may differ from the chip events available on an A12 chip inside an iPhone Xs.

To configure Counters, select File -> Recording Options from the Instruments menu bar. A window appears with the configuration options for Counters:

Counters Recording Options window

The examples presented in this post sample by time. Use the + button to add specific events available on the CPU of the device connected to Instruments. With INST_BRANCH selected, the performance profile may look something like this:

Counters configured to count branch instructions

Create Formulas Using Counters

The number of branches executed on the chip is not enough information by itself to find performance issues. We also need the number of mispredicted branches so we can compute the % of branches missed.

The best way to get high-value performance profiles from Counters is to use formulas. Formulas use events to compute a numerical result, for example, the % of branches missed on the chip. To configure a formula, select the ⚙ icon and then Create Formula.

Profile CPU Branch Misprediction

Branch misprediction is one metric for judging how efficiently code runs on the CPU. Mispredicted branches are expensive: the CPU has to discard the work it started speculatively and resume on the correct path, which results in slower execution. Reducing the % of branches missed can greatly improve app performance.
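To make this concrete, here is a minimal C sketch of the kind of code that tends to show up with a high miss rate, next to one possible rewrite. The function names are made up for illustration. The first version branches on the data itself, so unpredictable input gives the predictor little to work with. The second replaces the branch with arithmetic. Note that an optimizing compiler may already turn the first version into something branch-free, so treat this as an illustration and confirm any change with Counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the `if` depends on the data, so with unpredictable
   input the branch predictor will often guess wrong. */
int64_t sum_above_branchy(const int32_t *values, size_t count, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] >= threshold) {
            sum += values[i];
        }
    }
    return sum;
}

/* Branch-free version: the comparison yields 0 or 1, which is negated
   into an all-zeros or all-ones mask, so no data-dependent jump is needed. */
int64_t sum_above_branchless(const int32_t *values, size_t count, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        int64_t keep = -(int64_t)(values[i] >= threshold); /* 0 or -1 */
        sum += (int64_t)values[i] & keep;
    }
    return sum;
}
```

Both functions return the same result. The only difference is whether the loop body contains a branch whose outcome depends on the input.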

To count the % of branches missed, divide the number of missed branches by the total number of branches. Then, multiply by 100. Enter 100 * (SYNC_BR_ANY_MISP / INST_BRANCH) into the Counters formula input:

Creating a Counters formula for branch misprediction
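If you export raw event counts and want to sanity-check what the formula reports, the same arithmetic can be sketched in C. The function name is hypothetical, and the zero check is an assumption about how you would want an empty sample handled.

```c
#include <stdint.h>

/* Mirrors the Counters formula 100 * (SYNC_BR_ANY_MISP / INST_BRANCH). */
double branch_miss_percent(uint64_t mispredicted, uint64_t branches) {
    if (branches == 0) {
        return 0.0; /* no branches sampled, so nothing was missed */
    }
    return 100.0 * (double)mispredicted / (double)branches;
}
```

For example, 25 mispredicted branches out of 1,000 executed gives 2.5%.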

Instructions Per Cycle (IPC) Formula

Counters can track multiple formulas at once. Instructions per cycle (IPC) is an important metric to determine processing efficiency. The greater the number of instructions per cycle an app executes, the more efficiently the app is using the CPU.

The formula for IPC is the number of instructions divided by the number of cycles. Enter FIXED_INSTRUCTIONS / FIXED_CYCLES into the Counters formula input to create a formula for IPC counting.
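As a quick worked example of the same ratio outside Instruments, here is a small C sketch. The function name is made up for illustration, and returning 0 when no cycles were sampled is an assumption.

```c
#include <stdint.h>

/* Mirrors the Counters formula FIXED_INSTRUCTIONS / FIXED_CYCLES. */
double instructions_per_cycle(uint64_t instructions, uint64_t cycles) {
    if (cycles == 0) {
        return 0.0; /* no cycles sampled */
    }
    return (double)instructions / (double)cycles;
}
```

A function that retires 3,000 instructions in 2,000 cycles has an IPC of 1.5.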

Profile CPU L2 Cache Misses

Modern CPUs have multiple levels of caching on the chip, for example, L1, L2, and L3 caches. Counters can only count the events exposed by the chip, and for an A10 chip, only L2 cache events are available. Since these caches help the CPU quickly access data, reducing the number of cache misses can greatly increase performance.
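A classic source of cache misses is walking memory in an order that doesn't match how the data is laid out. The C sketch below, with hypothetical function names, sums the same matrix twice. It assumes the matrix is stored row by row in one contiguous buffer. How much the two differ on a real device depends on the matrix size and the chip, so measure with Counters before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/* Visits elements in the order they sit in memory, so each cache line
   that is fetched gets fully used before the loop moves on. */
int64_t sum_row_major(const int32_t *matrix, size_t rows, size_t cols) {
    int64_t sum = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            sum += matrix[r * cols + c];
        }
    }
    return sum;
}

/* Same result, but consecutive reads are `cols` elements apart. On a
   large matrix each read can land on a different cache line. */
int64_t sum_column_major(const int32_t *matrix, size_t rows, size_t cols) {
    int64_t sum = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            sum += matrix[r * cols + c];
        }
    }
    return sum;
}
```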

The formula for L2 cache misses is the number of missed writes and reads in the L2 cache, divided by the total number of writes and reads. To get the %, multiply by 100. Enter 100 * ((L2C_AGENT_ST_MISS + L2C_AGENT_LD_MISS) / (L2C_AGENT_ST + L2C_AGENT_LD)) into the Counters formula input to count cache misses.

Note: the names of the events may be different (or may not exist) on the chip in the device you are profiling.
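Because this formula combines four events, it can help to see the arithmetic spelled out. Here is a small C sketch with a hypothetical function name. The parameters stand in for the store and load events used in the formula above, and the zero check is an assumption.

```c
#include <stdint.h>

/* Mirrors 100 * ((L2C_AGENT_ST_MISS + L2C_AGENT_LD_MISS)
                  / (L2C_AGENT_ST + L2C_AGENT_LD)). */
double l2_miss_percent(uint64_t store_misses, uint64_t load_misses,
                       uint64_t stores, uint64_t loads) {
    uint64_t accesses = stores + loads;
    if (accesses == 0) {
        return 0.0; /* no L2 accesses sampled */
    }
    return 100.0 * (double)(store_misses + load_misses) / (double)accesses;
}
```

With 10 store misses and 40 load misses across 300 stores and 700 loads, the L2 miss rate is 5%.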

Find iOS Performance Improvements Using Xcode Instruments

With these three formulas configured, a profile of an iOS or macOS application may look like this:

Counters configured to capture IPC, Branch Mispredictions, and L2 Cache Misses

To quickly find performance opportunities, first select Invert Call Tree under the Call Tree menu at the bottom of the Instruments window. This inverts the collected stack traces so Instruments lists the deepest (leaf) functions first, which is where the time is actually being spent.

Then, sort the list by Running Time and look for the following:

  1. IPC below 2.5
  2. Branch Misprediction % greater than 2%
  3. L2 Cache Misses greater than 5%

These rules are not set in stone and depend on the specifics of your iOS or macOS application. In general, IPC, branch misprediction, and cache misses provide a baseline guide for finding performance opportunities.

An example of what a great performance opportunity can look like is:

|Time|Time %|IPC|Branch Miss|L2 Cache Miss|Symbol|
|----|------|---|-----------|-------------|------|
|51.0ms|9.1%|0.723|2.846%|51.987%|-[CameraExample runModelOnFrame:]|
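If you want to triage many rows at once, the three rules of thumb can be written down as a small check. This is only a sketch: the struct, constants, and function name are made up for illustration, and the thresholds are the rough guidelines from the list above, not fixed limits.

```c
/* One row of the inverted call tree, using the three formulas from this post. */
typedef struct {
    double ipc;
    double branch_miss_percent;
    double l2_miss_percent;
} CounterSample;

/* Rule-of-thumb thresholds from this post. Tune them for your app. */
static const double kMinIPC = 2.5;
static const double kMaxBranchMissPercent = 2.0;
static const double kMaxL2MissPercent = 5.0;

/* Returns how many of the three rules a row trips (0 to 3). */
int count_warning_signs(CounterSample sample) {
    int warnings = 0;
    if (sample.ipc < kMinIPC) {
        warnings++;
    }
    if (sample.branch_miss_percent > kMaxBranchMissPercent) {
        warnings++;
    }
    if (sample.l2_miss_percent > kMaxL2MissPercent) {
        warnings++;
    }
    return warnings;
}
```

The example row above (IPC 0.723, branch miss 2.846%, L2 cache miss 51.987%) trips all three rules, which is why it stands out as a strong candidate for optimization.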

Xcode Performance Profiling and Optimization

That's it! Counters in Xcode make it easy to spot functions and call trees with poor performance characteristics that you can optimize.