
A practical guide to programmatically understanding Android app performance
The Problem Every Android Performance Engineer Knows
You open a Perfetto trace.
Thousands of slices. Dozens of threads. Millions of data points. A timeline that looks like abstract art.
And somewhere in there — buried under layers of system noise, framework callbacks, and thread interleaving — is the answer to a simple question: Why is my app slow?
I’ve spent years debugging Android performance at scale. The pattern is always the same:
- User complains about jank/slow startup
- Engineer captures a trace
- Engineer opens Perfetto UI
- Engineer spends 2 hours scrolling, zooming, and squinting
- Engineer maybe finds the root cause
Traces don’t tell stories. They dump facts — timestamped, nested, overwhelming. The gap between “having a trace” and “understanding performance” is enormous.
This article is about closing that gap programmatically.
Over the past few weeks, I’ve been building a Perfetto trace analyzer that transforms raw traces into structured, human-readable insights. No dashboards. Just deterministic analysis that mirrors how experienced engineers actually reason about performance.
By the end of this article, you’ll understand how to:
- Make traces readable — startup time, long tasks, frame health
- Make traces attributable — process, thread, component ownership
- Separate responsibility — app vs framework vs system
- Use time windows to find what mattered when
- Generate deterministic suspects — prioritized signals for investigation
Why Perfetto Traces Are Hard to Read
Perfetto is incredibly powerful. It captures:
- CPU scheduling
- Frame rendering
- Binder transactions
- App-defined trace sections
- System services activity
But that power comes with complexity. A typical trace might contain:
- 50,000+ slices
- 100+ unique thread names
- Nested hierarchies 10+ levels deep
When you’re debugging a janky scroll or slow startup, you don’t need all of that. You need:
- What happened — startup time, long tasks, frame health
- Who did it — which process, which thread
- What kind of work — app code vs framework vs system
Let’s tackle these one by one.
Step 1: Making Traces Readable
The first goal is extracting high-level metrics that humans actually care about.
Startup Time
The analyzer estimates startup using a best-effort heuristic: the time from the earliest slice to the first Choreographer#doFrame.
This isn’t perfect — it doesn’t account for all startup scenarios — but it provides a useful baseline for comparison.
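To make the heuristic concrete, here's a minimal sketch (not the analyzer's exact code) that assumes slices have already been pulled out of the trace into dicts with ts_ms and name fields:
from typing import Optional

def estimate_startup_ms(slices: list[dict]) -> Optional[float]:
    """Best-effort startup: earliest slice -> first Choreographer#doFrame."""
    if not slices:
        return None
    trace_start = min(s["ts_ms"] for s in slices)
    frame_starts = [
        s["ts_ms"] for s in slices
        if (s.get("name") or "").startswith("Choreographer#doFrame")
    ]
    if not frame_starts:
        return None  # No frames recorded; the heuristic doesn't apply.
    return min(frame_starts) - trace_start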
From a TraceToy test trace (TraceToy is a sample app I built specifically to generate interesting traces for testing — more on this later):
{
"startup_ms": 2910.15
}
In production apps, you’d typically see 500–1500ms for well-optimized cold starts, or 2000–5000ms+ for apps with heavy initialization. The exact number matters less than tracking it over time — a 20% regression is a 20% regression whether your baseline is 800ms or 3000ms.
Production apps often instrument startup more precisely — using reportFullyDrawn(), custom trace markers like Startup#complete, or libraries like Jetpack App Startup. The analyzer’s heuristic is a reasonable fallback when those aren’t available, but you can (and should) adapt it to match your app’s actual definition of “startup complete.”
One number. Immediately useful for regression detection.
Long Tasks on the UI Thread
Long tasks are slices that exceed a threshold and block the main thread. These are the primary suspects for jank and unresponsiveness.
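Extracting them is mostly filtering and sorting. A rough sketch, assuming main-thread slices are already isolated as dicts with name and dur_ms (field names are illustrative):
def find_long_tasks(main_thread_slices: list[dict],
                    threshold_ms: float = 50.0,
                    top_n: int = 10) -> dict:
    """Count main-thread slices over the threshold and keep the worst offenders."""
    long_tasks = [s for s in main_thread_slices if s["dur_ms"] >= threshold_ms]
    long_tasks.sort(key=lambda s: s["dur_ms"], reverse=True)
    return {
        "threshold_ms": threshold_ms,
        "count": len(long_tasks),
        "top": [
            {"name": s.get("name") or "<internal slice>", "dur_ms": s["dur_ms"]}
            for s in long_tasks[:top_n]
        ],
    }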
The analyzer uses a 50ms threshold by default (configurable via --long-task-ms):
{
  "ui_thread_long_tasks": {
    "threshold_ms": 50,
    "count": 12,
    "top": [
      {"name": "<internal slice>", "dur_ms": 16572.1},
      {"name": "216", "dur_ms": 4103.2},
      {"name": "Handler: android.view.View", "dur_ms": 312.5}
    ]
  }
}
See the problem? Numeric IDs. Internal slices. No context. No attribution.
Technically correct. Practically useless.
We know something blocked the main thread for 16 seconds. But we don’t know what or who owns it. This is where raw analysis breaks down — and why Step 2 (attribution) matters.
Frame Health
Dropped frames cause visible jank. The analyzer extracts frame timing from Choreographer#doFrame slices:
{
  "frame_features": {
    "total_frames": 888,
    "janky_frames": 37,
    "p95_frame_ms": 14.35
  }
}
Percentiles matter more than averages here — a few bad frames ruin perceived smoothness even if most frames are fast.
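For reference, here's one way to compute those numbers from Choreographer#doFrame durations — a sketch that assumes a fixed 16.67ms budget (60Hz); the real analyzer may derive the budget differently:
import math

def frame_features(frame_durations_ms: list[float], budget_ms: float = 16.67) -> dict:
    """Frame health from Choreographer#doFrame durations: totals, jank count, p95."""
    if not frame_durations_ms:
        return {"total_frames": 0, "janky_frames": 0, "p95_frame_ms": None}
    ordered = sorted(frame_durations_ms)
    # Nearest-rank p95: simple and dependency-free.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "total_frames": len(ordered),
        "janky_frames": sum(1 for d in ordered if d > budget_ms),
        "p95_frame_ms": round(ordered[p95_index], 2),
    }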
The Limitation
At this point, we can answer: What happened?
But we can’t answer: Who’s responsible?
Step 2: Making Traces Attributable
Attribution means connecting performance data to ownership. When debugging real issues, engineers ask:
- Which process did this work?
- Which thread?
- Was it my app or something else?
Focus Process Filtering
First, we need to cut through the noise. Traces capture everything on the device — system services, other apps, kernel activity. Most of that is irrelevant.
The analyzer supports a focus mode:
python3 -m perfetto_agent.cli analyze \
  --trace tracetoy_trace.perfetto-trace \
  --focus-process com.example.tracetoy \
  --out analysis.json
This shifts the output from “everything happening on the device” to “what matters for this app.” A small flag with a huge impact on signal-to-noise ratio.
Process and Thread Identification
With a focus process set, every slice now gets enriched with ownership data. The analyzer joins pid and tid to process and thread names:
{
"name": "BG#churn",
"dur_ms": 267.29,
"pid": 5664,
"tid": 5664,
"thread_name": "xample.tracetoy",
"process_name": "com.example.tracetoy"
}
Compare this to the useless “name”: “216” from Step 1. Now we know:
- ✅ It’s from my app (com.example.tracetoy)
- ✅ It ran on the main thread (tid == pid)
- ✅ It’s an app-defined section (the BG# prefix)
Main Thread Detection
Here’s a subtle but important detail: the main thread isn’t always labeled “main” in traces.
The analyzer uses a best-effort heuristic:
- If a thread named “main” exists for the app process → use it
- Otherwise, fall back to tid == pid (commonly true for the main thread in Android)
This mirrors how experienced engineers reason when metadata is incomplete — and the analyzer documents which heuristic it used for transparency.
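As a sketch, the fallback chain looks roughly like this (thread records are assumed to be dicts with pid, tid, and name — illustrative field names, not the analyzer's exact schema):
from typing import Optional

def resolve_main_thread_tid(threads: list[dict], focus_pid: int) -> Optional[int]:
    """Best-effort main-thread resolution for the focus process."""
    candidates = [t for t in threads if t.get("pid") == focus_pid]
    # Preferred: a thread explicitly named "main".
    for t in candidates:
        if t.get("name") == "main":
            return t["tid"]
    # Fallback: the process's initial thread, where tid == pid.
    for t in candidates:
        if t.get("tid") == focus_pid:
            return t["tid"]
    return None  # Unresolved; downstream metrics should report that honestly.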
App-Defined Sections Become First-Class
Android’s Trace.beginSection() API lets developers mark custom spans:
Trace.beginSection("UI#loadUserProfile")
// ... work ...
Trace.endSection()
The analyzer extracts and aggregates these explicitly:
{
  "app_sections": {
    "top_by_total_ms": [
      {"name": "BG#churn", "total_ms": 11811.05, "count": 194},
      {"name": "UI#stall_button_click", "total_ms": 818.05, "count": 4}
    ]
  }
}
This answers: Which parts of my code dominated execution?
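The aggregation itself is a straightforward group-by — a sketch, assuming slices already classified as app markers with name and dur_ms fields:
from collections import defaultdict

def top_app_sections(app_slices: list[dict], top_n: int = 10) -> list[dict]:
    """Sum duration and count per app-defined section, ranked by total time."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for s in app_slices:
        totals[s["name"]] += s["dur_ms"]
        counts[s["name"]] += 1
    ranked = sorted(totals, key=totals.get, reverse=True)[:top_n]
    return [
        {"name": n, "total_ms": round(totals[n], 2), "count": counts[n]}
        for n in ranked
    ]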
At this point, we’ve turned anonymous slices into attributed, actionable data. But we still haven’t answered the most important question: whose fault is it — my app, the framework, or the system?
Step 3: Separating Responsibility
Here’s the uncomfortable truth about mobile performance:
When your app is slow, not all of that work is actually yours.
Traces happily mix together:
- Your app code
- Android framework internals
- System and kernel activity
Without separating those, it’s easy to draw the wrong conclusions — and optimize the wrong thing.
Deterministic Classification
Each slice is now labeled as one of:
- 🟢 app — App markers on the focused process (e.g., UI#stall_button_click)
- 🔵 framework — Framework tokens on the focused process (e.g., Choreographer#doFrame)
- 🔴 system — Non-focused PIDs or system tokens (e.g., binder transaction)
- ⚪ unknown — Can’t confidently classify (e.g., numeric IDs)
The classifier is intentionally conservative:
- Based on pid ownership
- Based on thread ownership
- Based on explicit name-token rules
If a slice doesn’t clearly belong to a category, it becomes “unknown”. That’s not a failure — it’s honesty.
Long Tasks with Responsibility Labels
Here’s what a classified long slice looks like:
{
"name": "BG#churn",
"dur_ms": 267.29,
"pid": 5664,
"tid": 5664,
"thread_name": "xample.tracetoy",
"process_name": "com.example.tracetoy",
"category": "app"
}
And a framework-heavy one:
{
"name": "dequeueBuffer - VRI[MainActivity]#0(BLAST Consumer)0",
"dur_ms": 180.4,
"pid": 5664,
"tid": 5688,
"thread_name": "RenderThread",
"category": "framework"
}
Same trace. Very different implications for what to optimize.
Aggregate Breakdowns
Instead of manually eyeballing hundreds of slices, the analyzer computes:
{
  "work_breakdown": {
    "by_category_ms": {
      "app": 12603.72,
      "framework": 305.77,
      "system": 0.0,
      "unknown": 5167.50
    }
  }
}
This single block answers a powerful question:
Is this performance problem primarily app-owned?
Sometimes the answer is uncomfortable. Sometimes it’s relieving. Either way, it’s grounded in data.
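The breakdown itself is a single pass over classified slices. A sketch (assuming top-level, non-overlapping slices — nested children would double-count if summed naively):
def work_breakdown(slices: list[dict]) -> dict:
    """Total duration per responsibility category (app/framework/system/unknown)."""
    by_category = {"app": 0.0, "framework": 0.0, "system": 0.0, "unknown": 0.0}
    for s in slices:
        # Categories come from the classifier shown later in this article.
        by_category[s.get("category", "unknown")] += s["dur_ms"]
    return {"by_category_ms": {k: round(v, 2) for k, v in by_category.items()}}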
Main Thread Blocking: Who’s Really at Fault?
The analyzer breaks down main-thread blocking time by category:
{
  "main_thread_blocking": {
    "app_ms": 12603.72,
    "framework_ms": 305.77,
    "system_ms": 0.0,
    "unknown_ms": 2840.88
  }
}
This is where responsibility becomes unavoidable.
If the main thread is blocked mostly by app work (like in this trace), you know exactly where to focus optimization efforts.
The Summary Block
Everything rolls up into a minimal summary:
{
  "summary": {
    "main_thread_found": true,
    "top_app_sections": ["BG#churn", "dequeueBuffer - VRI[MainActivity]#0...", "UI#stall_button_click"],
    "top_long_slice_name": "BG#churn",
    "dominant_work_category": "app",
    "main_thread_blocked_by": "app"
  }
}
This is not an explanation. It’s a signal — telling you where to look, and where not to.
Step 4: Time Windows and Suspects
Knowing who did the work is helpful. But performance regressions are rarely global — they’re temporal.
Typical questions sound like:
- Why was startup slow?
- Why did jank spike only after the first screen?
- Why is the UI thread blocked during interactions but not at launch?
Attribution without time context still forces you to scrub timelines manually. Step 4 reduces that cognitive load.
Time Windows: Making “When” Explicit ⏱️
Instead of treating a trace as one long timeline, the analyzer divides it into coarse, deterministic windows:
- Startup window — From process start → end of startup
- Steady-state window — The period after startup (default: 5 seconds)
{
  "time_windows": {
    "startup": {
      "start_ms": 0.0,
      "end_ms": 2910.15,
      "method": "startup_ms"
    },
    "steady_state": {
      "start_ms": 2910.15,
      "end_ms": 7910.15,
      "method": "startup_end + 5000ms"
    }
  }
}
This alone changes how you reason about a trace.
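Deriving the windows is deliberately simple — a sketch, assuming the startup estimate from Step 1 and trace-relative timestamps that start at 0ms:
def build_time_windows(startup_ms: float | None,
                       trace_end_ms: float,
                       steady_len_ms: float = 5000.0) -> dict:
    """Two coarse, deterministic windows: startup and a fixed-length steady state."""
    if startup_ms is None:
        return {}
    steady_end = min(startup_ms + steady_len_ms, trace_end_ms)
    return {
        "startup": {"start_ms": 0.0, "end_ms": startup_ms, "method": "startup_ms"},
        "steady_state": {
            "start_ms": startup_ms,
            "end_ms": steady_end,
            "method": f"startup_end + {steady_len_ms:.0f}ms",
        },
    }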
Windowed Breakdowns: Responsibility In Context 📊
For each window, the analyzer computes:
- Total work by category (app, framework, system, unknown)
- Main-thread blocking by category
From the TraceToy trace:
{
  "window_breakdown": {
    "startup": {
      "by_category_ms": {"app": 0.0, "framework": 0.0, "system": 378.43, "unknown": 101.34},
      "main_thread_blocking_ms": {"app_ms": 0.0, "framework_ms": 0.0, "system_ms": 0.0, "unknown_ms": 101.34}
    },
    "steady_state": { /* ... */ }
  }
}
This answers, deterministically:
Startup was dominated by system work. Steady-state had significant unknown work blocking the main thread.
No guesswork required.
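One way to compute a per-window breakdown is to clip each slice to the window it overlaps — a sketch under the same flat-slice assumption as before; the analyzer's exact overlap rules may differ:
def window_breakdown(slices: list[dict], window: dict) -> dict:
    """Per-category time for the portion of each slice inside the window."""
    by_category = {"app": 0.0, "framework": 0.0, "system": 0.0, "unknown": 0.0}
    for s in slices:
        start = max(s["ts_ms"], window["start_ms"])
        end = min(s["ts_ms"] + s["dur_ms"], window["end_ms"])
        overlap_ms = end - start
        if overlap_ms > 0:
            by_category[s.get("category", "unknown")] += overlap_ms
    return {"by_category_ms": {k: round(v, 2) for k, v in by_category.items()}}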
From Data to Suspects 🕵️
Once you have attribution, responsibility, and time windows, you can surface suspects.
A suspect is not an explanation. It’s a prioritized signal.
The analyzer generates a deterministic list:
{
  "suspects": [
    {"label": "Startup main thread dominated by unknown work", "window": "startup", "evidence": {"unknown_ms": 101.34}},
    {"label": "Startup dominated by system work", "window": "startup", "evidence": {"system_ms": 378.43}}
    // ... more suspects for steady_state
  ]
}
These labels are templated, reproducible, and evidence-backed.
They don’t explain — they point.
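Under the hood this can be little more than templated rules over the windowed numbers. A sketch with an illustrative 50% dominance threshold (the analyzer's actual rule set may differ):
def generate_suspects(window_breakdowns: dict, min_share: float = 0.5) -> list[dict]:
    """Emit a suspect for each window whose main-thread blocking has a dominant category."""
    suspects = []
    for window_name, data in window_breakdowns.items():
        blocking = data.get("main_thread_blocking_ms", {})
        total = sum(blocking.values())
        if total <= 0:
            continue
        category, ms = max(blocking.items(), key=lambda kv: kv[1])
        if ms / total >= min_share:
            suspects.append({
                "label": f"{window_name.capitalize()} main thread dominated by "
                         f"{category.removesuffix('_ms')} work",
                "window": window_name,
                "evidence": {category: round(ms, 2)},
            })
    # Strongest evidence first, so the summary's top_suspect is deterministic.
    return sorted(suspects, key=lambda s: max(s["evidence"].values()), reverse=True)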
The Summary Now Means Something 🧭
The summary block becomes actionable:
{
  "summary": {
    "startup_dominant_category": "system",
    "steady_state_dominant_category": "unknown",
    "top_suspect": "Startup main thread dominated by unknown work"
  }
}
If you only read this, you already know:
- Where to start investigating
- What kind of work to look at
- Which phase matters most
That’s a huge shift from raw traces.
Putting It All Together
Here’s the complete workflow:
1. Capture a Trace
Use Android Studio’s profiler or the command line:
adb shell perfetto -o /data/misc/perfetto-traces/trace.perfetto-trace -t 10s \
  sched freq idle am wm gfx view binder_driver hal dalvik camera input res memory
2. Run the Analyzer
python3 -m perfetto_agent.cli analyze \
  --trace trace.perfetto-trace \
  --focus-process com.example.myapp \
  --out analysis.json \
  --long-task-ms 50 \
  --top-n 10
3. Read the Output
Start with the summary block to see dominant categories and top suspects, then drill into specifics (a quick triage sketch follows this list):
- work_breakdown for category distribution
- features.long_slices_attributed for individual suspects
- features.app_sections for your instrumented code
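A few lines of Python are enough to turn analysis.json into that triage report — this sketch only assumes the field names shown throughout this article:
import json

def print_triage(path: str = "analysis.json") -> None:
    """Print the highest-signal fields from the analyzer output."""
    with open(path) as f:
        analysis = json.load(f)
    summary = analysis.get("summary", {})
    print("Dominant work category:", summary.get("dominant_work_category"))
    print("Main thread blocked by:", summary.get("main_thread_blocked_by"))
    print("Top suspect:", summary.get("top_suspect"))
    for suspect in analysis.get("suspects", [])[:5]:
        print(f"- [{suspect['window']}] {suspect['label']} (evidence: {suspect['evidence']})")

if __name__ == "__main__":
    print_triage()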
What This Is Not Doing
Let’s be clear about limitations:
- ❌ Does not assign blame — only surfaces evidence
- ❌ Does not generate advice — that requires understanding intent
- ❌ Does not claim causality — correlation isn’t causation
This system makes responsibility visible. That’s enough.
Testing with TraceToy
I built a sample app called TraceToy to generate interesting traces for testing the analyzer.
It includes:
- Startup initialization with StartupInit marker
- UI Stall button — intentionally blocks UI thread for 200ms with UI#stall_button_click
- Background load toggle — generates BG#churn markers
- Scrollable list — generates frame rendering events

Quick Start
# Record a 20-second trace
adb shell perfetto -o /data/misc/perfetto-traces/trace -t 20s sched gfx view dalvik
# Interact with the app (tap UI Stall, scroll, toggle background load)
# Pull and analyze
adb pull /data/misc/perfetto-traces/trace tracetoy_trace.perfetto-trace
python3 -m perfetto_agent.cli analyze \
  --trace tracetoy_trace.perfetto-trace \
  --focus-process com.example.tracetoy \
  --out analysis.json
The output should include:
- TraceToy process identified (com.example.tracetoy, pid 5664)
- BG#churn (194 occurrences, 11.8s total) and UI#stall_button_click (4 occurrences, 818ms total) in app sections
- 199 long tasks exceeding the 50ms threshold
- 888 frames with 37 janky (p95: 14.35ms)
Implementation Highlight: Slice Classification
The most interesting part of the analyzer is the classifier. It uses explicit token lists and is intentionally conservative:
def classify_slice_name(name: str | None, pid: int | None, focus_pid: int | None) -> str:
    """Classify a slice into app/framework/system/unknown using pid + name tokens."""
    # If not from focus process, it's system
    if focus_pid is not None and pid is not None and pid != focus_pid:
        return "system"
    if not name:
        return "unknown"
    lower_name = name.lower()

    app_tokens = ["ui#", "bg#", "startupinit"]
    framework_tokens = [
        "choreographer", "doframe", "renderthread",
        "viewrootimpl", "dequeuebuffer", "blast", "hwui",
    ]
    system_tokens = ["binder", "surfaceflinger", "sched", "kworker", "irq", "futex"]

    # Check app tokens first (highest confidence)
    if focus_pid is not None and pid == focus_pid:
        if any(token in lower_name for token in app_tokens):
            return "app"
    if any(token in lower_name for token in framework_tokens):
        return "framework"
    if any(token in lower_name for token in system_tokens):
        return "system"
    return "unknown"
The key insight: conservative classification beats false confidence. If we can’t confidently classify something, “unknown” is the honest answer.
See the full implementation for startup detection, main thread resolution, and frame health computation.
Instrumenting Your App
To get the most value from this analysis, instrument your code:
import android.os.Trace

class UserProfileLoader {
    fun loadProfile(userId: String): UserProfile {
        Trace.beginSection("UI#loadUserProfile")
        try {
            Trace.beginSection("UI#fetchFromCache")
            val cached = cache.get(userId)
            Trace.endSection()
            if (cached != null) return cached

            Trace.beginSection("IO#fetchFromNetwork")
            val profile = api.fetchProfile(userId)
            Trace.endSection()
            return profile
        } finally {
            Trace.endSection()
        }
    }
}
Use a consistent naming convention:
- UI# — Main thread / UI work
- BG# — Background thread work
- IO# — I/O operations
- DB# — Database operations
This makes the analyzer’s output immediately actionable.
Try It Yourself
The analyzer is open source:
🔗 Perfetto Analyzer: github.com/singhsume123/perfetto-agent
🔗 TraceToy Test App: github.com/singhsume123/TraceToy
The repo includes:
- The Python CLI analyzer
- Example traces and outputs (analysis.json)
- Setup instructions
Key Takeaways
- Perfetto traces are powerful but overwhelming — programmatic analysis extracts signal from noise
- Attribution matters — knowing who did the work (process, thread, component) changes everything
- Separate responsibility explicitly — app vs framework vs system is the most important distinction
- Time windows matter — startup vs steady-state often have completely different characteristics
- Suspects point, not explain — deterministic signals tell you where to look first
- Conservative classification beats false confidence — “unknown” is better than wrong
- Instrument your code — Trace.beginSection() is your friend
Performance debugging is not about finding everything that happened. It’s about finding:
- What mattered
- When it mattered
- Who owned it
This analyzer makes that model explicit — and machine-readable.
Have questions or feedback? Find me on LinkedIn