O15: Metrics for AI Coding Agents
How to choose the North Star Metric for a coding agent
Coding agents produce a lot of measurable signals: lines written, tests passed, PRs opened, tokens consumed. Most teams reach for whichever one is easiest to instrument.
The harder question is whether the metric actually tracks the thing developers care about: a task handed off and finished well. A coding agent can score well on activity metrics while quietly failing on that dimension, shipping code that gets rewritten in review or completing tasks that create production problems downstream.
This post works through the full North Star Metric (NSM) selection process for a coding agent: stakeholders, core value, candidate metrics, and a VITAL framework evaluation. It ends with a primary recommendation, the case against the runners-up, and the supporting and counter metrics that keep the primary honest.
Metrics for a Coding Agent
Identify Key Stakeholders
Primary: Individual developers using the agent to write, debug, refactor, and review code, seeking both speed and confidence in the output
Secondary: Engineering managers and team leads absorbing downstream effects: code review load, production incident rates, knowledge distribution
Tertiary: The product organization — using the agent as a competitive differentiator, engineer retention tool, and shipping force multiplier
Articulate Core Value
Primary: Completing real software tasks autonomously — not suggesting lines, but closing tickets
Supporting: Developers delegate well-scoped work and redirect cognitive energy toward architecture, product judgment, and complex debugging
Business: Compressing time from intent to working, tested, merged code
Define Core Actions
The agent’s core loop:
Understand a task from natural language or a linked issue
Write or modify code across multiple files
Run tests and interpret results
Fix failures
Open a pull request ready for human review
Each step is measurable — and a potential failure point.
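If each of those steps emits a structured event, "which stage fails most often" becomes a query instead of an anecdote. Here is a minimal sketch in Python, assuming an in-memory event log; the LoopStage names, LoopEvent shape, and record helper are illustrative assumptions, not any agent's actual telemetry API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class LoopStage(Enum):
    # One member per step of the core loop described above
    TASK_UNDERSTOOD = "task_understood"
    CODE_MODIFIED = "code_modified"
    TESTS_RUN = "tests_run"
    FAILURE_FIXED = "failure_fixed"
    PR_OPENED = "pr_opened"


@dataclass
class LoopEvent:
    task_id: str
    stage: LoopStage
    success: bool
    timestamp: datetime


# Append-only log; in production this would land in a warehouse table.
EVENT_LOG: list[LoopEvent] = []


def record(task_id: str, stage: LoopStage, success: bool = True) -> None:
    """Log one loop step; where a task's trail stops is its failure point."""
    EVENT_LOG.append(LoopEvent(task_id, stage, success, datetime.now(timezone.utc)))


# Example trail: this task died at the test-running step.
record("TASK-42", LoopStage.TASK_UNDERSTOOD)
record("TASK-42", LoopStage.CODE_MODIFIED)
record("TASK-42", LoopStage.TESTS_RUN, success=False)
```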
Brainstorm Potential NSMs
Task Completion Rate: % of agent-initiated tasks reaching a merged PR without the developer abandoning mid-task
Weekly Active Developers (WAD): Developers delegating at least one meaningful task per week — sustained adoption vs. novelty usage
Agent-to-Merge Time: Median time from task delegation to merged PR
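To make the candidates concrete, here is a rough sketch of how each could be computed from per-task records. The TaskRecord shape and its field names are assumptions for illustration, and the "meaningful task" filter for WAD is deliberately omitted: deciding what counts as meaningful is a product judgment, not a query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class TaskRecord:
    developer: str
    delegated_at: datetime
    merged_at: datetime | None  # None while open, or if never merged
    abandoned: bool             # developer gave up mid-task


def task_completion_rate(tasks: list[TaskRecord]) -> float:
    """Share of delegated tasks that reached a merged PR without abandonment."""
    if not tasks:
        return 0.0
    completed = sum(1 for t in tasks if t.merged_at is not None and not t.abandoned)
    return completed / len(tasks)


def weekly_active_developers(tasks: list[TaskRecord], week_start: datetime) -> int:
    """Count of developers who delegated at least one task in the given week."""
    week_end = week_start + timedelta(days=7)
    return len({t.developer for t in tasks if week_start <= t.delegated_at < week_end})


def agent_to_merge_time(tasks: list[TaskRecord]) -> timedelta | None:
    """Median delegation-to-merge time, computed over merged tasks only.

    Note the survivorship bias: abandoned tasks never show up here,
    which is one reason this metric cannot stand alone.
    """
    durations = [t.merged_at - t.delegated_at for t in tasks if t.merged_at is not None]
    return median(durations) if durations else None
```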