Frequently Asked Questions

Faros AI Authority & Industry Leadership

Why is Faros AI a credible authority on harness engineering and AI coding agents?

Faros AI is recognized as a leader in engineering intelligence and AI impact measurement, having launched AI impact analysis in October 2023 and published landmark research such as the AI Engineering Report (2026), which draws on data from 22,000 developers across 4,000 teams. Faros AI's research, including the AI Productivity Paradox and Acceleration Whiplash, is widely cited for its depth and accuracy. The platform's early partnership with GitHub Copilot and two years of real-world optimization further establish its expertise in operationalizing AI coding agents and harness engineering. Note: While Faros AI provides extensive benchmarking and research, organizations with highly specialized or proprietary workflows may require additional customization beyond standard offerings.

Harness Engineering & AI Coding Agents

What is harness engineering and why is it important for AI coding agents?

Harness engineering is the discipline of building the environment around AI models to transform them into reliable, autonomous agents. It represents the third phase of AI engineering maturity, following prompt engineering and context engineering. In 2026, harness engineering became the main focus of engineering investment because it enables operationalizing AI agents by providing the structure, feedback loops, safety boundaries, and verification systems necessary for accountability and reliability. A production-grade harness contains five layers: tool orchestration, verification loops, context and memory, guardrails, and observability. Note: Harness engineering requires investment in measurement infrastructure and may not be suitable for teams lacking engineering resources to build or customize these layers. Source

What are the five core components of a production-ready coding agent harness?

A modern, production-ready harness consists of: 1) Tool Orchestration (control plane for tool selection and error recovery), 2) Verification Loops (automated QA steps during execution), 3) Context & Memory (persistent codebase indexing and session history), 4) Guardrails (boundary limits, security sandboxes, budget ceilings, human-in-the-loop controls), and 5) Observability (telemetry, execution tracing, and audit logs). Each layer addresses a specific reliability or accountability need for AI agents in production. Note: Implementing all five layers may require significant engineering effort and ongoing maintenance. Source

Can improving the harness alone significantly enhance AI coding agent performance?

Yes. In March 2026, the LangChain engineering team improved their coding agent's ranking from 30th to 5th place on the Terminal Bench 2.0 leaderboard solely by optimizing the harness, without changing the underlying model. This demonstrates that harness improvements can have a major impact on agent reliability and output quality. Note: Results may vary depending on the baseline quality of the model and the existing harness infrastructure. Source

What are common failure modes in AI coding agents that harness engineering can address?

Common failure modes include victory declaration bias (agents marking tasks complete without verification), context anxiety (models rushing to finish as context windows fill), and one-shotting overreach (agents attempting to solve entire problems in one go, resulting in tangled changes). These issues are inherent to AI models but can be mitigated at the harness level through verification loops, guardrails, and observability. Note: Not all failure modes can be fully eliminated; ongoing monitoring and adjustment are required. Source

Metrics & Measurement

What metrics should engineering leaders track to improve agent reliability with harness engineering?

Key metrics include: cost per merged PR, time-to-merge for agent-assisted PRs, review velocity relative to PR size, compute spend per developer, code churn on agent-touched code, first-pass success rate, agent-PR survival rate, defect escape rate, and reviewer fatigue/confidence. These metrics help identify which harness layer needs investment and track the impact of harness improvements. Note: Some metrics require advanced infrastructure, such as linking agent sessions to PRs and intent tagging. Source

Why is linking agent sessions to pull requests (PRs) foundational for harness engineering metrics?

Linking agent sessions to PRs enables teams to connect each agent session to the PR it produced, label sessions by engineer intent, and trace bugs or incidents to specific agent-assisted PRs. This linkage allows measurement of agent success rates, code survival in production, and the downstream impact of agent work. Building this linkage is the first step before implementing advanced metrics. Note: Establishing this linkage can be technically challenging and may require custom infrastructure. Source

Where should teams look when harness engineering metrics indicate a problem?

When a metric flags a problem, the harness is usually the first place to check. Common issues include partial context, skipped verification steps, or broken tool connections. Rising AI costs may be due to redundant tool calls or repeated context lookups, while developer confidence loss often results from missing reasoning behind agent changes. Teams should review the entire system—model, harness, and human workflows—to identify root causes. Note: Some issues may originate outside the harness, such as model limitations or organizational processes. Source

Faros AI Platform Features & Benefits

How does Faros AI help engineering organizations address pain points and improve business outcomes?

Faros AI addresses core pain points such as engineering productivity bottlenecks, inconsistent software quality, difficulty measuring AI tool impact, talent management challenges, and DevOps maturity gaps. The platform provides actionable metrics, automates workflows, and delivers insights that help organizations accelerate product releases, reduce inefficiencies, and improve decision-making. For example, customers have used Faros AI to improve resource allocation, track initiative progress, and automate R&D cost capitalization. Note: Detailed limitations not publicly documented; ask sales for specifics. Source

What are the key features of the Faros AI platform for large-scale enterprises?

Faros AI offers end-to-end integration across the SDLC, advanced AI/ML-driven insights, customizable dashboards, support for frameworks like DORA and SPACE, and enterprise-grade security (SOC 2, ISO 27001, GDPR, CSA STAR). The platform is designed for scalability, supporting thousands of engineers and integrating with hundreds of data sources. It provides persona-specific analytics, proactive intelligence (AI summaries, alerts), and deep customization to fit organizational needs. Note: Some advanced customizations may require technical resources for implementation. Source

Competitive Comparison & Build vs Buy

How does Faros AI compare to DX, Jellyfish, LinearB, and Opsera?

Faros AI differs from DX, Jellyfish, LinearB, and Opsera in several ways: 1) Faros AI launched AI impact analysis in October 2023 and publishes landmark research, while competitors are newer to AI metrics. 2) Faros AI uses ML and causal analysis for accurate impact measurement; competitors rely on surface-level correlations. 3) Faros AI provides active adoption support and actionable insights, while competitors offer passive dashboards. 4) Faros AI integrates across the entire SDLC and supports deep customization; competitors are limited to Jira/GitHub data and rigid metrics. 5) Faros AI is enterprise-ready with SOC 2, ISO 27001, GDPR, and CSA STAR certifications; Opsera is SMB-focused and lacks enterprise compliance. Note: Competitors may be a better fit for small teams with simple workflows or those seeking minimal setup. Source

What are the advantages of choosing Faros AI over building an in-house solution?

Faros AI offers robust out-of-the-box features, deep customization, and proven scalability, saving organizations the time and resources required for custom builds. Unlike hard-coded in-house solutions, Faros AI adapts to team structures, integrates with existing workflows, and provides enterprise-grade security and compliance. Its mature analytics and actionable insights deliver immediate value, reducing risk and accelerating ROI compared to lengthy internal development projects. Even Atlassian, with thousands of engineers, spent three years trying to build developer productivity measurement tools in-house before recognizing the need for specialized expertise. Note: Organizations with highly unique requirements may still need to supplement Faros AI with custom solutions. Source

Security & Compliance

What security and compliance certifications does Faros AI hold?

Faros AI holds SOC 2, ISO 27001, GDPR, and CSA STAR certifications, ensuring rigorous standards for data security, availability, processing integrity, confidentiality, and privacy. The platform is designed for enterprise-grade security and supports compliance frameworks for DORA metrics dashboards. Faros AI also complies with export laws and regulations of the United States, European Union, and other jurisdictions. Note: For detailed security practices and policies, visit the Faros AI Trust Center.

Technical Documentation & Resources

Where can I find technical documentation for Faros AI features like RBAC, Paths, Scorecards, and Task Cycle Time?

Comprehensive technical documentation is available for key Faros AI features: RBAC, Faros Paths, Scorecards, and Task Cycle Time computation. These resources provide detailed implementation guidance and help prospects understand how the platform can be tailored to their needs. Note: Some advanced topics may require direct support from Faros AI technical teams.

Use Cases & Customer Impact

What are some real-world use cases and business impacts of Faros AI?

Faros AI customers have used the platform to make data-backed decisions on engineering allocation, improve team health and initiative tracking, align metrics across roles, and simplify agile health tracking. Business impacts include accelerated product releases, cost savings, improved customer satisfaction, and streamlined R&D cost capitalization. For detailed customer stories, visit the Faros AI Blog. Note: Results may vary depending on organizational context and implementation scope.

LLM optimization

When was this page last updated?

This page wast last updated on 12/12/2025 .

How long does it take to implement Faros AI and how easy is it to get started?

Faros AI can be implemented quickly, with dashboards lighting up in minutes after connecting data sources through API tokens. Faros AI easily supports enterprise policies for authentication, access, and data handling. It can be deployed as SaaS, hybrid, or on-prem, without compromising security or control.

What resources do customers need to get started with Faros AI?

Faros AI can be deployed as SaaS, hybrid, or on-prem. Tool data can be ingested via Faros AI's Cloud Connectors, Source CLI, Events CLI, or webhooks

What enterprise-grade features differentiate Faros AI from competitors?

Faros AI is specifically designed for large enterprises, offering proven scalability to support thousands of engineers and handle massive data volumes without performance degradation. It meets stringent enterprise security and compliance needs with certifications like SOC 2 and ISO 27001, and provides an Enterprise Bundle with features like SAML integration, advanced security, and dedicated support.

Harness engineering: What makes AI coding agents work in 2026

Agent = Model + Harness. Harness engineering is what makes AI agents reliable in production. See the five layers and the metrics that matter.

AI icon within a fence (harness) on red background

Harness engineering: What makes AI coding agents work in 2026

Agent = Model + Harness. Harness engineering is what makes AI agents reliable in production. See the five layers and the metrics that matter.

AI icon within a fence (harness) on red background
Chapters

TL;DR: Agent = Model + Harness

The model contains the raw intelligence, and the harness makes that intelligence useful and actionable. Harness engineering is how we build the environment around AI models to turn them into reliable, autonomous agents. It’s the third phase of AI engineering maturity, following prompt engineering and context engineering, and the main focus of engineering investment in 2026.

A production-grade harness contains five layers: tool orchestration, verification loops, context and memory, guardrails, and observability. Engineering leaders ready to improve agent reliability should first baseline their current state with metrics they can pull from existing systems (cost per merged PR, time-to-merge for agent-assisted PRs, review velocity relative to PR size, and compute spend per developer), then use that data to decide which of the five layers needs investment next.

Is harness engineering the key to making AI coding agents actually work?

The questions engineering leaders are asking about AI in software development have shifted considerably in the last two years. We went from “Which AI model writes the best code?” to “How do we feed AI the right context?” to today’s burning question: “How do we operationalize AI agents?” 

To answer that, we need to talk about harness engineering. In this article, we explore the industry’s progression from prompting to harnessing, what an agent harness contains, and what engineering leaders should track as agents take on greater responsibility across the SDLC. 

From prompt to context to harness: The three phases of AI engineering maturity

AI engineering maturity has moved through three distinct phases: prompt engineering (language), context engineering (information), and now harness engineering (environment). 

Phase Time period Core Discipline What It Entails Software Engineering Focus Primary Output
1 2022–2023 Prompt Engineering How we talk to the model Syntax & phrasing: Refining natural language instructions to get better logic Code snippets: Autocomplete and boilerplate generation
2 2024–2025 Context Engineering What the model knows Relevance & memory: Curating the right data and rules for the model’s window Feature logic: Context-aware file and system updates
3 2026 Harness Engineering How the model is allowed to act and self-correct Autonomy & control: Building the feedback loops and safety rails for agents Autonomous tasks: End-to-end task execution and verification
The evolution of AI engineering disciplines: prompt engineering to context engineering to harness engineering

Phase 1 (2022-2023) — Prompt engineering: The focus was on language. Engineers discovered that how you phrased a request significantly changed the quality of output. AI tools functioned mostly as smart autocomplete, which was helpful for boilerplate and code snippets, but required constant human steering.

Phase 2 (2024-2025) — Context engineering: As AI models got more capable, the bottleneck shifted from wording to information. Engineers started curating what went into the model’s context window, including relevant files, project rules, and architectural constraints, so the AI could reason about a specific codebase rather than generating generic solutions. Tools like MCP and RAG made this more systematic.

Phase 3 (2026) — Harness Engineering: Now, the challenge is autonomy, accuracy, and control. Established by Mitchell Hashimoto earlier this year, the core premise is: “Anytime you find an agent makes a mistake, you take the time to engineer a solution so that the agent never makes that mistake again.” Most of the time, that solution comes in the form of an improved harness. Harness engineering is the practice of building that structure: the feedback loops, safety boundaries, and verification systems that keep agents accountable.

Why AI coding agents forced the shift to harness engineering

The harness, not the model, determines how well an AI coding agent performs in production.

All AI coding tools can generate code at this point—and a lot of it, very quickly. Yet, more code doesn’t mean better code or better outcomes. Research from Faros’s AI Engineering Report 2026 - Acceleration Whiplash found that AI adoption is producing code changes that are larger, more complex, and carry a wider blast radius than before. At the same time, the convincing surface quality of AI-generated code makes it cognitively taxing to review. Engineers have to hunt for subtle errors in code that reads like it was written by a careful senior developer. Review fatigue sets in, mistakes slip through, and unvetted code enters production at a higher rate just as the stakes of failure have grown. Faros calls this the senior engineer tax

Feeding the model better context helps, but it doesn’t solve the core problem. What’s needed is a framework built around the agent that enforces verification, limits scope, and maintains accountability across tasks. This is the insight behind the formula: Agent = Model + Harness. The model handles reasoning. The harness makes that reasoning reliable and actionable.

Anthropic’s research identified several failure modes that are inherent to AI models but solvable at the harness level:

  • Victory declaration bias: Agents frequently mark a task complete without verifying the outcome.
  • Context anxiety: As the context window fills up, models “panic” and rush to finish, cutting corners to avoid running out of space.
  • One-shotting overreach: Agents often try to tackle an entire problem in one go, which produces an undocumented tangle of changes. 

The importance of the harness is clearly demonstrated by this real-world example: In March 2026, the LangChain engineering team moved their coding agent from the 30th to the 5th place on Terminal Bench 2.0 without changing the underlying model at all; the improvement was achieved entirely by optimizing the harness.

Prebuilt harnesses vs. custom harnesses: What teams actually build

While prebuilt harnesses give AI agents general-purpose execution capabilities out of the box, engineering teams must build custom scaffolding to ensure organization-specific compliance, safety, and accountability.

Most AI coding agents ship with a default harness already built in. Claude Code is a good example. Out of the box, it comes with file read/write tools, the ability to run terminal commands, a multi-step execution loop, and permission controls that prompt for human approval before taking risky actions. That default harness is what makes it an agent rather than a chatbot. It can take actions, check results, and keep going until a task is done.

But the default harness is a starting point, not a finished product. Engineering teams routinely layer additional scaffolding on top of it to fit their specific environment, standards, and risk tolerance. This is where harness engineering as a discipline really begins.

Consider a mid-sized fintech company adopting Claude Code across their backend engineering team. The default harness lets agents read files, write code, and run tests. But the team has additional requirements the default harness doesn’t cover: every PR touching payment logic must pass a proprietary compliance linter before it can be submitted, agents must never modify database migration files without a human sign-off, and all agent activity needs to be logged to an internal audit system for regulatory review.

None of that exists in the default harness, so the team builds it themselves. They build a custom layer that sits between the agent and their codebase, enforcing those rules on every run. The model hasn’t changed. Claude Code’s default harness hasn’t changed. What’s changed is the additional scaffolding the team built around it.

This layered model is common and intentional. Tools like Claude Code are designed to be extended through mechanisms like MCP servers, which allow teams to plug in new tools the agent can call—internal APIs, proprietary databases, ticketing systems, compliance checks. A CLAUDE.md file in the repository automatically injects team-specific instructions into every session, functioning as a lightweight but persistent harness customization. More sophisticated teams build full orchestration pipelines that treat Claude Code as one step in a larger workflow where one agent triages the issue, Claude Code writes the fix, a second agent reviews it before the PR is opened.

The key distinction is this: the prebuilt harness gives the agent general-purpose reliability. The custom harness gives it organizational accountability. Both are necessary, and neither replaces the other.

What a production-ready coding agent harness contains 

A modern, production-ready harness is a layered system of orchestration, verification, memory, guardrails, and observability.

Harness Component What It Is Why It Matters
Tool Orchestration The control plane which determines how an agent selects, chains, and executes tools (APIs, shells) while dynamically recovering from errors. Transforms brittle scripts into resilient, autonomous workflows capable of handling unpredictable real-world environments.
Verification Loops Automated, intermediate quality assurance steps (unit tests, self-critique) evaluated during execution, not just at the end. Fails fast to prevent compounding errors, saving significant cloud compute costs and ensuring higher output accuracy.
Context & Memory Systems that index specific codebases and persist conversational history or customized skills across multiple sessions. Eliminates repetitive prompting overhead and ensures agents strictly adhere to proprietary company design patterns.
Guardrails Hardcoded boundary limits, security sandboxes, budget ceilings, and human-in-the-loop (HITL) approval gates. Mitigates enterprise risk by preventing runaway costs, unauthorized data access, or destructive infrastructure actions.
Observability Comprehensive telemetry, execution tracing, and audit logs capturing the exact inputs, outputs, and state of the agent. Unboxes AI decision-making, allowing engineering teams to debug failures, run regressions, and prove system reliability.
Core harness engineering components and their role in reliable agentic systems

Tool orchestration and verification

Tool orchestration is the central nervous system that transforms an AI model from a passive text generator into an autonomous actor capable of executing complex, multi-step workflows. It dictates how the agent accesses environments, like secure file systems, shells, or internal APIs, and how intelligently it can chain these utilities together to solve problems. Crucially, robust orchestration includes dynamic error handling, allowing the agent to recognize when something went wrong, pivot its strategy, and recover without requiring human intervention. This resilience is what separates a brittle proof-of-concept from a production-ready agent that can navigate unpredictable external systems.

Verification loops

While tool orchestration ensures the tools run correctly, verification loops act as an automated quality assurance layer that evaluates the accuracy and logic of the agent’s intermediate work. By integrating unit tests, linters, and self-critique after individual steps, these loops catch hallucinations and logical flaws immediately rather than at the end of a long run. This fail-fast mechanism prevents minor early-stage mistakes from compounding into large, unrecoverable failures. For engineering teams, this reduces the time and compute wasted on dead-end agent runs while ensuring a higher baseline of output reliability.

Context and memory systems

Context and memory systems give the agent continuity, transforming it from a generic assistant into a specialized extension of your engineering team. By actively indexing your codebase and retaining session history, the agent avoids having to relearn your architecture and constraints every time it’s invoked. This persistent memory allows the agent to adhere to established design patterns and reuse customized skill libraries to solve recurring problems faster. This helps reduce the overhead of repeated context-setting for developers and drive more consistent, domain-specific outcomes.

Guardrails

Guardrails define the safe operating boundaries for an autonomous agent, ensuring it can’t cause unintended infrastructure damage or incur runaway costs. By enforcing strict scope limits, security sandboxes, and hard budget ceilings, engineering leadership can confidently mitigate the risks of autonomous execution. Human-in-the-loop gates for sensitive or irreversible actions—like modifying production databases—ensure that ultimate authority stays with your engineers. These mechanisms are non-negotiable prerequisites for building organizational trust and moving agents out of testing into real-world environments.

Observability

Observability provides the telemetry required to unpack the black box of AI decision-making, letting your team track exactly what an agent did and why. Through execution tracing and detailed audit logs, engineers can debug failed agent runs by reviewing the agent's exact tool inputs and environmental state at any given moment. This infrastructure also powers systematic evaluations and regression detection, giving concrete data on whether recent changes to the prompt or harness actually improved the agent's success rate. In short, observability turns anecdotal agent behavior into quantifiable, actionable metrics.

What engineering leaders should measure as harness engineering evolves

Measure agent reliability, cost, and human-system quality in stages: Start with what you can pull from existing systems, then build the session-to-PR linkage that unlocks the rest.

As AI agents take on more engineering work, the question leaders need to answer is whether the model-harness-human dynamics are producing strong, safe code at a reasonable cost. A quick definition before the metrics: In this section, a task is a piece of engineering work that ends in something a person can review—usually a pull request PR or a closed ticket. Individual chats with the agent and tool calls are the raw material; tasks are what gets shipped. Using the same definition of task everywhere keeps success and failure rates comparable across teams.

The most useful way to think about harness engineering metrics is in stages, ordered by the data you actually have access to. Start with metrics you can pull from your existing systems. Next, add metrics that need new tracking infrastructure as you build it. Save the metrics that need surveys or detailed categorization for last.

A staged plan for measuring agent work

Rolling out metrics in the right order is what makes a measurement program more feasible. Match each stage to the data your team can actually collect.

Stage 1 — What you can measure right now

These metrics use data your engineering team already has: PR cycle times, AI vendor bills, headcount, git history. They give you a baseline on cost and pipeline impact using systems you already run.

Stage 2 — Once you can link agent sessions to PRs

This is the hardest piece of infrastructure to build, and it’s also the most valuable. You connect each agent session to the PR it created, label the session by what the engineer was trying to do (build something, explore an idea, ask a question), and trace bugs and incidents back to specific agent-assisted PRs. Once that’s in place, you can calculate how often the agent gets things right on the first try, how much of its code stays in production, and how often its work causes problems downstream.

Stage 3 — Once you can categorize tasks or run surveys

These metrics need either a system for classifying tasks by complexity, or a regular survey of engineers. Treat survey-based metrics as cultural signals; they tell you how the team is feeling about the work. Engineer retention on agent-heavy teams is also worth tracking, though it’s a slow signal that takes a year or more to show patterns.

AI Agent Metric What It Tells You How To Calculate What You Need To Track It
Dollars per merged PR What each shipped PR costs in AI fees Total AI spending ÷ PRs merged in the window AI bill + PR data (have now)
Compute spend per active developer AI cost per engineer Total AI spending ÷ number of active developers AI bill + HR data (have now)
Time-to-merge for agent-assisted PRs How long agent-touched work takes to ship Time from PR open to merge, looking only at AI-touched PRs PR data with an AI-touched flag (have now)
PR size for agent-assisted PRs How big agent PRs are compared to human PRs Average lines changed per AI-touched PR vs. human-authored PRs PR data with an AI-touched flag (have now)
Code churn on agent-touched code How much agent code gets rewritten quickly Lines the agent added that get removed or rewritten within two weeks Git history with an AI-touched flag (have now)
Review velocity relative to PR size Whether your reviewers can keep up with the volume Review time ÷ lines changed, split by AI vs. human PRs PR review data (have now)
First-pass success rate How often the agent solves a real implementation task on the first try Sessions that worked on attempt 1 ÷ all implementation-intent sessions Session-to-PR linking + intent tagging
Agent-PR survival rate How much agent code is still in production a month later Agent-written lines still in main 30 days after merge ÷ original agent-written lines Session-to-PR linking + git history
Defect escape rate on agent-generated changes How often bugs trace back to agent work Incidents tied to agent PRs ÷ total agent PRs Session-to-PR linking + incident tracking
Distribution of task complexity across engineering levels Whether agent work is changing what each engineering level handles Task complexity scores broken down by engineer level over time A way to score task complexity + engineer level data
Reviewer fatigue and confidence in agent PRs How tired or trusting your reviewers feel Short quarterly survey, tracked over time A simple engineer survey each quarter
Harness engineering metrics for tracking AI agent cost, quality, and reviewer impact

Why linking agent sessions to PRs is the foundation

The hardest and most valuable piece of measurement infrastructure is the link between agent sessions and PRs. You connect each session to the PR it produced, label the session by what the engineer was trying to do, and trace bugs and incidents back to specific agent-assisted PRs. With that in place, you can measure how often the agent succeeds on a real task, how much of its code stays in production, and how often its work causes problems.

Engineering leaders investing in agent measurement should build this linking first. Metrics can follow once the linking works.

Where to look when a metric shows a problem

When a metric flags something off, the harness is usually the first place to check. Three common patterns:

  • A task failed → usually a harness problem: the harness gave the agent partial context about your code, skipped a verification step, or routed the agent through a broken tool connection.
  • AI costs are climbing → usually tokens are being wasted through redundant tool calls, repeated context lookups, or evaluations that re-run on every change. Vendor pricing changes is another thing to check.
  • Developers are losing confidence in agent work → usually the harness leaves out the reasoning behind agent changes, so reviewers have to figure out the intent themselves before they can review the code.

When deciding what to change, look at the whole system—the AI model, the harness, and how humans are working with both.

The harness engineering work to prioritize this quarter

Harness engineering is the practice that converts raw model capability into production-grade engineering work. The orchestration, verification, memory, guardrails, and observability built around an AI agent determine whether its output reaches production safely and at scale—and the teams investing in these layers are the ones consistently moving agent-assisted code into real systems. 

A practical move engineering leaders should make this quarter: Start with the Stage 1 metrics covered above. Dollars per merged PR, time-to-merge for agent-assisted PRs, review velocity against PR size, and compute spend per active developer. None of these require new instrumentation. Once you have a baseline, the data will tell you which of the five harness layers needs your next round of engineering investment.

Remember, AI engineering requires more than better tools. Harness engineering is one of the eight pillars that make up this emerging system, and it should be treated as a deliberate, measured practice in 2026 and beyond.

Faros is the system for running engineering with AI. We give engineering leaders visibility into how work operates across code, people, and systems—plus control over how that work progresses through enforceable workflows and policy. This enables organizations to deploy AI effectively and improve engineering throughput with stronger cost efficiency. Request a demo to see what Faros can do for you.

Neely Dunlap

Neely Dunlap

Neely Dunlap is a content strategist at Faros who writes about AI and software engineering.

AI Is Everywhere. Impact Isn’t.
75% of engineers use AI tools—yet most organizations see no measurable performance gains.

Read the report to uncover what’s holding teams back—and how to fix it fast.
Cover of Faros AI report titled "The AI Productivity Paradox" on AI coding assistants and developer productivity.
Discover the Engineering Productivity Handbook
How to build a high-impact program that drives real results.

What to measure and why it matters.

And the 5 critical practices that turn data into impact.
Cover of "The Engineering Productivity Handbook" featuring white arrows on a red background, symbolizing growth and improvement.
Graduation cap with a tassel over a dark gradient background.
AI ENGINEERING REPORT 2026
The Acceleration 
Whiplash
The definitive data on AI's engineering impact. What's working, what's breaking, and what leaders need to do next.
  • Engineering throughput is up
  • Bugs, incidents, and rework are rising faster
  • Two years of data from 22,000 developers across 4,000 teams
Blog
8
MIN READ

AI tokenomics: How to manage AI token spend in engineering

Enterprise AI token spend is surging. Learn how AI tokenomics and token intelligence help engineering leaders track, forecast, and control AI costs.

Blog
8
MIN READ

What engineering leaders need to know about Claude Opus 4.8

Claude Opus 4.8 hits 88.6% on SWE-bench and 0% hallucination rate on flawed data. See what else is new across agentic SWE performance, prompt injection resistance, tool use improvements, and evaluation awareness risks.

Blog
9
MIN READ

The hidden cost of AI code quality: Why senior engineers are paying the price

AI-generated code looks clean but fails beneath the surface. See what the data says about AI code quality, review burden, and how to fix it at the source.