Framework

The Agentic Operations Roadmap: From Guardrailed Pilots to Autonomous Fleets

Stage 5 is not a destination — it's a journey. A practical roadmap for progressing from your first production agent to a governed fleet of autonomous digital workers.

opsteamAI · Published 27 May 2026 · 12 min read

In The AI Value Realisation Pathway, we defined Stage 5 as "Agentic Operations" — the compounding stage where autonomous agents handle multi-step workflows with memory, tool use, and self-correction. But that definition captures the destination, not the journey.

The reality is more nuanced. An organisation that has just deployed its first production agent with heavy human supervision is in Stage 5. So is an organisation running fifty autonomous agents with predictive governance and minimal human intervention. These are radically different operational states, yet both qualify as "agentic operations."

This article provides the roadmap within Stage 5 — the four phases of agentic maturity, the autonomy spectrum that governs what agents can do, and the governance infrastructure that makes scaling possible.

The Problem: Stage 5 Is Not a Single State

Gartner's research warns that applying uniform governance to all AI agents — regardless of their autonomy level and scope — is a root cause of enterprise AI failure. Organisations encounter two failure modes: over-restriction of simple agents (which slows delivery and drives shadow development) or under-restriction of autonomous agents (which increases operational, security, and compliance risk).

The solution is proportional governance — matching control intensity to agent capability and risk. But this requires understanding where you are in the agentic journey and what infrastructure you need to progress.

The Stage 5 Roadmap

Four Phases of Agentic Operations

The transition pattern: Phase 1→2 is about proving value with multiple workflows. Phase 2→3 is about scaling governance infrastructure. Phase 3→4 is about enabling continuous improvement. Most organisations stall at Phase 2 — they can run agents but cannot govern them at scale.

Each phase represents a distinct operational state with its own governance model, infrastructure requirements, and risk profile. Progression is not automatic — it requires deliberate investment in the systems that make each phase stable.

Phase 1: Guardrailed Agents

The entry point to Stage 5. You have deployed a single agent — or perhaps two — handling a tightly bounded workflow. Every action requires explicit human approval. The focus is proving value while building trust.

What it looks like in practice:

  • One agent handling one workflow (e.g., customer support triage, document classification)
  • Human-in-the-loop for all write actions
  • Manual review of every agent output before execution
  • Approval gates embedded in the workflow

Governance model: Human-in-the-Loop (HITL) at 100%. Every action that could change state requires human sign-off. This is slow and resource-intensive, but it's the appropriate level of oversight for an unproven system.
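
As a rough illustration of a Phase 1 approval gate, the sketch below holds every state-changing action until a reviewer signs off and lets read-only actions through. The names `ProposedAction`, `run_with_approval_gate`, and the `ask_human` callback are hypothetical stand-ins rather than part of any particular agent framework; in practice the callback would likely be backed by a review queue or ticketing step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    """An action the agent wants to take (hypothetical structure)."""
    name: str
    changes_state: bool  # True for writes, False for read-only lookups


def run_with_approval_gate(
    action: ProposedAction,
    execute: Callable[[ProposedAction], str],
    ask_human: Callable[[ProposedAction], bool],
) -> str:
    """Run an action, requiring human sign-off for anything that changes state."""
    if action.changes_state and not ask_human(action):
        return f"rejected: {action.name}"
    return execute(action)


def execute(action: ProposedAction) -> str:
    """Stand-in executor; a real one would call the downstream system."""
    return f"executed: {action.name}"


# Example: the reviewer declines a write, so nothing is executed.
result = run_with_approval_gate(
    ProposedAction("update_ticket_status", changes_state=True),
    execute,
    ask_human=lambda action: False,
)
```

Because the gate sits in front of the executor, adding a second agent means adding a second stream of approval requests, which is the linear scaling problem described next.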

Why organisations get stuck here: They prove value but never build the infrastructure to scale. The agent works, but adding a second agent doubles the governance burden. Without tiered controls, scaling is linear — each new agent requires proportionally more human oversight.

To progress: Build agent identity infrastructure, establish basic audit logging, and begin classifying actions by risk level. The goal is to identify which actions can eventually move to automated execution.

Phase 2: Supervised Agents

You now have multiple agents (3–10) operating across several workflows. The breakthrough: tiered autonomy. Low-risk actions execute automatically. Medium and high-risk actions still require human approval.

What it looks like in practice:

  • Multiple agents across 2–4 distinct workflows
  • Tier 1 actions (low risk, reversible) execute automatically
  • Tier 2+ actions escalate for human approval
  • Centralised monitoring surfaces agent activity and exceptions

Governance model: Tiered Human-in-the-Loop. The Singapore Model AI Governance Framework provides a practical blueprint: low-impact agents require minimal oversight beyond baseline observability; medium-impact agents need enhanced monitoring and approval workflows; high-impact agents require rigorous human checkpoints.

The inflection insight: Most organisations have more Tier 1 work than they realise. By classifying actions systematically, they discover that 60–70% of agent actions can safely execute without individual approval. This is where governance overhead begins to drop.
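
One way to picture tiered routing is a lookup from action type to risk tier, with anything unclassified treated as high risk. This is a minimal sketch under stated assumptions: the action names, the `ACTION_TIERS` table, and the sample log are invented for illustration, and a real classification would come from your own risk review.

```python
from enum import IntEnum
from typing import List


class RiskTier(IntEnum):
    TIER_1 = 1  # low risk, reversible: executes automatically
    TIER_2 = 2  # medium risk: escalates for human approval
    TIER_3 = 3  # high risk: escalates for human approval


# Hypothetical classification table for a support workflow.
ACTION_TIERS = {
    "tag_ticket": RiskTier.TIER_1,
    "send_status_update": RiskTier.TIER_1,
    "issue_refund": RiskTier.TIER_2,
    "close_account": RiskTier.TIER_3,
}


def route(action: str) -> str:
    """Return how an action is handled; unknown actions default to the highest tier."""
    tier = ACTION_TIERS.get(action, RiskTier.TIER_3)
    return "auto_execute" if tier == RiskTier.TIER_1 else "escalate_to_human"


def auto_execute_share(action_log: List[str]) -> float:
    """Fraction of logged actions that would run without individual approval."""
    if not action_log:
        return 0.0
    automatic = sum(1 for action in action_log if route(action) == "auto_execute")
    return automatic / len(action_log)


# Invented sample log, used only to show the calculation.
sample_log = ["tag_ticket"] * 6 + ["send_status_update"] + ["issue_refund"] * 2 + ["close_account"]
share = auto_execute_share(sample_log)
```

Running the same calculation over a real action log is a quick way to check how much of your own workload actually falls into Tier 1.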

To progress: Build a centralised telemetry system, establish risk classification standards, and document escalation paths. The goal is to prepare for fleet-level governance.

Phase 3: Governed Fleets

The scaling phase. You now operate 10–50 agents across the enterprise. Individual approval is no longer viable — you would need a small army of reviewers. Instead, humans define boundaries and review exceptions.

What it looks like in practice:

  • 10–50 agents deployed across multiple teams
  • Human-on-the-Loop (HOTL) supervision replaces HITL for most actions
  • Policy-driven guardrails enforce limits automatically
  • Circuit breakers halt operations when thresholds are violated
  • Humans review exceptions, audit logs, and aggregate outcomes

Governance model: Human-on-the-Loop. The California Management Review's Agentic Operating Model describes this transition: "Early governance approaches emphasised Human-in-the-Loop controls, requiring manual human approval for critical actions. While effective in low-volume settings, HITL becomes a bottleneck as enterprises execute thousands or millions of agentic actions per hour. Consequently, organisations are shifting toward Human-on-the-Loop supervision, where humans define objectives, constraints, and escalation thresholds, while agents operate independently within those boundaries."

The fleet governance infrastructure:

  • Agent identity registry with lifecycle management
  • Policy engine with runtime enforcement
  • Circuit breakers for threshold violations
  • Observability across the entire agent fleet
  • Exception routing and escalation workflows
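
To make the circuit breaker idea concrete, here is a minimal sketch that halts an agent once too many of its recent actions have failed. The class name, the failure threshold, and the window size are assumptions chosen for illustration; production breakers typically also track cost, latency, or policy violations.

```python
from collections import deque


class CircuitBreaker:
    """Halts an agent when failures in a recent window reach a threshold."""

    def __init__(self, max_failures: int = 3, window: int = 10) -> None:
        self.max_failures = max_failures
        self.recent = deque(maxlen=window)  # most recent outcomes only
        self.is_open = False

    def record(self, success: bool) -> None:
        """Record one action outcome and trip the breaker if the threshold is reached."""
        self.recent.append(success)
        if list(self.recent).count(False) >= self.max_failures:
            self.is_open = True  # stays open until a human resets it

    def allow(self) -> bool:
        """Agents check this before each action."""
        return not self.is_open

    def reset(self) -> None:
        """Called by a human after reviewing the exception."""
        self.recent.clear()
        self.is_open = False


breaker = CircuitBreaker(max_failures=2, window=5)
for outcome in (True, False, False):
    breaker.record(outcome)
halted = not breaker.allow()
```

The breaker deliberately does not close itself: resuming operation is left to a human, which matches the Human-on-the-Loop pattern of reviewing exceptions rather than individual actions.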

To progress: Build continuous evaluation pipelines, establish feedback loops for model improvement, and implement automated governance controls. The goal is to enable self-improvement.

Phase 4: Autonomous Operations

The compounding state. Agents continuously improve through feedback loops. Governance is predictive and automated. Humans focus on strategy, edge cases, and system evolution — not operational oversight.

What it looks like in practice:

  • 50+ agents with multi-agent coordination
  • Self-correction based on performance feedback
  • Predictive governance anticipates issues before they occur
  • Automated rollback and recovery mechanisms
  • Strategic human oversight, not operational

Governance model: Automated governance with predictive controls. At this level, governance is embedded in the system itself. Gartner advises that "because accountability for outcomes remains with the organisation, this level requires the most rigorous governance — including continuous monitoring, enforced guardrails, rapid rollback mechanisms, circuit breakers that halt agent operation on threshold violations, and clear ownership for agent behaviour."

Why this is the compounding state: The system learns from itself. Agent outputs are evaluated, edge cases are captured, models are fine-tuned, workflows are optimised. Value compounds because the architecture is designed for continuous improvement.

The Autonomy Spectrum

Progression through the phases is not just about the number of agents — it's about what those agents are allowed to do. The autonomy spectrum defines four levels of agent capability, each with its own control requirements.

What Can Your Agents Do?

Four levels from read-only to fully autonomous

Level 1

Observe

Read-only information retrieval

Agents retrieve and present information but cannot take any action. Humans make all decisions and execute all tasks. Lowest risk, minimal governance required.

Agent Capabilities
  • Search and retrieve documents
  • Aggregate data from multiple sources
  • Generate summaries and reports
  • Surface relevant information
Example Actions
  • Pull customer history before a call
  • Summarise contract terms
  • Find relevant policies
  • Aggregate sales data
Human Role

Humans execute all actions based on agent-provided information

Required Controls
  • Authentication and access control
  • Usage logging
  • Data access governance

The key insight: autonomy level is not fixed per agent. The same agent can operate at different levels depending on the action type, its risk classification, and the organisation's trust infrastructure. A customer support agent might operate at Level 4 (Act Autonomously) for routine inquiries, Level 3 (Act with Approval) for refund requests, and Level 2 (Advise) for escalation recommendations. This is proportional governance in action.
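
The customer support example can be sketched as a per-action policy table. The level names follow the spectrum above; the action-type keys, the `SUPPORT_AGENT_POLICY` table, and the choice to default unknown actions to Level 1 are assumptions made for illustration.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    OBSERVE = 1
    ADVISE = 2
    ACT_WITH_APPROVAL = 3
    ACT_AUTONOMOUSLY = 4


# Hypothetical policy for a single customer support agent.
SUPPORT_AGENT_POLICY = {
    "routine_inquiry": AutonomyLevel.ACT_AUTONOMOUSLY,
    "refund_request": AutonomyLevel.ACT_WITH_APPROVAL,
    "escalation_recommendation": AutonomyLevel.ADVISE,
}


def autonomy_for(action_type: str) -> AutonomyLevel:
    """Look up the level for an action type; unlisted types fall back to read-only."""
    return SUPPORT_AGENT_POLICY.get(action_type, AutonomyLevel.OBSERVE)


def can_execute_without_human(action_type: str) -> bool:
    """Only Level 4 actions run without a human decision or approval."""
    return autonomy_for(action_type) == AutonomyLevel.ACT_AUTONOMOUSLY
```

Keeping the policy keyed by action type rather than by agent means one agent can be trusted with more as individual categories prove themselves, without reclassifying the whole agent.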

The Governance Infrastructure Stack

Scaling agentic operations requires four layers of governance infrastructure. Each layer addresses a distinct question, and failure in any one undermines the stability of the entire system.

Governance Infrastructure

The Four-Layer Governance Stack

Each layer addresses a distinct governance question

The dependency chain: Identity enables Control. Without knowing who an agent is and what it can access, you cannot enforce limits. Control enables Coordination. Without runtime enforcement, multi-agent orchestration produces conflicts and cascading failures. Coordination enables Governance. Without orchestration, accountability is impossible.

Skip a layer and the stack collapses under operational stress. This is why organisations that deploy agents without identity infrastructure — treating agents like throwaway scripts rather than governed workers — inevitably hit scaling limits.
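
The dependency chain can be expressed as a simple ordered check: a layer only counts as stable if every layer beneath it is in place. This is a sketch for reasoning about the stack, not a description of any specific platform; the function name is hypothetical.

```python
from typing import Iterable, Optional

# Bottom to top, as described in the dependency chain.
LAYERS = ("Identity", "Control", "Coordination", "Governance")


def highest_stable_layer(built: Iterable[str]) -> Optional[str]:
    """Return the highest layer whose prerequisites are all built, or None."""
    built_set = set(built)
    stable = None
    for layer in LAYERS:
        if layer not in built_set:
            break  # everything above a missing layer is unsupported
        stable = layer
    return stable


# Coordination was skipped, so Governance does not count as stable.
result = highest_stable_layer({"Identity", "Control", "Governance"})
```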

HITL to HOTL: The Transition That Scales

The shift from Human-in-the-Loop to Human-on-the-Loop is the critical transition that enables fleet-scale operations. But it's also the transition where most organisations stall.

Human-in-the-Loop (HITL):

  • Human reviews and approves each action before execution
  • Appropriate for high-stakes, low-volume decisions
  • Does not scale — governance overhead grows linearly with agent count
  • Creates bottlenecks as agents wait for approval

Human-on-the-Loop (HOTL):

  • Human defines boundaries, monitors outcomes, reviews exceptions
  • Appropriate for moderate-to-high-volume decisions with defined risk profiles
  • Scales, because governance overhead tracks exception volume rather than total agent action volume
  • Enables agents to operate at production speed

How to know when you're ready to transition:

  1. You have systematic risk classification for agent actions
  2. You have observability into agent behaviour across the fleet
  3. You have circuit breakers that can halt operations automatically
  4. You have escalation paths that route exceptions to the right humans
  5. You have audit trails that satisfy compliance requirements

The transition is not binary. You move action types from HITL to HOTL as you build confidence in each category. Start with the lowest-risk, highest-volume actions. Measure outcomes. Expand.
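
A sketch of how "start low-risk and high-volume, measure, expand" might be turned into a repeatable rule is shown below. It ranks action categories for promotion from HITL to HOTL using outcomes recorded while humans were still approving each action. The `CategoryStats` fields and both thresholds (`min_reviewed`, `max_override_rate`) are assumptions for illustration and would need to be set by your own risk owners.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CategoryStats:
    name: str
    risk_tier: int         # 1 = lowest risk
    reviewed_actions: int  # actions a human reviewed under HITL
    overridden: int        # actions the reviewer rejected or corrected

    @property
    def override_rate(self) -> float:
        # With no review history, treat the category as unproven.
        if self.reviewed_actions == 0:
            return 1.0
        return self.overridden / self.reviewed_actions


def promotion_candidates(
    stats: List[CategoryStats],
    min_reviewed: int = 500,
    max_override_rate: float = 0.01,
) -> List[str]:
    """Categories with enough history and few overrides, lowest risk and highest volume first."""
    eligible = [
        s for s in stats
        if s.reviewed_actions >= min_reviewed and s.override_rate <= max_override_rate
    ]
    eligible.sort(key=lambda s: (s.risk_tier, -s.reviewed_actions))
    return [s.name for s in eligible]


# Invented figures, used only to show the ranking.
history = [
    CategoryStats("issue_refund", risk_tier=2, reviewed_actions=800, overridden=4),
    CategoryStats("tag_ticket", risk_tier=1, reviewed_actions=5000, overridden=10),
    CategoryStats("close_account", risk_tier=3, reviewed_actions=50, overridden=0),
    CategoryStats("send_status_update", risk_tier=1, reviewed_actions=1200, overridden=60),
]
candidates = promotion_candidates(history)
```

In this invented data, `close_account` is held back for lack of review history and `send_status_update` for its override rate, even though the latter is nominally low risk.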

Where Are You in Stage 5?

Honest self-assessment is the foundation of effective planning. The diagnostic below maps your current state across the six dimensions that most reliably predict where you are in the agentic operations journey and what investments will enable progression.

Stage 5 Assessment

Where Are You in Agentic Operations?

Question 1 of 6: How many agents do you have in production today?

Your result is a starting point. Most organisations that reach Phase 4 did not arrive there in a single initiative. They progressed incrementally — proving value with each phase, building infrastructure to enable the next, and maintaining the organisational discipline to govern what they deploy.

What It Takes to Move Forward

Each phase transition requires specific investments. The mistake most organisations make is attempting to skip ahead — deploying Phase 3 agent counts without Phase 3 governance infrastructure, or expecting Phase 4 outcomes without Phase 4 feedback loops.

Phase 1 → Phase 2: Risk classification, tiered escalation, centralised telemetry. The goal is to identify which actions can safely execute without individual approval.

Phase 2 → Phase 3: Fleet governance platform, policy engine, circuit breakers, exception routing. The goal is to shift from per-action oversight to boundary-based oversight.

Phase 3 → Phase 4: Continuous evaluation, feedback loops, automated fine-tuning, predictive controls. The goal is to enable the system to improve itself.
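
The three transitions above can be captured as a small gap analysis: given your current phase and the infrastructure already in place, list what is still missing for the next phase. The requirement names are taken from the transitions above; the function and the table name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Investments named for each phase transition.
TRANSITION_REQUIREMENTS: Dict[Tuple[int, int], Set[str]] = {
    (1, 2): {"risk classification", "tiered escalation", "centralised telemetry"},
    (2, 3): {"fleet governance platform", "policy engine", "circuit breakers", "exception routing"},
    (3, 4): {"continuous evaluation", "feedback loops", "automated fine-tuning", "predictive controls"},
}


def gaps_for_next_phase(current_phase: int, in_place: Set[str]) -> List[str]:
    """Return the investments still missing before moving to the next phase."""
    required = TRANSITION_REQUIREMENTS.get((current_phase, current_phase + 1))
    if required is None:
        raise ValueError(f"no transition defined from phase {current_phase}")
    return sorted(required - in_place)


# Example: a Phase 2 organisation that already has a policy engine and circuit breakers.
missing = gaps_for_next_phase(2, {"policy engine", "circuit breakers"})
```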

This is what we do. We embed inside operational workflows, map where agents can operate end-to-end, build the governance infrastructure appropriate to each phase, and then continue to operate it. The staying is the differentiator — because production agentic AI genuinely requires ongoing operational engineering to sustain and compound its value.

Start with one workflow. Book a conversation to map your current state, identify where you are in Stage 5, and see exactly what infrastructure you need to progress.


Sources

  1. Gartner. Uniform Governance Is a Death Sentence for Enterprise AI Agents. Gartner Newsroom, 2026. gartner.com

  2. California Management Review. Governing the Agentic Enterprise: A New Operating Model for Autonomous AI at Scale. CMR, March 2026. cmr.berkeley.edu

  3. Singapore IMDA. Model AI Governance Framework for Generative AI. Infocomm Media Development Authority, January 2026. imda.gov.sg

  4. Cloud Security Alliance. Agentic AI Governance Maturity Model. CSA Lab Space, 2026. cloudsecurityalliance.org

  5. Cloud Security Alliance. Agent Identity Governance Framework. CSA Lab Space, 2026. cloudsecurityalliance.org

  6. Microsoft. Agentic AI Adoption Maturity Model. Microsoft Learn, 2026. learn.microsoft.com

Tags: agentic-ai, governance, autonomy, fleet-management, human-in-the-loop, operating-model