AI agents are moving from prototypes to production systems that call tools, browse the web, query databases, and delegate tasks to other agents. Once deployed, these agents make decisions autonomously, and the consequences of a misbehaving agent can be severe. The Microsoft agent governance toolkit (AGT) addresses this gap by providing deterministic application-layer interception that makes agent misbehavior structurally impossible rather than just unlikely. Unlike prompt-level safety measures that politely ask models to follow rules, AGT intercepts every tool call, message send, and delegation in deterministic code before the model’s intent reaches the wire. This post walks through AGT’s architecture, policy engine, privilege rings, MCP security gateway, and OWASP coverage, with installation instructions and framework integration details.

Key Insight: Research shows that prompt-level safety measures have a 100% attack success rate (OWASP LLM01:2025). AGT’s deterministic interception makes misbehavior structurally impossible rather than just unlikely.

The Problem with Prompt-Level Safety

Current approaches to AI agent safety rely on prompt engineering – instructing the model to follow rules, avoid harmful actions, and stay within bounds. This is fundamentally flawed because it treats safety as a polite request to a stochastic system. OWASP LLM01:2025 states explicitly that “it is unclear if there are fool-proof methods of prevention for prompt injection.” The published research backs this up: Andriushchenko et al. (ICLR 2025) report a 100% attack success rate on GPT-4o, GPT-3.5, Claude 3, and Llama-3 using adaptive attacks evaluated against the JailbreakBench benchmark (Chao et al., NeurIPS 2024).

Microsoft’s own AI Red Teaming Agent formalizes Attack Success Rate (ASR) as the canonical metric for this class of failure. Their report on red teaming 100 generative AI products reinforces the point: “mitigations do not eliminate risk entirely” and red teaming must be a continuous process because model-layer defenses are probabilistic by construction.

AGT does not try to win that fight inside the prompt. Every tool call, message send, and delegation is intercepted in deterministic application code before the model’s intent reaches the wire. Actions the AGT kernel denies are not “unlikely.” They are structurally impossible. That is the difference between asking an agent to behave and making it incapable of misbehaving.

What is the Agent Governance Toolkit?

The Agent Governance Toolkit (AGT) is Microsoft’s open-source framework for policy enforcement, identity management, sandboxing, and site reliability engineering for autonomous AI agents. Currently in Public Preview at version 4.0.0, AGT provides a single pip install that works with any agent framework.

AGT is built around three fundamental questions that every production agent system must answer:

1. Is this action allowed? An agent with access to send_email and query_database should not be able to drop_table. OAuth scopes and IAM roles control which services an agent can reach, not what it does once connected. AGT’s policy engine evaluates every action against configurable YAML policies before execution.

2. Which agent did this? In a multi-agent system, five agents might share a single API key. When something goes wrong, “an agent did it” is not an incident response. AGT’s zero-trust identity mesh assigns cryptographic identities to every agent using Ed25519 keys, SPIFFE certificates, and DID verification.

3. Can you prove what happened? Auditors and regulators need tamper-evident records of every decision: what policy was active, what the agent requested, and why it was allowed or denied. AGT’s Merkle chain audit logs provide cryptographically anchored, append-only decision records.

Architecture Overview

AGT Architecture Overview

The architecture diagram above illustrates the eight major packages that compose the Agent Governance Toolkit and how they interconnect to form a complete governance layer around any AI agent. The diagram shows the flow of control from the agent’s initial action request through the governance stack and out to the actual tool execution, with every intermediate step providing a distinct security function.

Understanding this architecture is essential because it reveals how AGT achieves its core guarantee: no action reaches the wire without passing through every governance layer. The eight packages are not independent modules that can be mixed and matched arbitrarily – they form a coherent stack where each layer depends on the guarantees provided by the layers below it.

At the center sits the Agent OS package, which houses the policy engine and governance gate. This is the core decision-making component that evaluates every action an agent attempts to perform against the loaded policy documents. The policy engine supports three policy formats: YAML for human-readable declarations, OPA (Open Policy Agent) for teams already using Rego-based policies, and Cedar for AWS-compatible authorization policies.

This multi-format support means organizations do not need to rewrite their existing policy infrastructure to adopt AGT. The governance gate is the interception point where all tool calls, message sends, and delegation requests are captured before they reach the wire. It acts as a mandatory checkpoint that no agent can bypass, regardless of the framework it runs on. The governance gate implements a fail-closed design: if the policy engine is unavailable or returns an error, the default action is to deny the request.

The Agent Mesh package handles agent discovery, routing, and the zero-trust identity mesh. It provides Ed25519 key generation for agent identity, SPIFFE certificate issuance for workload identity, DID (Decentralized Identifier) verification for cross-mesh trust, and mutual TLS (mTLS) for secure agent-to-agent communication. When an agent registers with the mesh, it receives a cryptographic identity that follows it through every interaction, enabling precise attribution of actions.

This solves the “which agent did this?” problem by ensuring that every action can be traced back to a specific, cryptographically identified agent rather than a shared API key. The mesh also maintains a trust score for each agent, which is updated based on the agent’s behavior over time and influences which operations the agent is permitted to perform.

The Agent Runtime package implements execution sandboxing through four privilege rings, modeled after operating system protection rings. Ring 0 (kernel) provides full system access for trusted governance operations, Ring 1 (system) allows framework-level operations, Ring 2 (user) is the default for normal agent tool calls, and Ring 3 (untrusted) restricts agents to the narrowest set of capabilities.

This graduated model ensures that agents only access the resources their trust level warrants, preventing a low-trust agent from performing high-privilege operations. The runtime also supports saga orchestration for multi-step agent workflows, ensuring that partial failures are handled gracefully with compensation actions.

The Agent SRE package brings site reliability engineering to agents with SLO monitoring, error budgets, circuit breakers, and chaos testing. When an agent begins failing at an unacceptable rate, the circuit breaker trips and prevents further damage. The error budget system tracks how many failures are acceptable within a time window, and when the budget is exhausted, the agent is automatically degraded or suspended. The chaos testing module allows teams to inject failures in a controlled manner, verifying that circuit breakers work as expected before a real incident occurs.

The Agent Compliance package provides OWASP verification, policy linting, and integrity checks. It includes the PromptDefense Evaluator, which performs 12-vector prompt injection audits, and the agt verify CLI command that generates compliance evidence for auditors. The policy linter catches common policy misconfigurations before they reach production, such as overly permissive default actions or conflicting rules. The integrity check system verifies that installed packages match their expected checksums, detecting supply chain attacks.

The Agent Marketplace package governs plugins through trust scoring, signing verification, and schema drift detection. Before a plugin is loaded, its signature is verified against a trusted certificate chain, its trust score is checked against configurable thresholds, and its schema is compared to the expected interface to detect poisoning or typosquatting attacks. This supply chain protection extends AGT’s governance boundary to third-party extensions.

The Agent Lightning package governs reinforcement learning training with violation penalties. When an agent is trained using RL techniques, Lightning ensures that reward functions do not incentivize policy-violating behavior by applying penalty signals for governance violations during training. This prevents the common failure mode where RL optimization finds reward hacks that violate safety constraints. Lightning integrates with popular RL frameworks and provides a governance-aware reward wrapper.

Finally, the Agent Hypervisor package provides execution audit logging through a Merkle chain, a delta engine for tracking state changes, and commitment anchoring for tamper-evident records. Every governance decision is recorded as a leaf in the Merkle tree, and the root hash is periodically anchored to an external commitment store, making it cryptographically impossible to alter historical decisions without detection. The delta engine tracks state changes between decisions, providing a complete replay capability for incident investigation. The commitment anchoring system supports multiple backends, including local files, cloud storage, and distributed ledgers.

How Policy Evaluation Works

Policy Evaluation Flow

The policy evaluation flow diagram shows the step-by-step process that occurs every time an agent attempts an action. The diagram traces the request from the agent through identity verification, policy evaluation, and audit logging, with three possible outcomes: allow, deny, or require approval.

Understanding this flow is critical because it demonstrates how AGT achieves deterministic governance – every action follows the same path, and no shortcut exists that would allow an action to bypass any stage. The linear pipeline design is intentional: each stage must complete before the next stage begins, and a failure at any stage immediately terminates the evaluation with a deny decision.

The process begins when an agent issues a tool call, message send, or delegation request. This request is intercepted by the governance gate before it reaches the wire, ensuring that no action bypasses policy evaluation. The interception happens at the application middleware layer, which means it operates in deterministic code rather than relying on the model’s compliance with prompt-level instructions.

This is the fundamental architectural decision that separates AGT from prompt-based safety approaches. The governance gate is implemented as a framework adapter that inserts itself into the tool call pipeline, and it works identically across all 15+ supported frameworks.

The intercepted request is first passed to the identity verification stage, where the Agent Mesh confirms the agent’s cryptographic identity. The agent’s Ed25519 key or SPIFFE certificate is validated, and its DID is resolved to confirm that the agent is registered and in good standing within the trust mesh. The identity check also verifies that the agent’s trust score meets the minimum threshold for the requested operation.

If the identity cannot be verified, the request is immediately denied with a GovernanceDenied exception, and the denial is recorded in the audit log with the reason “identity verification failed.” This immediate denial prevents unregistered or compromised agents from proceeding further in the evaluation pipeline, stopping potential attacks at the earliest possible point.

Once identity is confirmed, the request proceeds to the policy engine. The policy engine loads the active policy document, which is defined in YAML format with an apiVersion, name, default_action, and a list of rules. Each rule contains a name, a condition expression, an action (allow, deny, or require_approval), and an optional description.

The engine evaluates rules in priority order, checking each rule’s condition against the action context, which includes the action type, the agent’s identity, the target resource, and any additional parameters. The priority ordering ensures that more specific rules (higher priority) take precedence over general rules, preventing conflicts where a broad allow rule might override a specific deny rule. The condition language supports comparison operators, set membership checks, and logical combinators.

If a rule’s condition matches, the corresponding action is taken. An allow action permits the request to proceed to execution. A deny action blocks the request and raises a GovernanceDenied exception with the rule name and description, providing clear feedback about why the action was blocked.

A require_approval action pauses the request and routes it to the designated approvers (specified in the rule’s approvers field) for human review. The approval workflow supports configurable timeouts – if no approver responds within the timeout window, the request is automatically denied, preventing indefinite pending states.

If no rule matches, the default_action from the policy document is applied, which can be set to either allow or deny depending on the organization’s risk posture. A fail-closed default (deny) is recommended for production deployments, as it ensures that any unanticipated action is blocked rather than allowed.

After the policy decision is made, the result is recorded in the Merkle chain audit log. This record includes the policy version, the matched rule (if any), the decision (allow, deny, or pending approval), the agent’s identity, the action requested, and a timestamp. The record is hashed and appended to the Merkle tree, making it tamper-evident.

Periodically, the Merkle root is anchored to an external commitment store, providing cryptographic proof that the log has not been altered. This audit trail answers the third governance question – “Can you prove what happened?” – with mathematically verifiable evidence rather than trust-based assertions. The Merkle chain structure means that verifying the integrity of the entire log requires only the current root hash, not a full replay of every decision.

The following YAML policy demonstrates the key concepts. It defines a production policy with a default allow action and two rules: one that blocks destructive database operations and another that requires human approval for email sending.

# policy.yaml
apiVersion: governance.toolkit/v1
name: production-policy
default_action: allow
rules:
  - name: block-destructive
    condition: "action.type in ['drop', 'delete', 'truncate']"
    action: deny
    description: "Destructive operations require human approval"

  - name: require-approval-for-send
    condition: "action.type == 'send_email'"
    action: require_approval
    approvers: ["security-team"]

When an agent attempts a blocked action, the result is immediate and deterministic:

>>> safe_tool(action="read", table="users")
{'table': 'users', 'rows': 42}

>>> safe_tool(action="drop", table="users")
GovernanceDenied: Action denied by policy rule 'block-destructive':
  Destructive operations require human approval

Takeaway: With just two lines of code - govern(agent) - any AI agent gains a complete governance layer with policy evaluation, identity verification, and tamper-evident audit logging.

Four Privilege Rings

Four Privilege Rings

The four privilege rings diagram illustrates AGT’s execution sandboxing model, which borrows the protection ring concept from operating system design and applies it to AI agent governance. The diagram shows four concentric rings, with Ring 0 at the center and Ring 3 at the outer edge, each representing a decreasing level of privilege and an increasing level of restriction.

This model provides graduated access levels that constrain what an agent can do based on its trust level and the nature of the operation it is attempting. The concentric layout is intentional: an operation at a given ring can access resources at its own ring and outer rings, but never at an inner ring, mirroring how CPU protection rings prevent user-mode code from accessing kernel-mode memory. This hardware-inspired design provides a proven model for privilege separation that has been battle-tested in operating systems for decades.

Ring 0 (Kernel) is the innermost and most privileged ring. It is reserved for governance operations themselves – policy evaluation, identity verification, audit logging, and the governance gate. Only the AGT kernel operates at Ring 0, ensuring that the governance layer cannot be bypassed or tampered with by the agents it governs.

This separation is critical: if agents could modify the policy engine or audit log, the entire governance model would be undermined. Ring 0 operations include loading policy documents, evaluating rules, recording Merkle chain entries, and managing the trust mesh. The kernel ring is non-negotiable – no agent, regardless of its trust score, ever executes at Ring 0.

This is the foundation of the entire privilege model: the governance infrastructure must exist at a higher privilege level than the agents it governs, just as the operating system kernel must exist at a higher privilege level than user processes. Without this separation, any agent could potentially modify the governance rules to permit its own malicious actions.

Ring 1 (System) provides framework-level access. Operations at this ring include agent lifecycle management (registration, deregistration, heartbeat), inter-agent communication protocols, and framework adapter operations. Agents operating at Ring 1 can coordinate with other agents through the mesh, but they cannot modify governance policies or audit records.

This ring is typically used by agent orchestrators and framework middleware that need to manage agent behavior but should not have unrestricted access to the underlying system. The system ring provides enough privilege to manage agent coordination without exposing the governance infrastructure. Framework adapters that implement the governance gate operate at Ring 1, giving them the ability to intercept and route agent requests without being able to modify the policy evaluation logic itself.

Ring 2 (User) is the default ring for normal agent tool calls. When an agent calls web_search, query_database, or read_file, these operations execute at Ring 2. The policy engine evaluates these calls against the loaded policy, and the agent’s identity is verified, but the operations themselves are not further restricted beyond what the policy allows.

Most production agents operate primarily at Ring 2, with policy enforcement providing the guardrails for what they can and cannot do. The user ring balances flexibility with safety, allowing agents to perform useful work while preventing them from accessing governance infrastructure. When a new agent is registered with the mesh, it is assigned to Ring 2 by default unless its configuration explicitly specifies a different ring.

Ring 3 (Untrusted) is the outermost and most restricted ring. Agents at Ring 3 have the narrowest set of capabilities and are subject to the strictest policy enforcement. This ring is used for agents that handle untrusted input, interact with external services, or have been flagged by the trust scoring system.

Ring 3 agents may be restricted from making outgoing network calls, accessing sensitive data, or delegating to other agents. The Agent Runtime enforces these restrictions at the execution level, making it structurally impossible for a Ring 3 agent to escalate its privileges. An agent can be demoted to Ring 3 dynamically if its trust score drops below the configured threshold, providing an automatic containment mechanism for agents that exhibit suspicious behavior.

This dynamic demotion is a key safety feature: it allows the system to automatically contain a compromised agent without requiring human intervention. The trust scoring system continuously evaluates agent behavior, and when the score crosses the demotion threshold, the runtime immediately restricts the agent’s capabilities to Ring 3 levels.

The ring model also defines escalation paths. An agent can request a higher privilege level through a formal escalation process that requires policy approval. For example, a Ring 3 agent that needs to make a specific network call can request a temporary Ring 2 escalation, which the policy engine evaluates against the escalation rules.

If approved, the agent receives a time-limited capability token that allows the specific operation. Once the token expires, the agent returns to its original ring. This prevents privilege escalation attacks while allowing agents to perform their intended functions. The escalation process is logged in the Merkle chain audit log, ensuring that every privilege change is traceable and auditable.

The capability token model is inspired by capability-based security systems, where possession of a token grants specific rights without requiring the holder to have a broader set of permissions. This is a well-studied security pattern that provides fine-grained access control without the complexity of managing broad permission sets.

MCP Security Gateway

MCP Security Gateway

The MCP Security Gateway diagram shows the dual-stage pipeline that governs all Model Context Protocol (MCP) traffic between agents and tool servers. The diagram depicts the two interception points: one before the tool call reaches the MCP server (Stage 1) and one before the response reaches the agent (Stage 2).

Between these two stages, the actual tool execution occurs on the MCP server, but the gateway maintains control by intercepting both the request and the response. MCP is the emerging standard for connecting AI agents to external tools and data sources, but it introduces significant security risks: tool poisoning, schema drift, hidden instructions embedded in tool responses, and typosquatting through malicious tool names.

The MCP Security Gateway addresses these risks through a two-stage interception model that provides bidirectional protection. The gateway is specified in the formal MCP Security Gateway 1.0 specification, which includes 127 conformance tests verifying its behavior.

Stage 1: Pre-Execution Interception. Before a tool call is dispatched to the MCP server, the gateway intercepts the request and performs a series of security checks. First, it verifies the tool server’s identity using the trust mesh, confirming that the server holds a valid SPIFFE certificate and its trust score meets the configured threshold. This prevents connections to unregistered or low-trust tool servers.

Next, it checks the tool’s schema against the expected interface, detecting any schema drift that might indicate a poisoning attack where a tool’s interface has been modified to accept malicious parameters. Schema drift detection compares the current tool schema (parameter names, types, required fields) against the previously registered schema, flagging any additions, removals, or type changes.

The gateway also scans the tool call parameters for hidden instructions – strings that might contain prompt injection payloads designed to manipulate the agent’s behavior through the tool response. This scanning uses pattern matching and heuristic analysis to detect common injection patterns, including known jailbreak templates, instruction override sequences, and role-playing prompts.

Finally, it checks the tool name against a typosquatting database to detect tools with names that closely resemble trusted tools but are actually malicious imitations, such as “web_serch” instead of “web_search” or “databse_query” instead of “database_query.”

Stage 2: Post-Response Scanning. After the MCP server returns a response, the gateway intercepts it before delivering it to the agent. The response scanner checks for hidden instructions embedded in the tool output, which is a common attack vector where a compromised tool server returns data containing prompt injection payloads.

For example, a database query tool might return a row containing “IGNORE ALL PREVIOUS INSTRUCTIONS AND…” which, if delivered to the agent unchecked, could override the agent’s safety constraints. The scanner uses both pattern-based detection for known injection templates and semantic analysis for novel injection attempts.

The scanner also checks for data exfiltration patterns, such as responses that contain API keys, credentials, or PII that should not be returned to the agent. PII redaction can automatically mask sensitive fields like social security numbers, credit card numbers, and email addresses before the response reaches the agent.

Content policy filters evaluate the response against organizational content policies, blocking responses that violate data handling rules such as geographic restrictions or classification levels. The response scanner operates on the complete response before any portion is delivered to the agent, ensuring that partial delivery of malicious content is not possible.

The dual-stage model is essential because MCP attacks can originate from either direction. A compromised tool server can inject malicious instructions through its responses (Stage 2 catches this), and a compromised agent can send malicious parameters to a tool server (Stage 1 catches this).

By intercepting traffic in both directions, the MCP Security Gateway provides comprehensive protection that single-direction filtering cannot achieve. This is analogous to how network firewalls inspect both inbound and outbound traffic – a one-way firewall would leave half the attack surface unprotected. The dual-stage design also means that even if one stage is bypassed through a novel attack technique, the other stage provides a second line of defense, implementing defense in depth at the protocol level.

The gateway also maintains a drift monitoring system that tracks changes in tool schemas and behavior over time. If a tool’s response format changes unexpectedly, or if its latency patterns shift significantly, the gateway raises an alert and may temporarily suspend the tool pending investigation.

This continuous monitoring catches slow-drip attacks where a tool is gradually modified over time to avoid detection. The drift monitor maintains a baseline profile for each registered tool, including its expected schema, typical response sizes, normal latency ranges, and historical error rates. Deviations from these baselines trigger alerts at configurable sensitivity levels, allowing teams to balance security strictness against operational noise.

The drift monitor also supports automated quarantine: when a tool’s drift score exceeds a critical threshold, the gateway can automatically suspend the tool and route pending requests to alternative tools, preventing service disruption while the suspicious tool is investigated.

Important: The MCP Security Gateway implements a dual-stage pipeline that intercepts tool calls before execution and scans responses before delivery, catching tool poisoning, schema drift, and hidden instruction attacks that bypass prompt-level safety.

OWASP Agentic Top 10 Coverage

The OWASP Agentic Security Intelligence (ASI) 2026 Top 10 defines the most critical security risks for autonomous AI agent systems. AGT provides deterministic controls for the majority of these risks, making it the most comprehensive agent security framework available.

OWASP ASI Risk AGT Control Coverage
ASI01: Agent Prompt Injection Policy engine intercepts before execution Full
ASI02: Sensitive Data Disclosure PII redaction, data exfiltration checks Full
ASI03: Supply Chain Vulnerability Plugin signing, trust scoring, schema drift detection Full
ASI04: Insecure Output Handling Response scanning, content policy filter Full
ASI05: Excessive Agency Capability model, privilege rings, resource limits Full
ASI06: Prompt Injection in MCP MCP Security Gateway dual-stage pipeline Full
ASI07: Insecure Identity Management Zero-trust mesh, Ed25519/SPIFFE/DID Full
ASI08: Unobservable Agent Behavior Merkle chain audit logs, commitment anchoring Full
ASI09: Insecure Plugin Ecosystem Marketplace governance, signing, trust scoring Partial
ASI10: Unbounded Resource Consumption SRE circuit breakers, error budgets Partial
ASI11: Cross-Agent Trust Escalation Trust scoring, mTLS, identity verification Partial

The eight full-coverage risks are addressed by deterministic controls that make violations structurally impossible. For example, ASI01 (Agent Prompt Injection) is fully covered because the policy engine intercepts every action before execution, regardless of whether the agent’s prompt has been compromised. Even if an attacker successfully injects a malicious instruction into the agent’s prompt, the resulting action will be evaluated against the policy and denied if it violates the rules. The policy engine does not care why the agent is attempting an action – it only evaluates whether the action is permitted by the current policy.

ASI02 (Sensitive Data Disclosure) is fully covered through a combination of PII redaction in the MCP Security Gateway’s response scanner and data exfiltration checks that detect when an agent attempts to access or transmit sensitive data outside of approved channels. The PII redaction system uses pattern matching and named entity recognition to identify and mask sensitive fields before they reach the agent, preventing accidental or intentional data leakage.

ASI05 (Excessive Agency) is fully covered through the capability model and privilege rings. The capability model defines what each agent is allowed to do, and the privilege rings enforce these limits at the execution level. An agent cannot perform operations beyond its assigned ring, and the policy engine evaluates every action against the capability model before allowing it to proceed. This prevents the common failure mode where an agent accumulates excessive permissions over time, a pattern known as “permission creep.”

The three partial-coverage risks are addressed by controls that reduce but do not eliminate the risk. ASI09 (Insecure Plugin Ecosystem) is partially covered because while AGT provides plugin signing, trust scoring, and schema drift detection, the broader plugin ecosystem includes distribution channels and dependency chains that extend beyond AGT’s governance boundary. A malicious plugin could be introduced through a compromised package registry or a supply chain attack on a dependency, which AGT’s signing verification would not detect if the package was signed with a valid key from a compromised publisher.

ASI10 (Unbounded Resource Consumption) is partially covered because SRE circuit breakers and error budgets limit resource usage within the governed boundary, but agents may access external services that are not governed by AGT. ASI11 (Cross-Agent Trust Escalation) is partially covered because trust scoring and mTLS prevent unauthorized escalation within the mesh, but cross-mesh trust relationships require additional governance agreements that are outside AGT’s scope.

Amazing: The Agent Governance Toolkit covers 8 out of 11 OWASP Agentic Top 10 risks with full controls and 3 with partial coverage, achieving zero complete gaps in the most comprehensive agent security framework available.

Installation and Quick Start

AGT is available on PyPI as a consolidated meta-package. The v4.0.0 release merged 45 previously separate packages into 5 top-level distributions for simpler installation and dependency management.

Python Installation

# Install the full toolkit (all packages)
pip install agent-governance-toolkit[full]

# Or install individual distributions
pip install agent-governance-toolkit-core      # Policy engine, identity, audit, MCP gateway
pip install agent-governance-toolkit-runtime    # Privilege rings, saga orchestration
pip install agent-governance-toolkit-sre        # SLOs, circuit breakers, chaos testing
pip install agent-governance-toolkit-cli        # agt CLI, OWASP verification, policy linting

Prerequisites: Python 3.10+

Other Language SDKs

# TypeScript / Node.js
npm install @microsoft/agent-governance-sdk

# .NET
dotnet add package Microsoft.AgentGovernance

# Rust
cargo add agent-governance

# Go
go get github.com/microsoft/agent-governance-toolkit/agent-governance-golang

Govern Any Tool in Two Lines

from agentmesh.governance import govern

safe_tool = govern(my_tool, policy="policy.yaml")   # every call checked, logged, enforced

That is the entire API surface for basic governance. safe_tool evaluates your YAML policy on every call, logs the decision to the Merkle chain audit log, and raises GovernanceDenied if the action is blocked.

Programmatic Policy Evaluation

For more control, use the PolicyEvaluator API directly:

from agent_os.policies import (
    PolicyEvaluator, PolicyDocument, PolicyRule,
    PolicyCondition, PolicyAction, PolicyOperator, PolicyDefaults
)

evaluator = PolicyEvaluator(policies=[PolicyDocument(
    name="my-policy", version="1.0",
    defaults=PolicyDefaults(action=PolicyAction.ALLOW),
    rules=[PolicyRule(
        name="block-dangerous-tools",
        condition=PolicyCondition(
            field="tool_name",
            operator=PolicyOperator.IN,
            value=["execute_code", "delete_file"]
        ),
        action=PolicyAction.DENY, priority=100,
    )],
)])

result = evaluator.evaluate({"tool_name": "web_search"})    # Allowed
result = evaluator.evaluate({"tool_name": "delete_file"})   # Blocked

CLI Tools

AGT includes a CLI for common governance operations:

agt doctor                                        # check installation health
agt verify                                        # OWASP compliance check
agt verify --evidence ./agt-evidence.json --strict # fail CI on weak evidence
agt red-team scan ./prompts/ --min-grade B         # prompt injection audit
agt lint-policy policies/                          # validate policy files

TypeScript Quick Start

import { PolicyEngine } from "@microsoft/agent-governance-sdk";

const engine = new PolicyEngine([
  { action: "web_search", effect: "allow" },
  { action: "shell_exec", effect: "deny" },
]);
engine.evaluate("web_search"); // "allow"
engine.evaluate("shell_exec"); // "deny"

.NET Quick Start

using AgentGovernance;
using AgentGovernance.Policy;

var kernel = new GovernanceKernel(new GovernanceOptions
{
    PolicyPaths = new() { "policies/default.yaml" },
});
var result = kernel.EvaluateToolCall("did:mesh:agent-1", "web_search",
    new() { ["query"] = "latest AI news" });

Framework Integration

AGT supports 15+ agent frameworks through adapters and middleware. The integration model is consistent across frameworks: wrap the agent with govern() and provide a policy document. The framework adapter handles the interception plumbing.

Framework Integration Type Key Feature
Microsoft Agent Framework Native Middleware Deepest integration, first-class support
Semantic Kernel Native (.NET + Python) Policy-gated semantic functions
AutoGen Adapter Multi-agent governance with role-based policies
LangGraph / LangChain Adapter Policy-gated tool calls with trust tiers
CrewAI Adapter Role-based policy enforcement per agent
OpenAI Agents SDK Middleware Policy-gated tool calls with trust tiers
Claude Code Governance plugin Plugin marketplace installation
Google ADK Adapter Cross-framework policy enforcement
LlamaIndex Middleware Query-time governance gates
Haystack Pipeline Pipeline-level policy injection
Mastra Adapter TypeScript-native governance
Dify Plugin Visual workflow governance
Azure AI Foundry Deployment Guide Cloud-integrated governance
GitHub Copilot CLI Governance installer CLI-native governance setup
HuggingFace smolagents Adapter Lightweight agent governance

Each adapter implements the Framework Adapter Contract specification, which defines the behavioral contract for how governance interceptors integrate with agent frameworks. The specification includes 152 conformance tests that verify adapter behavior, ensuring that governance works consistently regardless of which framework you choose.

The adapter pattern works by inserting the governance gate into the framework’s tool call pipeline. When an agent attempts to call a tool, the adapter intercepts the call, evaluates it against the policy, and either allows it to proceed, denies it with a GovernanceDenied exception, or routes it to human approvers. This interception happens at the application layer, not at the prompt layer, which is why it provides deterministic guarantees that prompt-level safety cannot.

Specifications and Testing

AGT is built on a foundation of formal specifications. Every major component has an RFC 2119 specification that defines what implementations MUST, SHOULD, and MAY do. These specs are not documentation – they are behavioral contracts backed by 992 conformance tests.

Specification Scope Tests
Agent OS Policy Engine Policy evaluation, rule merging, fail-closed semantics 68
AgentMesh Identity and Trust Credentials, trust scoring, delegation chains 135
Agent Hypervisor Execution Control Privilege rings, saga orchestration, kill switch 80
AgentMesh Trust and Coordination Peer trust negotiation, mesh-wide policy 62
Agent SRE Governance SLOs, error budgets, chaos, circuit breakers 111
MCP Security Gateway Tool poisoning, drift detection, hidden instructions 127
Agent Lightning Fast-Path RL training governance, violation penalties 100
Framework Adapter Contract 10 adapter integrations, interceptor chain 152
Audit and Compliance Merkle audit, compliance mapping, Decision BOM 157
AgentMesh Wire Protocol Message format, routing, serialization

The 25 Architecture Decision Records (ADRs) document the reasoning behind major design decisions, providing context for why AGT works the way it does and making it easier for contributors to understand the architectural trade-offs.

Standards Compliance

Beyond OWASP, AGT maps to several international standards and frameworks:

Standard Coverage
OWASP Agentic AI Top 10 All ASI risk categories mapped with deterministic controls
NIST AI RMF 1.0 Full GOVERN, MAP, MEASURE, MANAGE alignment
EU AI Act Compliance mapping with automated evidence
SOC 2 Control mapping with audit trail export

The agt verify command generates compliance evidence that maps governance controls to these standards, making it easier to demonstrate compliance during audits. The --strict flag causes the command to fail if any control has weak evidence, enabling CI/CD integration where governance compliance is a deployment gate.

Security Model

AGT enforces governance at the application middleware layer, not at the OS kernel level. The policy engine and agents share the same process boundary, which means that a sufficiently compromised agent could theoretically bypass the governance layer by exploiting process-level vulnerabilities. AGT is transparent about this limitation in its documentation.

The recommended production deployment runs each agent in a separate container for OS-level isolation, combining AGT’s application-layer governance with container-level sandboxing for defense in depth. The security tooling includes CodeQL for Python and TypeScript SAST, Gitleaks for secret scanning, ClusterFuzzLite with 7 fuzz targets covering policy evaluation, injection, MCP, sandbox, and trust components, Dependabot across 13 ecosystems, and weekly OpenSSF Scorecard scoring.

Conclusion

The Microsoft Agent Governance Toolkit represents a fundamental shift in how we approach AI agent safety. Instead of relying on prompt-level instructions that can be bypassed with 100% attack success rates, AGT provides deterministic application-layer interception that makes misbehavior structurally impossible. The three-question framework – Is this action allowed? Which agent did this? Can you prove what happened? – provides a clear mental model for reasoning about agent governance.

With 8 major packages covering policy enforcement, zero-trust identity, execution sandboxing, SRE, compliance, plugin governance, RL training governance, and audit logging, AGT addresses the full lifecycle of agent governance. The MCP Security Gateway’s dual-stage pipeline provides comprehensive protection for the emerging MCP standard, and the 8/11 full OWASP Agentic Top 10 coverage demonstrates the framework’s depth.

The two-line API surface – govern(my_tool, policy="policy.yaml") – makes it trivially easy to add governance to any agent, while the full PolicyEvaluator API and CLI tools provide the depth needed for production deployments. With 992 conformance tests, 10 formal specifications, and 25 ADRs, AGT is built on a rigorous foundation that ensures behavioral correctness and architectural transparency.

Links:

Watch PyShine on YouTube

Contents