Glossary
What is Prompt governance?
Last updated: 2026-05-26
Definition
Prompt governance is the operational discipline of treating production prompts (system prompts, tool descriptions, eval rubrics) as code: version-controlled, reviewed before merge, tested against fixed cases, and rollback-ready when production behaviour regresses. Without it, prompt changes drift, regressions compound, and the team loses the ability to explain why the agent behaves as it does.
Why Prompt governance matters
Many AI pilots fail in production not because the model is wrong but because prompts change without control. Engineers tweak; behaviour regresses; rollbacks become guesswork. For high-risk systems, Article 17 of the EU AI Act requires providers to run a quality-management system with documented design control, testing and record-keeping. For an LLM-based system that is hard to satisfy without prompt governance, and auditors can be expected to ask for prompt change history and impact analysis.
How Prompt governance works
1. Store every production prompt in git, not in the running agent's database, and review prompt changes under the same rules as code.
2. Ship each prompt change with an eval delta: score the current and proposed prompts against the same fixed test set of input → expected-output pairs, so reviewers can see whether the change improved or regressed quality.
3. Before promoting a change to production, run it in staging with shadow-mode evaluation: the candidate prompt receives real traffic, its outputs never reach users, and it is scored side by side with the current production version.
4. Version-tag every production prompt and log the active version with every agent run, so any output can be traced back to the exact prompt that produced it.
5. Maintain rollback runbooks that state who can roll back, how quickly a rollback must complete, and what data to capture for the post-mortem.
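A minimal Python sketch of the eval-delta step, assuming exact-match scoring and modeling each prompt version as a plain function from input to output. `EvalCase`, `score`, `eval_delta` and the `max_drop` threshold are hypothetical names, not part of any specific eval tool, and the lookup tables in the usage lines stand in for real model calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type: a prompt version under test, modeled as "input in, output out".
PromptRunner = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class EvalCase:
    """One fixed input -> expected-output pair from the eval set."""
    input: str
    expected: str


def score(run: PromptRunner, cases: list[EvalCase]) -> float:
    """Fraction of cases where the output exactly matches the expected output.

    Exact match is a simplifying assumption; real rubrics are often graded.
    """
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for c in cases if (run(c.input) or "").strip() == c.expected.strip())
    return passed / len(cases)


def eval_delta(baseline: PromptRunner, candidate: PromptRunner,
               cases: list[EvalCase], max_drop: float = 0.0) -> dict:
    """Score both versions on the same cases; block if the drop exceeds max_drop."""
    before = score(baseline, cases)
    after = score(candidate, cases)
    delta = after - before
    return {"baseline": before, "candidate": after, "delta": delta,
            "blocked": delta < -max_drop}


# Usage: lookup tables stand in for calling the model with each prompt version.
cases = [EvalCase("2+2", "4"), EvalCase("3+3", "6"), EvalCase("capital of France", "Paris")]
current = {"2+2": "4", "3+3": "6", "capital of France": "Paris"}
proposed = {"2+2": "4", "3+3": "7", "capital of France": "Paris"}
report = eval_delta(current.get, proposed.get, cases)
print(f"delta {report['delta']:+.1%}, blocked={report['blocked']}")
```

With `max_drop=0.0` any regression marks the change as blocked; a team could instead set a tolerance per rubric and show the full report in the pull request.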
Examples
- A customer-support team rejects a prompt PR because its eval scores dropped from 87% to 81% on the "tone" rubric, even though it added a useful new instruction.
- A finance agent's wallet spend doubles overnight; the logs show prompt v2.3 was deployed; engineers roll back to v2.2 in 5 minutes and investigate offline.
- An auditor asks "show me every system prompt change in Q1 and the eval impact of each" — the team produces the answer from git history + eval dashboard in an afternoon.
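The version-tagging and rollback steps, and the v2.3 to v2.2 rollback in the examples above, can be sketched as a small in-memory registry. This is a hypothetical model of the bookkeeping only (`PromptRegistry`, `log_run` and the log field names are illustrative assumptions); in practice the prompt texts would live in git. Logging a content hash next to the tag lets an auditor confirm which exact text produced a run.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


class PromptRegistry:
    """In-memory sketch of version tags, the active pointer and an audit trail.

    Hypothetical: in practice the prompt texts would live in git.
    """

    def __init__(self) -> None:
        self.versions: dict[str, str] = {}      # tag -> prompt text (immutable)
        self._lineage: list[str] = []           # activation stack, newest last
        self.audit: list[tuple[str, str]] = []  # append-only (action, tag) events

    def publish(self, tag: str, text: str) -> None:
        if tag in self.versions:
            raise ValueError(f"{tag} already exists; version tags are immutable")
        self.versions[tag] = text

    def activate(self, tag: str) -> None:
        if tag not in self.versions:
            raise KeyError(tag)
        self._lineage.append(tag)
        self.audit.append(("activate", tag))

    @property
    def active(self) -> str:
        if not self._lineage:
            raise RuntimeError("no prompt version is active")
        return self._lineage[-1]

    def rollback(self) -> str:
        """Re-activate the previously active version and return its tag."""
        if len(self._lineage) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._lineage.pop()
        self.audit.append(("rollback", self.active))
        return self.active


def log_run(registry: PromptRegistry, run_id: str, output: str) -> str:
    """One JSON log line tying an agent run to the exact prompt that produced it."""
    tag = registry.active
    digest = hashlib.sha256(registry.versions[tag].encode("utf-8")).hexdigest()
    return json.dumps({
        "run_id": run_id,
        "prompt_version": tag,
        "prompt_sha256": digest,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "output": output,
    })


# Usage mirroring the v2.3 -> v2.2 rollback (prompt texts are placeholders).
registry = PromptRegistry()
registry.publish("v2.2", "<system prompt v2.2>")
registry.publish("v2.3", "<system prompt v2.3>")
registry.activate("v2.2")
registry.activate("v2.3")
print(log_run(registry, "run-001", "..."))
restored = registry.rollback()
print(log_run(registry, "run-002", "..."))
```

Keeping `audit` append-only means the rollback itself becomes part of the change history an auditor can query.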
Related concepts
AI agent management
AI agent management is the discipline of operating AI agents at scale — covering deployment, role-based access, budget allocation, performance monitoring, audit logging, and lifecycle (retire, refresh, replace). It is to AI agents what fleet management is to vehicles or what DevOps is to software services.
Human-in-the-loop (HITL)
Human-in-the-loop (HITL) is a design pattern where a human reviewer must approve, edit, or veto an AI agent's output before it executes a consequential action. The agent pauses, surfaces what it is about to do, waits for the human, and then proceeds — a deliberate brake to keep autonomy bounded.
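A minimal sketch of the approve / edit / veto gate described above, assuming the reviewer is any callable that returns a decision (here scripted lambdas stand in for a real approval UI). `ProposedAction`, `run_with_approval` and the refund scenario are hypothetical illustrations, not a specific product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Literal, Optional, Tuple

Decision = Literal["approve", "edit", "veto"]


@dataclass
class ProposedAction:
    """What the agent is about to do, surfaced to the reviewer before execution."""
    description: str
    payload: dict = field(default_factory=dict)


# A reviewer returns a decision plus, for "edit", a replacement payload.
Reviewer = Callable[[ProposedAction], Tuple[Decision, Optional[dict]]]


def run_with_approval(action: ProposedAction, review: Reviewer,
                      execute: Callable[[dict], str]) -> str:
    """Pause for a human decision; only approved or edited actions execute."""
    decision, edited = review(action)  # in a real system this waits for the human
    if decision == "veto":
        return f"vetoed: {action.description}"
    if decision == "edit":
        if edited is None:
            raise ValueError("an edit decision must include the edited payload")
        return execute(edited)
    if decision == "approve":
        return execute(action.payload)
    raise ValueError(f"unknown decision: {decision!r}")


# Usage: scripted reviewers stand in for a human; issue_refund is hypothetical.
executed: list[dict] = []


def issue_refund(payload: dict) -> str:
    executed.append(payload)
    return f"refund issued: {payload['amount']}"


action = ProposedAction("Refund customer order", {"amount": 500})
print(run_with_approval(action, lambda a: ("edit", {"amount": 50}), issue_refund))
print(run_with_approval(action, lambda a: ("veto", None), issue_refund))
```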
Agent observability
Agent observability is the practice of capturing what an AI agent did, why it did it, and how well it did it, in a form that engineers can search and reviewers can audit. It combines three pillars: logs (the steps), traces (the causal chain across LLM calls and tools), and evals (continuous scoring of output quality).
Role-Based Access Control (RBAC)
Role-Based Access Control (RBAC) for AI is a security model that grants permissions to AI agents and AI users based on roles rather than individuals. A "marketing analyst" role can run a defined set of agents, read certain knowledge bases, and call approved tools — and changes to the role propagate to everyone who holds it.
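A minimal sketch of the role model above, assuming permissions are grouped into agents, knowledge bases and tools. `Role`, `AccessControl` and the principal names are hypothetical. Because `can` looks up the role at check time, editing a role changes access for every human user or AI agent that holds it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

RESOURCE_KINDS = ("agents", "knowledge_bases", "tools")


@dataclass
class Role:
    """Permissions attach to the role, never to an individual principal."""
    name: str
    agents: set[str] = field(default_factory=set)
    knowledge_bases: set[str] = field(default_factory=set)
    tools: set[str] = field(default_factory=set)


class AccessControl:
    def __init__(self) -> None:
        self.roles: dict[str, Role] = {}
        self.assignments: dict[str, set[str]] = {}  # principal -> role names

    def add_role(self, role: Role) -> None:
        self.roles[role.name] = role

    def assign(self, principal: str, role_name: str) -> None:
        if role_name not in self.roles:
            raise KeyError(role_name)
        self.assignments.setdefault(principal, set()).add(role_name)

    def can(self, principal: str, kind: str, resource: str) -> bool:
        """Resolve permissions at check time, so role edits apply to all holders."""
        if kind not in RESOURCE_KINDS:
            raise ValueError(f"unknown resource kind: {kind}")
        return any(resource in getattr(self.roles[name], kind)
                   for name in self.assignments.get(principal, set()))


# Usage: a human user and an AI agent hold the same role.
acl = AccessControl()
acl.add_role(Role("marketing analyst", agents={"campaign-report"},
                  knowledge_bases={"brand-guidelines"}, tools={"web-search"}))
acl.assign("alice", "marketing analyst")
acl.assign("report-agent", "marketing analyst")
print(acl.can("alice", "tools", "crm-export"))          # False
acl.roles["marketing analyst"].tools.add("crm-export")  # one change to the role...
print(acl.can("report-agent", "tools", "crm-export"))   # ...True for every holder
```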
FAQ
Prompt governance — common questions
- How is prompt governance different from prompt engineering?
- Prompt engineering is the craft of writing prompts that work. Prompt governance is the operational layer around it: review, versioning, evaluation, rollback. Engineering produces the prompt; governance makes sure the right prompt is running in production and you can prove it.
- Do small teams need formal prompt governance?
- At 1-2 agents with 1-2 engineers, lightweight discipline (prompts in git + a short eval set) is enough. At 10+ agents or multiple teams, you need formal review, named owners, and shared evals — without these, prompt rot compounds and quality erodes silently.
- How does AgentWorks support prompt governance?
- Every agent's system prompt is versioned in the workspace, changes are reviewable per team, and runs log the active prompt version. Eval scoring is on the roadmap; today most teams pair AgentWorks with an external eval tool and export logs for scoring.