Compliance · May 13, 2026 · 7 min read

EU AI Act Article 12: Logging Your AI Agents Properly


TL;DR

EU AI Act Article 12 requires high-risk AI systems to record events automatically over their lifetime, and the Act's record-keeping provisions (Articles 19 and 26(6)) require those logs to be kept for at least six months. This article shows compliance teams and engineers what to capture per agent turn, how long to keep it, and how to build the audit export before the first regulator inquiry arrives.

Most teams reading the EU AI Act for the first time fixate on the headline obligations. Risk classification. Conformity assessment. Human oversight. Those matter. But the obligation that breaks more deployments than any other is the quiet one buried in Article 12: keep automatic event logs for the full lifecycle of every high-risk AI system you deploy.

If you cannot reconstruct what your agent did on a Tuesday afternoon in March, you do not have a compliance gap. You have a regulatory exposure.

What Article 12 actually requires

The text is precise. High-risk AI systems must be designed with automatic recording of events while they operate. The logs must let you trace inputs, outputs, decisions, and material state changes over time. Recordings have to be reliable enough to support post-market monitoring, identify situations where the system could create a risk, and reconstruct the operation of the system for inspection by authorities.

In plain terms: if an agent denies a loan application, a regulator must be able to see the prompt, the retrieved context, the model used, the tools called, and the resulting decision. Without that chain, the deployment is non-conformant.

The Act sets the retention floor at six months: Article 19 for providers and Article 26(6) for deployers, unless other Union or national law provides otherwise. Most legal teams settle on 24 months, a period they consider defensible under GDPR's storage limitation principle while still covering the AI Act's broader supervisory window. A few sectors push to 60 months for safety reasons, particularly in healthcare and critical infrastructure.

The cost of doing it badly

Most in-house implementations get logging wrong in the same way. Engineers wire a logger.info call into the model gateway, dump JSON into CloudWatch or Elastic, and call it done. Then the first regulator inquiry arrives, and three problems surface.

The first is granularity. Application logs at INFO level capture HTTP requests, not AI decisions. The retrieved knowledge chunks are not there. The tool argument values are not there. The model version is not there. The audit team needs every input that influenced an output, and a raw application log will not produce it.

The second is durability. Most observability stacks rotate logs aggressively — 30 days at INFO, sometimes shorter. A retention window of two years requires a deliberate archive pipeline, not a default S3 lifecycle policy on application logs. We have seen teams discover this six weeks before a Dutch DPA audit, with no usable record beyond the last month.
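The durability problem comes down to deciding, per record, whether it still lives in the hot store, must already be in the archive, or has passed the retention policy. A minimal sketch of that decision, assuming a 30-day hot window and a 24-month floor approximated as 730 days; the function name and tier labels are hypothetical:

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=30)        # typical observability rotation, per the text
RETENTION_FLOOR = timedelta(days=730)  # roughly 24 months; an assumed policy value

def storage_tier(recorded_at: datetime, now: datetime) -> str:
    """Return where a log record should live: 'hot', 'archive', or 'expired'."""
    age = now - recorded_at
    if age <= HOT_WINDOW:
        return "hot"
    if age <= RETENTION_FLOOR:
        return "archive"  # must be copied out before the hot store rotates it
    return "expired"      # past the written retention policy

now = datetime(2026, 5, 13, tzinfo=timezone.utc)
print(storage_tier(datetime(2026, 5, 1, tzinfo=timezone.utc), now))  # hot
print(storage_tier(datetime(2025, 3, 4, tzinfo=timezone.utc), now))  # archive
```

The point of the sketch is that "archive" is an explicit state with its own pipeline, not something a default rotation policy produces by accident.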

The third is searchability. Even when logs exist, finding the right ones for a specific decision takes hours. The regulator does not care about your full-text search latency. They want a clean export for case ID 4217 in the first email.

What a compliant log capture looks like

A system that meets Article 12 without surprise gaps captures four classes of event per agent turn:

First, the conversation event. Who initiated the turn, in which workspace, against which agent, at what timestamp, with which user roles in scope. This anchors every downstream record to a single trail.

Second, the model invocation. Which model identifier (claude-opus-4-7, gpt-4o, gemini-2.5-pro), which prompt template version, the full prompt text after PII redaction, the temperature and other inference parameters, and the raw model response. Hashing the prompt alone is not enough — the regulator needs to see what was sent.

Third, the tool calls. Every typed tool invocation with its arguments, its source connector, its response status, and any external service trace ID. A Slack post needs the channel and message ID. A NetSuite invoice post needs the journal entry reference.

Fourth, the decision artifact. The final output the user saw, any human-in-the-loop approval that gated it, the cost in euros, and the risk class of the agent that produced it.
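The four event classes can be written down as plain record types that share a turn identifier. This is an illustrative sketch only: every class and field name below is an assumption chosen to mirror the descriptions above, not a schema prescribed by the Act or by any particular platform.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ConversationEvent:
    turn_id: str
    initiated_by: str
    workspace: str
    agent: str
    user_roles: list
    timestamp: str  # ISO 8601, UTC

@dataclass
class ModelInvocation:
    turn_id: str
    model_id: str
    prompt_template_version: str
    redacted_prompt: str      # full prompt text after PII redaction
    inference_params: dict    # temperature and other parameters
    raw_response: str

@dataclass
class ToolCall:
    turn_id: str
    tool_name: str
    arguments: dict
    source_connector: str
    response_status: str
    external_trace_id: Optional[str] = None

@dataclass
class DecisionArtifact:
    turn_id: str
    final_output: str
    approval_id: Optional[str]  # human-in-the-loop approval, if one gated the output
    cost_eur: float
    risk_class: str

def to_log_line(event) -> str:
    """Serialise any of the four event types as one JSON line tagged with its class."""
    return json.dumps({"event_type": type(event).__name__, **asdict(event)}, sort_keys=True)

call = ToolCall(
    turn_id="turn-001",
    tool_name="slack.post_message",
    arguments={"channel": "C123", "text": "Invoice posted"},
    source_connector="slack",
    response_status="ok",
    external_trace_id="msg-789",
)
print(to_log_line(call))
```

Because every record carries the same turn_id, the four classes can be stored separately and still be joined back into a single trail.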

A good rule: if a regulator asked for the exact moment your AI agent decided X, you should be able to email them a single CSV row referencing every other log in the bundle.

This is what AgentWorks writes by default. Every chat turn, every agent run, and every multi-agent pipeline step is committed to an append-only audit log with a signed timestamp, retained for 24 months on EU storage, and exportable to CSV with one click.

Practical applications and ROI

Most compliance investments feel like sunk cost. Logging done well is the exception — it has visible operational value beyond the audit cycle.

| Use of the log | Operational benefit | Indicative ROI |
| --- | --- | --- |
| Investigating an agent that returned wrong invoices | Replay the exact retrieval + tool calls; fix in hours not days | 20-40 engineering hours saved per incident |
| Disproving a customer complaint of bias | Show the prompt + retrieved context + model + decision chain | One escalation deflected covers a month of platform cost |
| Auditing token spend per agent for budget reviews | Sum the recorded cost rows by tag | Procurement decisions backed by reality, not estimates |
| Demonstrating EU AI Act compliance to a procurement buyer | Walk through a live audit export in a sales call | Faster deal close, smaller security questionnaire round |
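The budget-review use case is the simplest to illustrate, because it is plain aggregation over the decision records. A small sketch, assuming each record exposes a tag and a cost_eur field stored as a decimal string (both field names are hypothetical):

```python
from collections import defaultdict
from decimal import Decimal

def cost_by_tag(rows):
    """Sum recorded cost per tag, using Decimal to avoid float rounding drift."""
    totals = defaultdict(Decimal)  # Decimal() starts each tag at 0
    for row in rows:
        totals[row["tag"]] += Decimal(row["cost_eur"])
    return dict(totals)

rows = [
    {"tag": "invoicing", "cost_eur": "0.012"},
    {"tag": "support", "cost_eur": "0.030"},
    {"tag": "invoicing", "cost_eur": "0.008"},
]
for tag, total in sorted(cost_by_tag(rows).items()):
    print(tag, total)
```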

The insurance industry has already started writing AI E&O clauses that reference Article 12 logging directly. Without proper logs, your insurer can deny coverage on an AI-driven loss. With them, your premium can be materially lower.

How to get started

  1. Catalogue every AI surface your organisation operates. Internal copilots count. Vendor SaaS that wraps an LLM counts. Anything autonomous enough to make a decision touches the obligation.

  2. For each surface, write down the four event classes above. If you cannot answer all four for a typical turn, you have a logging gap.

  3. Pick a retention floor of 24 months and pin it. Put the policy in writing. Make sure the storage is EU-resident if your tenant is.

  4. Build the audit export early, not in the week of the inquiry. A regulator asks for events between two dates filtered by agent or user. Your stack should answer in under one minute. Test the export quarterly with synthetic inquiries.
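Step 4 above can be sketched as a single function that filters events by date range and, optionally, by agent or user, then writes CSV. This is a hedged illustration over an in-memory list; the event fields (timestamp, turn_id, agent, user, event_type, summary) are assumptions, and a real store would push the filter down into a query.

```python
import csv
import io
from datetime import datetime, timezone

FIELDS = ["timestamp", "turn_id", "agent", "user", "event_type", "summary"]

def export_events(events, start, end, agent=None, user=None):
    """Return CSV text for events with start <= timestamp < end, optionally filtered."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for e in sorted(events, key=lambda e: e["timestamp"]):
        if not (start <= e["timestamp"] < end):
            continue
        if agent is not None and e["agent"] != agent:
            continue
        if user is not None and e["user"] != user:
            continue
        writer.writerow({**e, "timestamp": e["timestamp"].isoformat()})
    return buf.getvalue()

utc = timezone.utc
events = [
    {"timestamp": datetime(2026, 3, 3, 14, 5, tzinfo=utc), "turn_id": "turn-001",
     "agent": "loan-screener", "user": "u-17", "event_type": "decision",
     "summary": "application declined"},
    {"timestamp": datetime(2026, 3, 4, 9, 0, tzinfo=utc), "turn_id": "turn-002",
     "agent": "invoice-bot", "user": "u-22", "event_type": "tool_call",
     "summary": "journal entry posted"},
]
report = export_events(events, datetime(2026, 3, 1, tzinfo=utc),
                       datetime(2026, 4, 1, tzinfo=utc), agent="loan-screener")
print(report)
```

Running the same function against synthetic inquiries each quarter is one way to exercise the export before a real request arrives.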

If you are still in the design phase, build the logging plumbing before the first deployment. Retrofitting logs onto an already-running agent stack is more painful than starting clean.

If your AI surface is already live and logs are thin, the fastest path to compliance is moving the model gateway to a platform that handles the event capture at the protocol layer rather than the application layer. AgentWorks does this by writing every model and tool event to an immutable audit store regardless of how the agent itself was built.

Common implementation mistakes to avoid

We have reviewed roughly forty deployments over the past year and the same five mistakes recur.

The first is treating PII redaction and audit logging as the same pipeline. They are not. Redaction keeps personal data away from the model. Logging preserves evidence for the regulator. The audit log must capture the prompt exactly as it was reviewed and sent: masked, but not lossy. Storing only a hash of the prompt makes the trail useless when the inquiry arrives, because no one can reconstruct what the system actually saw.
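The difference between masking and hashing is easy to see side by side. The sketch below masks e-mail addresses with stable placeholders so the prompt stays readable, then shows what a hash-only record would retain. The regular expression is deliberately simple and the placeholder format is an assumption; production redaction would cover far more identifier types.

```python
import hashlib
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask_emails(prompt: str) -> str:
    """Replace each distinct e-mail address with a stable numbered placeholder."""
    seen = {}

    def repl(match):
        addr = match.group(0).lower()
        if addr not in seen:
            seen[addr] = f"[EMAIL_{len(seen) + 1}]"
        return seen[addr]

    return EMAIL.sub(repl, prompt)

prompt = "Refund jan@example.com and cc ops@example.com; jan@example.com approved."
masked = mask_emails(prompt)
hashed = hashlib.sha256(prompt.encode("utf-8")).hexdigest()

print(masked)  # Refund [EMAIL_1] and cc [EMAIL_2]; [EMAIL_1] approved.
print(hashed)  # 64 hex characters; the wording of the prompt is unrecoverable
```

The masked version still shows who did what to whom in structural terms, which is what an investigator needs; the hash only proves that some prompt existed.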

The second is logging at the wrong layer. Application-level loggers miss model-level events such as token counts, retry attempts, and tool argument resolution. Capture at the gateway, where every input and output naturally flows, and you get the full record without instrumenting every business endpoint.
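Gateway-level capture can be pictured as one wrapper that every model call passes through. The sketch below is an assumption-laden illustration: logged_call, the record fields, and the retry policy are all hypothetical, and model_fn stands in for whatever client your gateway actually uses.

```python
import time
import uuid

def logged_call(model_fn, audit_log, *, model_id, prompt, params, max_retries=2):
    """Call model_fn(prompt, **params) and append one audit record, even on failure."""
    record = {
        "event_id": str(uuid.uuid4()),
        "model_id": model_id,
        "prompt": prompt,
        "params": dict(params),
        "attempts": 0,
        "response": None,
        "error": None,
    }
    started = time.monotonic()
    try:
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            record["attempts"] = attempt
            try:
                record["response"] = model_fn(prompt, **params)
                record["error"] = None
                return record["response"]
            except Exception as exc:
                record["error"] = repr(exc)
        raise RuntimeError(f"model call failed after {record['attempts']} attempts")
    finally:
        # Runs on success and on failure, so no call escapes the log.
        record["latency_s"] = round(time.monotonic() - started, 3)
        audit_log.append(record)

# Stand-in model that times out once, then answers.
calls = {"n": 0}
def flaky_model(prompt, temperature=0.0):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("upstream timeout")
    return f"echo: {prompt}"

audit_log = []
answer = logged_call(flaky_model, audit_log, model_id="example-model",
                     prompt="hi", params={"temperature": 0.2})
print(answer, audit_log[0]["attempts"])
```

Because the record is written in the finally clause, retries and failures land in the audit trail without any business endpoint having to know about logging.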

The third is signing too late. A regulator will challenge log integrity. If your records are not signed at write time with a tamper-evident scheme — Merkle-tree append-only logs, signed timestamps from a trusted authority — you cannot prove the entries have not been altered. Retrofitting integrity onto a year of logs is essentially impossible.
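To make the idea of tamper evidence concrete, here is a minimal hash chain: each entry stores the hash of the previous one, so editing any record breaks verification from that point on. This is a simpler stand-in for the Merkle-tree and signed-timestamp schemes named above, not an implementation of them. On its own it does not stop someone from rewriting the entire chain, which is why the latest hash would still need to be anchored externally, for example with a trusted timestamp. All names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry

def _digest(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_entry(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "payload": dict(payload),
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(entry)
    return entry

def verify_chain(chain):
    """Recompute every hash in order; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append_entry(chain, {"turn_id": "turn-001", "decision": "declined"})
append_entry(chain, {"turn_id": "turn-002", "decision": "approved"})
print(verify_chain(chain))  # True
```

Changing the decision field of the first entry after the fact would make verify_chain return False, because the stored hash no longer matches the payload.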

The fourth is forgetting agent versioning. When an agent is updated — a new system prompt, a new tool, a different model — your log needs a reference to the agent version that produced each entry. Otherwise an investigation cannot tell whether a problem belongs to v1.4 or v1.7. Treat agent versions like deployable releases and log the semver.
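One way to make agent versions loggable is to treat the system prompt, tool list, and model as a release and stamp every entry with its semver plus a fingerprint of the configuration. The sketch below is illustrative; AgentRelease, stamp, and the fingerprint scheme are assumptions rather than an established convention.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRelease:
    name: str
    semver: str
    system_prompt: str
    tools: tuple
    model_id: str

    def fingerprint(self) -> str:
        """Short hash of the behaviour-defining config, to catch unversioned changes."""
        config = {
            "system_prompt": self.system_prompt,
            "tools": sorted(self.tools),
            "model_id": self.model_id,
        }
        blob = json.dumps(config, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:12]

def stamp(entry: dict, release: AgentRelease) -> dict:
    """Return a copy of the log entry annotated with the release that produced it."""
    return {
        **entry,
        "agent": release.name,
        "agent_version": release.semver,
        "agent_fingerprint": release.fingerprint(),
    }

v14 = AgentRelease("invoice-bot", "1.4.0", "You post invoices.", ("netsuite.post",), "example-model")
v17 = AgentRelease("invoice-bot", "1.7.0", "You post invoices.", ("netsuite.post", "slack.post"), "example-model")

entry = stamp({"turn_id": "turn-001", "decision": "posted"}, v14)
print(entry["agent_version"], entry["agent_fingerprint"])
```

The fingerprint is a safety net: if someone edits the system prompt without bumping the semver, the two values disagree and the discrepancy is visible in the log.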

The fifth is ignoring human-in-the-loop events. Approvals, rejections, and overrides are themselves Article 12 events. Without them, the chain of accountability breaks at the most important step — the moment a human chose to let the agent act or refused. Log every approval queue interaction with the approver identity, decision, and reason.
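An approval-queue interaction can be recorded with the same discipline as a model call. A minimal sketch, assuming three decision values and a mandatory free-text reason; the names ApprovalEvent and record_approval are hypothetical:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

ALLOWED_DECISIONS = {"approved", "rejected", "overridden"}

@dataclass(frozen=True)
class ApprovalEvent:
    turn_id: str
    approver_id: str
    decision: str
    reason: str
    decided_at: str  # ISO 8601, UTC

def record_approval(audit_log, turn_id, approver_id, decision, reason):
    """Validate and append one human-in-the-loop decision to the audit log."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    if not reason.strip():
        raise ValueError("a reason is required for every approval decision")
    event = ApprovalEvent(
        turn_id=turn_id,
        approver_id=approver_id,
        decision=decision,
        reason=reason.strip(),
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(asdict(event))
    return event

approvals = []
record_approval(approvals, "turn-001", "user-42", "rejected", "Amount exceeds mandate")
print(approvals[0]["decision"], approvals[0]["approver_id"])
```

Rejecting an empty reason at write time is a design choice: it is much easier to require the justification at the moment of decision than to reconstruct it during an inquiry.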

What changes on 2 August 2026

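The bulk of high-risk system obligations apply from 2 August 2026. The obligations for high-risk AI embedded in products already covered by EU product legislation (Article 6(1)) apply from 2 August 2027, which completes the rollout. Two practical consequences follow.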

Provider notifications start in early 2026. If you provide an AI system that another company deploys, the deployer will start asking for compliance documentation. Article 12 logging is part of the technical documentation expected under Annex IV.

Procurement questionnaires get more specific. Buyers will ask whether your logs are signed, retained EU-resident, and exportable. Vague answers will start losing deals. Concrete answers (yes, immutable, 24-month retention, CSV export, signed timestamps) will start winning them.

If you start logging properly today, you will be ahead of every competitor that postpones the work until the obligation is enforced. That is not just compliance hygiene; it is sales enablement.

Not sure where your current logging stands against Article 12? Request a tailored compliance-ready roadmap at agent-works.ai/contact.

About the author

Erwin Berkouwer · Founder, AgentWorks

Erwin Berkouwer is the founder of AgentWorks — an AI agent platform purpose-built for European teams that need EU AI Act-ready governance, multi-LLM choice across OpenAI, Anthropic, Google and Mistral, and transparent per-token € pricing.
