Compliance · May 13, 2026 · 8 min read

EU AI Act Article 12: What Your Platform Must Log


TL;DR

A practical breakdown of Article 12 EU AI Act logging duties, the six fields every high-risk AI run must record, retention requirements, and the platform pattern that makes compliance an engineering problem rather than a policy debate. Written for CTOs and compliance leads at companies deploying AI agents in production.

Every AI workflow you ship to production is now a regulated event source. Article 12 of the EU AI Act requires high-risk AI systems to record automatically what happens while they run. Not in a marketing deck, but in the actual database that a national authority can ask to see on short notice.

Most teams hear this and reach for their existing application logs. That is not enough. Article 12 has specific requirements about content, retention, and verifiability that ordinary structured logs almost never meet. The good news: with the right platform pattern, compliance becomes a switch rather than a six-month project.

The problem: ad-hoc logs do not survive a regulator visit

The Act takes effect in phases. By August 2026, providers of high-risk AI systems must show automatic event records that allow tracing each decision back to its origin. Article 26 then makes deployers (the company using the AI, not just the vendor) keep those logs for at least six months. For credit scoring, recruitment, biometric identification, and critical infrastructure use cases, the retention floor stretches further under sectoral rules.

The penalty for missing this is not small. Article 99 puts the fine for non-compliance with Article 12 at up to 15 million euro or 3% of global annual turnover, whichever is higher. That is a number that focuses minds in the boardroom.
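To make the "whichever is higher" rule concrete, here is a small worked example. The function name and the turnover figures are made up for illustration; the two thresholds are the ones quoted above.

```python
def article_99_fine_cap(global_annual_turnover_eur: int) -> float:
    """Upper bound of the fine: EUR 15 million or 3% of turnover, whichever is higher."""
    fixed_cap_eur = 15_000_000
    turnover_cap_eur = global_annual_turnover_eur * 3 / 100
    return max(fixed_cap_eur, turnover_cap_eur)


# 3% of 200 million is 6 million, so the fixed 15 million cap applies.
print(article_99_fine_cap(200_000_000))
# 3% of 2 billion is 60 million, which exceeds the fixed cap.
print(article_99_fine_cap(2_000_000_000))
```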

Here is what we see when teams try to retrofit compliance onto an existing AI stack:

  • Application logs are flushed to STDOUT and rotated after 14 days
  • Prompt and response pairs are not stored, or stored only on error
  • Logs lack a stable identifier linking a model output to the user, task, and dataset version
  • Tool calls and retrieval queries are invisible because the orchestration framework does not emit them
  • Timestamps are local server time, not UTC, and not signed

A notified body or data protection authority will ask three questions during an audit: Which model made this decision? What input did it see? Who approved the action? If your logs cannot answer those three questions for a specific case from nine months ago, you are not Article 12 compliant.

Worse, the discovery happens at the worst possible moment. Most teams find out their logs are insufficient during the first real incident, not during a planned audit. A customer disputes a credit decision, a journalist files a request about an automated hiring screen, or an internal whistleblower flags a suspicious output. Suddenly the security team is reconstructing context from Slack threads and CloudWatch fragments while the legal team negotiates a response window. That is the scenario Article 12 is designed to prevent, and the scenario your platform must be architected to handle without heroics.

Article 12 itself is short. Paragraph 2 lists what the logs must enable: identifying situations in which the system may present a risk or undergo a substantial modification, facilitating post-market monitoring, and monitoring the operation of the system by its deployer. Paragraph 3 adds specific requirements for remote biometric identification systems: the period of each use, the reference database checked, the input data that led to a match, and the identity of the staff who verified the results.

For every other high-risk system, those abstract goals translate to six concrete fields per run:

  1. Run ID and parent run ID: a UUID that links every step of one workflow. Parent ID lets you trace multi-agent calls back to the originating request.
  2. Identity context: which user or service triggered the run, which tenant they belong to, which role they hold. Pseudonymised where the use case allows.
  3. Model and version: provider, model name, exact version string, the routing decision (why this model, not another), and the prompt template ID with version hash.
  4. Input and retrieval context: the user prompt, the system prompt, any retrieved documents with their source IDs, and the parameters (temperature, max tokens, tool choices). PII redacted before storage where applicable.
  5. Output and tool calls: the model response, every tool the agent invoked, the arguments, the tool result, and the final action taken in any downstream system.
  6. Decision metadata: confidence score where available, whether a human reviewed the output, who approved it, timestamp in UTC, and the outcome (success, failure, escalated).
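The six fields above could be modelled as one immutable record per run. This is a minimal sketch, not a schema prescribed by the Act: every class, field, and helper name here is an assumption chosen to mirror the list, and the example values are invented.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    # 1. Run ID and parent run ID
    run_id: str
    parent_run_id: Optional[str]
    # 2. Identity context (pseudonymised where the use case allows)
    actor_id: str
    tenant_id: str
    role: str
    # 3. Model and version
    provider: str
    model_name: str
    model_version: str
    routing_reason: str
    prompt_template_id: str
    prompt_template_hash: str
    # 4. Input and retrieval context (PII redacted before storage)
    user_prompt: str
    system_prompt: str
    retrieved_source_ids: tuple
    parameters: dict
    # 5. Output and tool calls
    response: str
    tool_calls: tuple
    final_action: str
    # 6. Decision metadata
    confidence: Optional[float]
    human_reviewed: bool
    approved_by: Optional[str]
    outcome: str
    timestamp_utc: str


def new_run_record(**fields) -> RunRecord:
    """Fill in the run ID and a UTC timestamp; the caller supplies the rest."""
    return RunRecord(
        run_id=str(uuid.uuid4()),
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
        **fields,
    )


example = new_run_record(
    parent_run_id=None,
    actor_id="user-7f3a", tenant_id="tenant-acme", role="loan_officer",
    provider="example-provider", model_name="example-model",
    model_version="2026-04-01", routing_reason="lowest cost meeting policy",
    prompt_template_id="credit-check", prompt_template_hash="ab12cd34",
    user_prompt="Assess application 991", system_prompt="You are a credit analyst.",
    retrieved_source_ids=("doc-17", "doc-42"),
    parameters={"temperature": 0.0, "max_tokens": 512},
    response="Recommend manual review.",
    tool_calls=({"tool": "lookup_credit_file", "args": {"id": 991}, "result": "ok"},),
    final_action="ticket_created",
    confidence=None, human_reviewed=True, approved_by="reviewer-12",
    outcome="escalated",
)
print(example.run_id, example.timestamp_utc)
```

Freezing the dataclass blocks attribute reassignment on the record in application code, which matches the append-only storage advice that follows, though nested values such as `parameters` remain mutable and real immutability has to be enforced by the store.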

These fields are the minimum. Store them in append-only tables, never in mutable structured logs. Add a SHA-256 hash of each row chained to the previous row in the same run so any tampering is detectable. Keep the cryptographic chain across at least the retention window required by your regulator (six months baseline, three years for high-risk financial use).
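One way to picture the chained SHA-256 hash is the sketch below. It keeps the chain in a Python list purely for illustration; the function names, the all-zero genesis value, and the JSON canonicalisation are assumptions, and a real deployment would write the same fields to an append-only table.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first row of a run


def row_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous row's hash plus a canonical JSON form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_row(chain: list, payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": row_hash(prev_hash, payload),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every hash in order; any edited or reordered row breaks the chain."""
    prev_hash = GENESIS_HASH
    for row in chain:
        if row["prev_hash"] != prev_hash:
            return False
        if row["hash"] != row_hash(prev_hash, row["payload"]):
            return False
        prev_hash = row["hash"]
    return True


run_chain = []
append_row(run_chain, {"step": 1, "event": "prompt_received"})
append_row(run_chain, {"step": 2, "event": "model_response"})
print(verify_chain(run_chain))  # True

run_chain[0]["payload"]["event"] = "edited after the fact"
print(verify_chain(run_chain))  # False
```

A chain like this detects tampering but does not prevent it. Anchoring the latest hash somewhere the runtime cannot write to is what makes a wholesale rewrite of the chain detectable as well.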

Expert tip: never store raw logs in the same database your application writes to. Use a separate logical store with write-only credentials for the agent runtime and read-only credentials for auditors. The blast radius of a compromise stays contained, and you can prove segregation of duties on day one of an audit.

Five details that most teams miss when they roll their own:

First, the retrieval-augmented generation step. If the agent pulled three documents from your knowledge base before answering, the regulator wants to see which three. Source ID, chunk ID, retrieval score. Log those at the same level as the prompt itself.

Second, the routing decision. If your platform routes between Claude, GPT-4o, and Gemini based on cost or capability, the choice itself is regulated metadata. Why did this request go to model X? The answer should sit in the run record, not in a separate routing service that gets rotated out next quarter.

Third, the human override path. Article 14 requires human oversight for high-risk systems. Your logs must show, for every decision, whether oversight was offered, taken, accepted, or rejected. A binary field is not enough. Record the reviewer ID, the time taken, and the resulting action.
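A sketch of what "more than a binary field" might look like follows. The status values, class names, and field names are assumptions for illustration, not terms taken from Article 14.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class OversightStatus(Enum):
    NOT_OFFERED = "not_offered"
    OFFERED_NOT_TAKEN = "offered_not_taken"
    ACCEPTED = "accepted"   # reviewer confirmed the model output
    REJECTED = "rejected"   # reviewer overrode the model output


@dataclass(frozen=True)
class OversightRecord:
    run_id: str
    status: OversightStatus
    reviewer_id: Optional[str]
    offered_at_utc: Optional[datetime]
    decided_at_utc: Optional[datetime]
    resulting_action: str

    def review_seconds(self) -> Optional[float]:
        """Time the reviewer took, or None if no review happened."""
        if self.offered_at_utc is None or self.decided_at_utc is None:
            return None
        return (self.decided_at_utc - self.offered_at_utc).total_seconds()


record = OversightRecord(
    run_id="run-001",
    status=OversightStatus.REJECTED,
    reviewer_id="reviewer-12",
    offered_at_utc=datetime(2026, 5, 13, 9, 0, 0, tzinfo=timezone.utc),
    decided_at_utc=datetime(2026, 5, 13, 9, 2, 30, tzinfo=timezone.utc),
    resulting_action="application_sent_to_manual_underwriting",
)
print(record.status.value, record.review_seconds())  # rejected 150.0
```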

Fourth, watch out for the streaming response edge case. If your platform streams tokens back to the user, the final assembled response is what counts for compliance, but the intermediate function calls and reasoning tokens often carry the diagnostic signal. Store both the streamed event log and the final concatenated output. The cost in storage is negligible; the cost of missing it during an incident review is not.
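Storing both views of a streamed response can be as simple as the sketch below. The `StreamCapture` class and the event kinds `token` and `tool_call` are hypothetical names; real streaming APIs emit their own event types.

```python
from dataclasses import dataclass, field


@dataclass
class StreamCapture:
    run_id: str
    events: list = field(default_factory=list)

    def record(self, kind: str, content: str) -> None:
        """Append every streamed event in arrival order, not just visible tokens."""
        self.events.append({"seq": len(self.events), "kind": kind, "content": content})

    def final_output(self) -> str:
        """The assembled response the user saw: token events only, concatenated."""
        return "".join(e["content"] for e in self.events if e["kind"] == "token")


capture = StreamCapture(run_id="run-001")
capture.record("token", "The loan ")
capture.record("tool_call", "lookup_credit_file(id=991)")
capture.record("token", "is approved.")

print(capture.final_output())  # The loan is approved.
print(len(capture.events))     # 3
```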

Fifth, link your logs to your dataset versions. If you fine-tuned a model on a customer support corpus version 3.2, the run record must reference that exact corpus version, not just the model. When the regulator asks why the agent gave a specific answer, the chain of evidence runs from output back through prompt, retrieval source, model, training data, and tenant policy. Each link is a foreign key, not a string.
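The "foreign key, not a string" idea can be sketched with SQLite from the Python standard library. The table and column names are assumptions, and the rows are invented sample data; the point is that a run cannot reference a model or dataset version that does not exist.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dataset_versions (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL
);
CREATE TABLE model_versions (
    id INTEGER PRIMARY KEY,
    model_name TEXT NOT NULL,
    version TEXT NOT NULL,
    dataset_version_id INTEGER REFERENCES dataset_versions(id)
);
CREATE TABLE runs (
    run_id TEXT PRIMARY KEY,
    model_version_id INTEGER NOT NULL REFERENCES model_versions(id),
    prompt TEXT NOT NULL,
    output TEXT NOT NULL
);
CREATE TABLE retrieval_hits (
    run_id TEXT NOT NULL REFERENCES runs(run_id),
    source_id TEXT NOT NULL,
    chunk_id TEXT NOT NULL,
    score REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(SCHEMA)

conn.execute("INSERT INTO dataset_versions VALUES (1, 'customer-support-corpus', '3.2')")
conn.execute("INSERT INTO model_versions VALUES (1, 'support-ft', '2026-04-01', 1)")
conn.execute("INSERT INTO runs VALUES ('run-001', 1, 'Where is my refund?', 'Refund approved.')")
conn.execute("INSERT INTO retrieval_hits VALUES ('run-001', 'kb-17', 'chunk-3', 0.91)")
conn.commit()

# Walk the chain of evidence from output back to training data in one query.
row = conn.execute(
    """
    SELECT r.output, m.model_name, m.version, d.name, d.version
    FROM runs r
    JOIN model_versions m ON m.id = r.model_version_id
    JOIN dataset_versions d ON d.id = m.dataset_version_id
    WHERE r.run_id = ?
    """,
    ("run-001",),
).fetchone()
print(row)
```

With enforcement switched on, inserting a run that points at an unknown `model_version_id` raises `sqlite3.IntegrityError`, so a broken link in the evidence chain fails at write time rather than during an audit.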

Practical applications and ROI

Proper Article 12 logging is not just an insurance policy. Teams that ship it correctly see hard returns:

| Use case | Without structured logging | With Article 12 logs | Benefit |
| --- | --- | --- | --- |
| Internal incident triage | 4 hours to reconstruct a bad output | 4 minutes via run ID lookup | 60x faster root cause analysis |
| Customer dispute (wrongful denial) | Manual review, often no evidence | Full run record with reviewer trail | Disputes resolved in 2 days vs 3 weeks |
| Annual ISO 42001 audit | Three-week scramble, external consultants | Half-day evidence pull from log store | 90% lower audit prep cost |
| Model performance drift detection | Discovered after customer complaint | Automated weekly report from logs | Caught 6 weeks earlier on average |
| New model rollout safety | Production canary, hope for the best | Replay logged inputs through new model | 40% fewer regressions reach users |

The second row is the one CFOs notice. A single contested high-risk decision (credit, recruitment, insurance) can cost 50,000 euro in legal fees plus regulator scrutiny. Article 12 logs settle most disputes in days rather than weeks by showing exactly what happened.

For agency clients running white-label AI agents, proper logging also lets you offer a compliance-grade service tier. Charging 30% more for an audit-ready deployment is reasonable once the logs are there, and the marginal cost is near zero if your platform handles it natively.

There is also a hidden engineering benefit. Once your logs contain replayable inputs (prompt, retrieval set, tool catalogue, parameters), you have a free regression suite. Run last week's 5,000 production requests through a candidate model and diff the outputs. That kind of pre-deployment validation used to need a dedicated team. With Article 12 logs in place, it is a 30-line script.
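A replay script along those lines might look like the sketch below. The record layout (`run_id`, `input`, `output`) and the stub candidate model are assumptions; in practice the candidate would be a call to the new model with the logged prompt, retrieval set, and parameters.

```python
from typing import Callable, Iterable


def replay(logged_runs: Iterable[dict], candidate_model: Callable[[dict], str]) -> list:
    """Re-run logged inputs through a candidate model and collect changed outputs."""
    diffs = []
    for run in logged_runs:
        new_output = candidate_model(run["input"])
        if new_output != run["output"]:
            diffs.append({
                "run_id": run["run_id"],
                "logged": run["output"],
                "candidate": new_output,
            })
    return diffs


logged = [
    {"run_id": "r1", "input": {"prompt": "2+2?"}, "output": "4"},
    {"run_id": "r2", "input": {"prompt": "Capital of France?"}, "output": "Paris"},
]


def candidate(run_input: dict) -> str:
    # Stub standing in for the new model; it regresses on the second prompt.
    return "4" if run_input["prompt"] == "2+2?" else "Lyon"


for diff in replay(logged, candidate):
    print(diff)
```

Exact string comparison is the crudest possible diff. Non-deterministic models usually need a tolerance, such as a semantic similarity threshold or a rubric-based grader, before the report is useful.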

How to get started

Four steps get you from where most teams are today to Article 12 readiness.

Step 1: inventory your current AI systems against the risk tiers. Map every workflow to one of four categories: prohibited, high-risk, limited-risk, minimal-risk. Article 12 logging is mandatory for high-risk systems and recommended (and easy when the infrastructure exists) for limited-risk and minimal-risk ones. Our EU AI Act compliance overview walks through the classification questions for the common business cases.

Step 2: pick a platform that emits the six fields automatically. Building the log pipeline yourself takes a strong team 8 to 12 weeks. AgentWorks emits run records with all six fields out of the box, including the cryptographic chain and the human override metadata. The AI workforce platform page details the audit log architecture.

Step 3: configure retention per workflow. Six months baseline, longer for regulated use cases. Set this once per agent in the policy editor, store keys in your own KMS, and the platform enforces deletion automatically when the window passes.
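Per-workflow retention reduces to a lookup and a date comparison. In this sketch the workflow names and day counts are placeholder assumptions (183 days standing in for six months, three years for a regulated financial case); the real windows should come from your own legal analysis.

```python
from datetime import datetime, timedelta, timezone

# Assumed placeholder windows, keyed by a hypothetical workflow name.
RETENTION_DAYS = {
    "default": 183,             # roughly six months
    "credit_scoring": 3 * 365,  # roughly three years
}


def retention_window(workflow: str) -> timedelta:
    return timedelta(days=RETENTION_DAYS.get(workflow, RETENTION_DAYS["default"]))


def is_expired(workflow: str, recorded_at_utc: datetime, now_utc: datetime) -> bool:
    """True once a record is older than the retention window for its workflow."""
    return now_utc - recorded_at_utc > retention_window(workflow)


now = datetime(2026, 12, 1, tzinfo=timezone.utc)
recorded = datetime(2026, 3, 1, tzinfo=timezone.utc)  # 275 days before `now`

print(is_expired("support_chat", recorded, now))    # True: 275 days exceeds 183
print(is_expired("credit_scoring", recorded, now))  # False: inside the longer window
```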

Step 4: run a tabletop audit before the regulator does. Pick a real run from three months ago. Have the team produce: the input, the model decision, the retrieval sources, the human override status, and the downstream action. If they can do it in under 15 minutes from cold start, you are ready. If not, fix the gaps now.
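The tabletop exercise can be backed by a simple completeness check over the five items listed above. The key names are assumptions that mirror that list; adapt them to whatever your run record actually calls these fields.

```python
REQUIRED_EVIDENCE = (
    "input",
    "model_decision",
    "retrieval_sources",
    "human_override_status",
    "downstream_action",
)


def missing_evidence(run_record: dict) -> list:
    """Return the evidence items that are absent or empty in a run record."""
    return [key for key in REQUIRED_EVIDENCE if run_record.get(key) in (None, "", [])]


sample_run = {
    "input": "Assess application 991",
    "model_decision": "Recommend manual review.",
    "retrieval_sources": [],          # nothing was logged for the retrieval step
    "human_override_status": "rejected",
}

print(missing_evidence(sample_run))  # ['retrieval_sources', 'downstream_action']
```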

Teams that follow this sequence reach Article 12 readiness in 3 to 6 weeks, not 6 months. The cost difference shows up directly in budget season.

One more practical note. Article 12 logs frequently contain personal data, so the GDPR applies on top of the AI Act: the Article 5 principles such as data minimisation and storage limitation, and the Article 17 right to erasure. Build a deletion workflow that honours erasure requests without breaking the cryptographic chain. The pattern most teams settle on is crypto-shredding: encrypt each subject's run records under a per-subject key, compute the hash chain over the ciphertext, and destroy the key when the records must go. The hash chain stays intact but the content becomes unreadable. This lets you meet both regulations without forcing a structural redesign every time a request comes in. Document the procedure once, automate it inside the platform, and the legal team can sign off without monthly back-and-forth.
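The key-erasure pattern can be sketched with the standard library alone. Two loud assumptions: the XOR keystream below is a stand-in cipher for illustration only (use a vetted authenticated cipher such as AES-GCM from an audited library in production), and the in-memory `subject_keys` dictionary stands in for a KMS. The point being shown is that the hash chain is computed over ciphertext, so it still verifies after a key is destroyed.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Illustration-only cipher: XOR data with an HMAC-SHA256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


subject_keys = {}  # subject_id -> key; hypothetical stand-in for a KMS


def _link_hash(prev_hash: str, nonce: bytes, ciphertext: bytes) -> str:
    return hashlib.sha256(prev_hash.encode("ascii") + nonce + ciphertext).hexdigest()


def store_record(chain: list, subject_id: str, plaintext: str) -> None:
    key = subject_keys.setdefault(subject_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    ciphertext = _keystream_xor(key, nonce, plaintext.encode("utf-8"))
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({
        "subject_id": subject_id, "nonce": nonce, "ciphertext": ciphertext,
        "prev_hash": prev_hash, "hash": _link_hash(prev_hash, nonce, ciphertext),
    })


def read_record(row: dict):
    key = subject_keys.get(row["subject_id"])
    if key is None:
        return None  # key destroyed: the content is no longer recoverable
    return _keystream_xor(key, row["nonce"], row["ciphertext"]).decode("utf-8")


def erase_subject(subject_id: str) -> None:
    subject_keys.pop(subject_id, None)


def chain_is_intact(chain: list) -> bool:
    prev_hash = "0" * 64
    for row in chain:
        expected = _link_hash(prev_hash, row["nonce"], row["ciphertext"])
        if row["prev_hash"] != prev_hash or row["hash"] != expected:
            return False
        prev_hash = row["hash"]
    return True


log = []
store_record(log, "subject-a", "Application 991 denied")
store_record(log, "subject-b", "Application 992 approved")
erase_subject("subject-a")

print(read_record(log[0]))   # None
print(read_record(log[1]))   # Application 992 approved
print(chain_is_intact(log))  # True
```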

Closing

Article 12 is not the scariest part of the EU AI Act. It is the most concrete. The Act tells you what your logs must make possible, and a deadline by which they must do it. The teams that treat this as an engineering problem (six fields, append-only store, deterministic retention) ship compliance in weeks. The teams that treat it as a policy problem are still arguing about scope when the deadline arrives.

Not sure where AI agents fit? Request a tailored compliance-ready roadmap at agent-works.ai/contact.

About the author

Erwin Berkouwer · Founder, AgentWorks

Erwin Berkouwer is the founder of AgentWorks — an AI agent platform purpose-built for European teams that need EU AI Act-ready governance, multi-LLM choice across OpenAI, Anthropic, Google and Mistral, and transparent per-token € pricing.
