
Navigating Model Risk in the Age of Agentic AI

A Recap of Lukasz Szpruch’s Keynote at Yields Innovate
March 26, 2026

Watch the full keynote video at the bottom of the article.

On March 12th, Lukasz Szpruch from the University of Edinburgh and The Alan Turing Institute took the stage at Yields Innovate to deliver a keynote on the evolving landscape of artificial intelligence. His presentation, "From AI Assurance to AI Insurance: Model Risk in the Age of Agentic AI," explored the growing gap between AI capabilities and AI reliability, proposing new frameworks for scaling governance in financial institutions.

The Generative AI Reality Check

While executives are highly enthusiastic about the rapid advancement of AI capabilities, Szpruch warned that increased capability does not translate to increased reliability. In fact, the opposite is often true. Deploying these systems for high-risk applications requires significantly more effort in risk management than in building the systems themselves.

Unlike traditional predictive models, which have transparent features and understandable probability distributions (what Szpruch described as dealing with "known unknowns"), generative AI systems introduce "unknown unknowns". These systems are complex, dynamic, and composable, assembled from foundation models, retrieval-augmented generation (RAG) indexes, and prompts. Because their outputs are open-ended, establishing a clear "ground truth" for evaluation is inherently difficult. Highlighting the current challenges of enterprise deployment, Szpruch pointed to a recent report stating that 95% of investments in generative AI have yielded zero returns.

The "Human-in-the-Loop" Bottleneck

Currently, a majority of banks deploy AI for low-risk applications under the assumption that a human reviewer will take ultimate responsibility for the output. However, Szpruch firmly argued that this approach is fundamentally unscalable.

Relying on human intervention defeats the core premise of automation:

  • Speed Asymmetry: Agentic systems process data and make decisions far faster than a human can review them, meaning oversight either grinds the process to a halt or devolves into blind "rubber-stamping".
  • Comprehension Asymmetry: Reviewers often lack the ability to fully interrogate the system's high-dimensional reasoning, leading to dangerous automation bias.
  • Context Asymmetry: While an AI agent operates with a full execution history, the human reviewer usually only sees a partial snapshot of the context.

To safely manage autonomous systems, institutions must implement sophisticated, continuous monitoring tools that automatically flag edge cases for human review.
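
To make the pattern concrete, the triage step might look like the short Python sketch below. This is our illustration of the idea, not code from the keynote; the confidence and novelty scores, the thresholds, and the function names are hypothetical placeholders for calibrated evaluators.

  # Illustrative sketch only: scores, thresholds, and names are hypothetical.
  from dataclasses import dataclass

  @dataclass
  class ReviewDecision:
      auto_approved: bool
      reason: str

  def triage(confidence: float, novelty: float,
             conf_floor: float = 0.90, novelty_ceiling: float = 0.20) -> ReviewDecision:
      """Escalate only edge cases, so human reviewers see a manageable
      queue instead of rubber-stamping every output."""
      if confidence < conf_floor:
          return ReviewDecision(False, f"low model confidence ({confidence:.2f})")
      if novelty > novelty_ceiling:
          return ReviewDecision(False, f"input far from calibration data ({novelty:.2f})")
      return ReviewDecision(True, "within calibrated operating envelope")

A decision with auto_approved=False would be queued for a reviewer along with the full execution context, directly addressing the context asymmetry described above.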

Scaling Governance: Capabilities Over Use Cases

A major hurdle for Model Risk Management (MRM) teams is how to classify and validate generative AI. If a bank treats every individual AI application, such as drafting credit memos, summarizing KYC documents, or checking validation reports, as a separate model requiring full independent validation, the MRM function would quickly become the largest team in the organization.

Instead, Szpruch proposed shifting focus from "use cases" to a "capability codebook":

  • A "capability" is a bounded class of actions, such as document retrieval or text summarization.
  • Each capability has its own specific constraints, permitted tools, and evidence requirements for validation.
  • Individual use cases are simply orchestrated sequences of these pre-governed capabilities.

By testing capabilities rather than disparate use cases, organizations can reuse calibration data, apply shared evaluation metrics, and more easily isolate the root cause of failures.
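
As a rough illustration, a capability codebook could be represented as a small registry of bounded, pre-governed actions. The schema below is a hedged sketch of our own; the field names and example entries are assumptions for illustration, not a standard proposed in the keynote.

  # Hypothetical schema: field names and entries are illustrative assumptions.
  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Capability:
      name: str                           # e.g. "document_retrieval"
      permitted_tools: tuple[str, ...]    # tools this capability may invoke
      constraints: tuple[str, ...]        # operational limits enforced at runtime
      evidence_required: tuple[str, ...]  # validation artefacts needed before approval

  CODEBOOK = {
      "document_retrieval": Capability(
          "document_retrieval",
          permitted_tools=("vector_search", "sql_read"),
          constraints=("read_only", "internal_sources_only"),
          evidence_required=("retrieval_precision_report",),
      ),
      "text_summarization": Capability(
          "text_summarization",
          permitted_tools=("llm_generate",),
          constraints=("no_external_calls", "max_tokens_2048"),
          evidence_required=("faithfulness_eval",),
      ),
  }

  # A use case is then just an orchestrated sequence of pre-governed capabilities:
  draft_credit_memo = ("document_retrieval", "text_summarization")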

Agentic AI and Runtime Governance

As the industry moves toward autonomous agentic workflows, risk management must adapt to monitor the entire execution trajectory rather than just evaluating a final output. Because material failures frequently happen during intermediate processing steps, maintaining "runtime governance" is critical. Organizations need to define playbooks and strict operational policies before deploying agents, ensuring that the AI can only execute capabilities in a highly controlled, compliant manner.
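
One way to picture runtime governance is a guard that checks every intermediate step against the codebook before the agent is allowed to act. The sketch below builds on the hypothetical registry above and is deliberately simplified; a real deployment would add budgets, timeouts, and richer audit records.

  # Simplified runtime guard over the hypothetical CODEBOOK sketched earlier.
  def execute_trajectory(steps, codebook, audit_log):
      """Validate each (capability, tool, payload) step before execution,
      so failures are caught mid-trajectory, not just at the final output."""
      for capability_name, tool, payload in steps:
          cap = codebook.get(capability_name)
          if cap is None:
              audit_log.append(("blocked", capability_name, "not in codebook"))
              raise PermissionError(f"unknown capability: {capability_name}")
          if tool not in cap.permitted_tools:
              audit_log.append(("blocked", capability_name, f"tool {tool} not permitted"))
              raise PermissionError(f"{tool} not allowed for {capability_name}")
          audit_log.append(("executed", capability_name, tool))
          # ... invoke the tool under the capability's constraints here ...

Because every decision is appended to an audit log, the same trace that enforces policy also produces the evidence that validators, and eventually insurers, would demand.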

The Future: Accountability Through AI Insurance

Governments are currently enthusiastic about third-party AI assurance as a way to certify that models are safe. However, Szpruch cautioned that without accountability, third-party testing can easily become a superficial exercise.

To enforce rigorous standards, he advocates for the development of an AI insurance market. While underwriting AI faces immense hurdles, such as a lack of historical actuarial data, opaque failure modes, and developer secrecy, an insurance market would offer powerful financial incentives. Insurers demanding audit trails and explainability could naturally drive the adoption of best practices, penalizing risky, unverified systems with prohibitive premiums.

You can watch the full keynote here:

About the Speaker(s) / Author(s)

Lena Mertens
Digital Marketeer

Lotte Van Deyck
Head of Marketing

