What is model reproducibility?
Reproducibility in model risk management is the ability to replicate results by rerunning the same algorithm on the same datasets with the same attributes. Achieving full reproducibility is essential to managing model risk. With the introduction of machine learning and AI, reproducing model results has become a more challenging task.
The reproducibility challenge in model risk management
It’s no secret that reproducibility is still one of the most demanding tasks that organisations face in validating models. Full reproducibility of model results often has to be left out of scope due to the high level of resources involved. The process is time-consuming and requires both quantitative and technology-related skills. The more diverse the model type inventory of an organisation, the higher the expected effort. This is especially true for machine-learning models involving larger data sets, multiple dependencies and more sophisticated algorithms.
The challenge is, on the one hand, to bring all the necessary elements of the puzzle together, and on the other, to have the appropriate analytics to link all the objects and execute the task over and over.
With this in mind, what can model risk managers do to ensure reproducibility at all times to meet the transparency requirements set by auditors and regulators?
Code sharing is important but it’s not everything
Reproducibility implies more than just being able to share scripts used to build or test models. It is also a tool that helps model validators and developers navigate through and keep track of the building blocks that lead to a specific result.
Isolating model risk drivers is easier said than done. In theory, being able to access and use the same data sets and code developers used during the model development phase should suffice for validators to replicate results. However, in practice, validators need more than that – they need visibility on how data is linked to and used within a given piece of analytics.
In other words, the collaboration between model developers and validators is vital to facilitate the replication of results and minimise operational risks embedded in manual tasks and fragmented set-ups. One step in this direction is the implementation of a well-functioning central repository that tracks and stores complete documentation of models with their interdependencies and linkages, environment information, scripts, versions and data shapes. With a central platform in place, the flow of data and model-related information would be seamless and controlled, with all details stored and managed at the appropriate level of granularity.
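To make the idea of a repository entry more concrete, here is a minimal sketch of the kind of record such a central repository might store for each model run. It is only an illustration using the Python standard library: the function names (`fingerprint`, `build_manifest`), the model name `pd_scorecard` and the sample rows are all hypothetical, and this is not the storage format of any particular platform.

```python
import hashlib
import json
import platform
import sys


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes given."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(model_name, version, script_source, rows):
    """Collect what a validator needs to rerun a model: code, data and environment.

    Hypothetical helper for illustration only.
    """
    # Serialise the rows deterministically so identical data always hashes the same.
    data_bytes = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        "model": model_name,
        "version": version,
        "script_sha256": fingerprint(script_source.encode("utf-8")),
        "data_sha256": fingerprint(data_bytes),
        # Data shape as [number of rows, number of columns].
        "data_shape": [len(rows), len(rows[0]) if rows else 0],
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }


# Illustrative data and script text; not taken from a real model.
rows = [{"age": 34, "income": 52000}, {"age": 41, "income": 61000}]
manifest = build_manifest("pd_scorecard", "1.0.0", "print('fit model')", rows)
print(json.dumps(manifest, indent=2))
```

Because the script and the data are identified by hashes, a validator can check that they hold exactly the inputs the developer used before attempting to replicate a result.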
Best practices in overcoming reproducibility challenges
There are many ways for validators to make the process of replicating results less taxing and time-consuming. Below we illustrate three best practices that may be adopted to overcome some of the most common reproducibility challenges:
These are just some of the tested and proven practices we have implemented. With Yields for Performance, we empower top-tier banks with a tool that makes model validation ten times more efficient.
Consider an example of how reproducibility can be achieved with Yields for Performance: an age outlier was present in the original data set and was corrected in a subsequent execution (the second session). Because all sessions are historised, the results with the outlier can be reproduced at a later stage by running the exact same script (for this example, a simple one that computes basic statistics) on the very same data.
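The session idea described above can be sketched in a few lines of plain Python. This is not the Yields for Performance API: `SessionHistory` and `basic_stats` are hypothetical names, the store lives in memory only, and the age values are invented for illustration.

```python
import copy
import statistics


class SessionHistory:
    """Hypothetical in-memory store that keeps one data snapshot per session."""

    def __init__(self):
        self._sessions = {}

    def record(self, session_id, ages):
        # Deep-copy so later corrections cannot alter what was stored.
        self._sessions[session_id] = copy.deepcopy(ages)

    def load(self, session_id):
        # Return a copy so callers cannot mutate the stored snapshot.
        return copy.deepcopy(self._sessions[session_id])


def basic_stats(ages):
    """The 'script': a simple computation of basic statistics."""
    return {"count": len(ages), "mean": statistics.mean(ages), "max": max(ages)}


history = SessionHistory()

# Session 1: the original data set contains an age outlier (230).
original = [25, 31, 47, 52, 230]
history.record("session-1", original)
first_run = basic_stats(original)

# Session 2: the outlier is corrected and the new data is stored separately.
corrected = [25, 31, 47, 52, 23]
history.record("session-2", corrected)

# Later: rerun the same script on the historised session-1 data.
replayed = basic_stats(history.load("session-1"))
print(replayed == first_run)
```

The design choice that matters here is that each session keeps its own immutable snapshot, so correcting the data in a later session never overwrites the inputs behind an earlier result.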
Conclusion
Reproducibility is a crucial part of model validation and remains one of the most common challenges in the banking and finance industry, where thousands of models are used on a daily basis for key strategic decision-making. Without replicating model outputs, model developers cannot fully verify how well models will perform.
Achieving reproducibility in models, especially in machine-learning models, can be difficult and time-consuming, but it doesn’t have to be. With the right practices and tools, such as Yields for Performance, model outputs can be replicated with greater efficiency and accuracy.
About the Author
Efrem Bonfiglioli has several years of experience in model risk management. He has developed models for a wide range of applications, both in the corporate world and in financial services. In recent years, Efrem has provided advice on cutting-edge model risk management solutions both within top-tier banks and for his financial services clients across the globe.