Interfaces and Algorithms

February 15, 2020

The word “model” is an oft-used term in data science. We build, train, fit, tune, benchmark, cross-validate, test, score, validate, bootstrap, calibrate, retrain, recalibrate, and monitor them (among other things). But what exactly is a model? It’s a computer program, a string of ones and zeros, a machine’s approximation of a human’s decision-making process, yes. More abstractly, it is a tool that solves a problem in a certain way, and thus consists of two high-level parts: what problem you are trying to solve, and how you are trying to solve it. Here at Yields.io, we call the “what” the interface and the “how” the algorithm.

Interface: What problem are you trying to solve? Classification? Regression? Dimensionality Reduction?
Algorithm: How are you trying to solve it? Neural Network? Random Forest? Gradient Boosting?

By answering these two questions, we have a kind of vocabulary for talking about what models are, how they work, how they are expected to behave, and so on. On a more technical note, we also modularize the problem, an important step in the development process. Here is a small sampling of the stock models included in the “marketplace” section of the Yields.io platform, Chiron. (Users also have the ability to create their own custom models using this paradigm.)

Binary Classification by Neural Network
Multi-Classification by XGBoost
Regression by Random Forest
Clustering by K-Means
Clustering by DBSCAN
Outlier Detection by DBSCAN
Outlier Detection by Autoencoder
Dimensionality Reduction by Autoencoder
Data Cleaning by Autoencoder

Some algorithms provide the “how” for more than one “what”, which highlights the significance of modularization. Autoencoders, for example, are useful creatures for all sorts of tasks, such as Outlier Detection, Dimensionality Reduction, and Data Cleaning. The underlying algorithm is the same, but it implements a different interface depending on its purpose. Moreover, an autoencoder fine-tuned and trained to perform task A might not be well suited to task B, even though the underlying algorithm is the same. It is therefore natural to create separate models, each with its own hyperparameter choices, to tackle different tasks.

Model #1 = Outlier Detection by Autoencoder
Model #2 = Dimensionality Reduction by Autoencoder
Model #3 = Data Cleaning by Autoencoder
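As a rough illustration of the three models above, the sketch below treats a model as a pairing of an interface name and an algorithm name, plus its own hyperparameters. The `Model` class, the hyperparameter names (`bottleneck_size`, `epochs`), and their values are all hypothetical and chosen only for illustration; they are not taken from Chiron.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Model:
    """Hypothetical record: a model is an interface paired with an algorithm."""
    interface: str                      # the "what"
    algorithm: str                      # the "how"
    hyperparameters: Dict[str, Any] = field(default_factory=dict)

    @property
    def name(self) -> str:
        # Follows the naming pattern used in the list above.
        return f"{self.interface} by {self.algorithm}"


# Same algorithm, three interfaces, three separate hyperparameter choices.
# All values below are made up for the sake of the example.
models = [
    Model("Outlier Detection", "Autoencoder", {"bottleneck_size": 4, "epochs": 50}),
    Model("Dimensionality Reduction", "Autoencoder", {"bottleneck_size": 2, "epochs": 200}),
    Model("Data Cleaning", "Autoencoder", {"bottleneck_size": 16, "epochs": 100}),
]

for number, model in enumerate(models, start=1):
    print(f"Model #{number} = {model.name}")
```

Keeping the hyperparameters on the model, rather than on the algorithm, is one way to let each task be tuned independently while the algorithm itself is shared.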

Technical readers will recognize what we are suggesting (and how we power our analytics under the hood): a model is defined as the implementation of an abstract interface with an algorithm property. The user interacts with the model via its interface methods, which internally dispatch tasks to the algorithm to do the heavy lifting. Beyond our technical success using this paradigm, we believe this vocabulary introduces a powerful way of thinking about what a model is from a design perspective. We hope that by demonstrating its intuitive design in our platform, we can encourage the broader use of these terms and concepts.
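The pattern described in the paragraph above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Yields.io implementation: every class and method name here (`Algorithm`, `OutlierDetection`, `DistanceFromMean`, `train`, `is_outlier`) is hypothetical, and the toy distance-based algorithm merely stands in for a real learner such as an autoencoder.

```python
from abc import ABC, abstractmethod
from typing import Sequence

Row = Sequence[float]


class Algorithm(ABC):
    """The 'how' (hypothetical): learns from rows and scores a single row."""

    @abstractmethod
    def fit(self, rows: Sequence[Row]) -> None: ...

    @abstractmethod
    def score(self, row: Row) -> float: ...


class DistanceFromMean(Algorithm):
    """Toy stand-in for a real learner: Euclidean distance from the training mean."""

    def fit(self, rows: Sequence[Row]) -> None:
        n = len(rows)
        self.mean = [sum(column) / n for column in zip(*rows)]

    def score(self, row: Row) -> float:
        return sum((x - m) ** 2 for x, m in zip(row, self.mean)) ** 0.5


class OutlierDetection(ABC):
    """The 'what' (hypothetical): an abstract interface with an algorithm property."""

    algorithm: Algorithm

    @abstractmethod
    def train(self, rows: Sequence[Row]) -> "OutlierDetection": ...

    @abstractmethod
    def is_outlier(self, row: Row) -> bool: ...


class OutlierDetectionByDistance(OutlierDetection):
    """A model: implements the interface and dispatches the work to its algorithm."""

    def __init__(self, algorithm: Algorithm, threshold: float):
        self.algorithm = algorithm
        self.threshold = threshold

    def train(self, rows: Sequence[Row]) -> "OutlierDetectionByDistance":
        self.algorithm.fit(rows)  # heavy lifting is delegated
        return self

    def is_outlier(self, row: Row) -> bool:
        return self.algorithm.score(row) > self.threshold


model = OutlierDetectionByDistance(DistanceFromMean(), threshold=3.0).train(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
)
print(model.is_outlier([10.0, 10.0]))
print(model.is_outlier([1.5, 1.0]))
```

Callers only ever touch `train` and `is_outlier`, so in this sketch the algorithm could be swapped for another `Algorithm` subclass without changing any calling code.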

Model = Interface by Algorithm

