Algorithmic risk in your smartphone

The early summer day at the Bonnaroo Festival was filled with music, laughter, and wild dancing. But amidst the festivities, a technological hiccup was unfolding: multiple iPhones and smartwatches interpreted the vigorous dance moves of their users as car crashes and automatically dialed 911.

These types of false positives have happened before (e.g. while skiing and on a rollercoaster), and they have reportedly put strain on the emergency services. On the other hand, the same feature has saved the lives of several people who were unable to call 911 after a severe crash.

In this post, I will first explain why these false positives are not simple software bugs but inherent risks of using models. Second, I will discuss how sound model risk management practices can help make the difficult trade-offs needed to limit accidental calls while still detecting actual crashes. Trade-offs of this kind are common in any modeling exercise and are often critical in deciding whether an analytics project becomes a success or a failure.

Why False Positives Occur

Let’s start by understanding why a state-of-the-art feature like crash detection would mistakenly perceive dance moves as a car crash.

Sensor Data Noise

Illustrative sensor noise, taken from M.S.F. Al-Din, “Real-Time Identification and Classification of Driving Maneuvers using Smartphone”.

Crash detection in smartphones relies on machine learning models that consume sensor data from accelerometers, gyroscopes, microphones, and so on. Although the accelerometers in smartphones are generally accurate, all sensor data is subject to noise (accelerometers can even be fooled by sound waves), and such noise inevitably reduces accuracy. In other words, the shaking motion that indicates a crash can be mimicked by various high-intensity activities – from skiing down a mountain to riding a rollercoaster, and apparently, dancing at Bonnaroo!
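To make this concrete, the sketch below simulates a hypothetical accelerometer trace: a periodic high-intensity motion (think vigorous dancing) plus Gaussian sensor noise. The amplitude, noise level, and crash threshold are all made-up illustrative numbers, not Apple's actual parameters, but they show how noise alone can push a sub-threshold activity over a crash trigger.

```python
import math
import random

def peak_acceleration(signal):
    """Return the maximum absolute acceleration in a trace (in g)."""
    return max(abs(a) for a in signal)

def simulate_activity(base_g, noise_sigma, n_samples=500, seed=42):
    """Simulate an accelerometer trace: periodic motion of amplitude
    base_g (in g) plus Gaussian sensor noise of std dev noise_sigma."""
    rng = random.Random(seed)
    return [base_g * math.sin(0.3 * t) + rng.gauss(0.0, noise_sigma)
            for t in range(n_samples)]

# Hypothetical trigger level for illustration only, not Apple's real value.
CRASH_THRESHOLD_G = 4.0

# A vigorous dance producing ~3.5 g peaks stays below the threshold
# with an ideal sensor, but realistic noise can push the measured
# peak over the trigger level.
clean = simulate_activity(base_g=3.5, noise_sigma=0.0)
noisy = simulate_activity(base_g=3.5, noise_sigma=1.0)

print("clean peak:", peak_acceleration(clean))
print("noisy peak:", peak_acceleration(noisy))
```

The same underlying motion, measured through a noisy sensor, can thus look indistinguishable from a crash signature.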

Binary Classification Algorithms

Crash detection functions as a binary classifier, meaning that it distinguishes between two states: crash or no crash. Typically, such classifiers output a probability that the input belongs to either class. The most important parameter of the model is therefore the probability threshold above which the input signal is classified as a crash. The optimal value of this threshold should strike a balance between detecting actual crashes and not flooding the emergency services with accidental calls. The stakes can hardly be higher.
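A toy example with entirely hypothetical model scores illustrates the trade-off. Raising the threshold suppresses an accidental call from a high-scoring dance session, but at the cost of missing two genuine crashes:

```python
def confusion_counts(probs, labels, threshold):
    """Classify each probability against the threshold and count outcomes.
    labels: 1 = actual crash, 0 = no crash."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred and y:
            tp += 1          # crash correctly detected
        elif pred and not y:
            fp += 1          # accidental 911 call
        elif not pred and y:
            fn += 1          # missed crash
        else:
            tn += 1          # correctly ignored
    return tp, fp, fn, tn

# Made-up classifier scores: crashes tend to score high,
# but one vigorous dance session scores high too (0.65).
probs  = [0.95, 0.90, 0.85, 0.60, 0.55, 0.65, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    1,    1,    1,    0,    0,    0,    0,    0   ]

for threshold in (0.5, 0.7):
    tp, fp, fn, tn = confusion_counts(probs, labels, threshold)
    print(f"threshold={threshold}: missed crashes={fn}, accidental calls={fp}")
```

With the low threshold, every crash is caught but the dance session triggers a call; with the high threshold, the false alarm disappears but two real crashes go undetected. Choosing between these outcomes is a policy decision, not just an engineering one.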

Building a robust AI solution

Sound model risk management (MRM) practices can help harmonise the apparently opposing interests of the various stakeholders and pave the way to more robust models. In the case at hand, three MRM principles stand out.

Independent Model Review

The model lifecycle process, as described in the best-practice MRM framework.

As already highlighted in another post where we analysed the moon landing failure of ispace, a model can only be deployed in production after it has been independently reviewed. Such a validation can offer fresh perspectives. Validators, equipped with a neutral vantage point, might for instance recognize the potential of integrating additional data sources, such as geolocation: it is highly improbable for a car accident to occur in front of the Bonnaroo festival stage.
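As an illustration of this idea, here is a minimal sketch of how a location-based prior could temper the classifier's output. The fusion rule and all numbers are hypothetical, invented for this post, and not anything Apple is known to implement:

```python
def adjusted_crash_probability(model_prob, on_road):
    """Combine the sensor model's crash probability with a simple
    location-based prior (hypothetical fusion rule for illustration)."""
    # Assumed prior: crashes are far more plausible on a road than
    # in the middle of a festival field.
    location_prior = 0.9 if on_road else 0.05
    # Multiplicative fusion followed by renormalisation.
    p_crash = model_prob * location_prior
    p_no_crash = (1 - model_prob) * (1 - location_prior)
    return p_crash / (p_crash + p_no_crash)

# Same sensor evidence, very different conclusions:
print(adjusted_crash_probability(0.8, on_road=True))
print(adjusted_crash_probability(0.8, on_road=False))
```

The same 80% sensor-based score stays alarmingly high on a road but collapses in front of a festival stage, which is exactly the kind of refinement an independent review is well placed to suggest.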

Inclusive Decision Making

RACI matrix highlighting the importance of the approval body throughout the model lifecycle, taken from the Yields best practice framework.

Striking the delicate balance between saving lives and avoiding accidental calls requires comprehensive stakeholder involvement. This isn’t just a matter for tech developers or business strategists. It necessitates collaboration between vendors (like Apple), users, and critical services like 911 operators. Their collective insights can ensure that the technology serves its purpose without inadvertently straining emergency services. This type of stakeholder involvement is built into model risk management through a so-called Approval Body that decides whether a model can be released. That Approval Body has to be representative of all teams impacted by the model.

Transparent Disclosures 

The previous topic brings us to a final and critical issue related to third-party model risk management. In the case of the Bonnaroo festival, neither the emergency services nor the iPhone owners built the crash detection algorithm; it was developed by a vendor (i.e. Apple). This is a common scenario, since many enterprises depend on AI applications procured from vendors.

A cornerstone of third-party model risk management is transparency. Vendors should disclose data, testing methodologies, and results so that stakeholders can decide whether or not they can risk-accept such an algorithm. Such transparency fosters trust and facilitates collaborative fine-tuning of the model. However, the challenge lies in ensuring that this does not compromise the vendor’s proprietary intellectual property.


The Bonnaroo incident offers more than just a quirky anecdote. It underscores the challenges of designing machine learning models that operate in the dynamic, unpredictable real world. As we integrate technology more deeply into our daily lives, acknowledging, understanding, and managing model risk becomes paramount. After all, the dance between technology and life is intricate.