Hareem Naveed, Klaus Ackermann, Joe Walsh, Adolfo De Unánue, Rob Mitchum, Lauren Haynes, and Rayid Ghani

Lessons Learned Implementing Early Intervention Systems in Charlotte and Nashville, Part 1

Since its creation, Netflix has relied on the scalability and accuracy of machine learning to deliver content and turn a profit. One way Netflix uses machine learning is to recommend movies to its users. A model that provides accurate and tailored recommendations at scale is valuable because it increases the value of Netflix subscriptions at low cost. The company decided to host a competition to find a better recommendation model, offering $1 million for a submission that reduced recommendation error by at least 10%. Three years and 44,000 submissions later, they found a winner.

The Netflix Prize led to improvements in Netflix’s recommendation system, technical developments, friendships and collaborations, and even new companies. Yet it failed to deliver a model that Netflix could use. The competition incentivized performance on static data, rather than performance in deployment. The winning model was so complex and difficult to update that Netflix decided not to use it.

Netflix’s experience sounds familiar. We’ve deployed multiple projects, and each time we’ve run into new challenges. We’d like to know how others have dealt with these issues, but to our disappointment, there isn’t much out there. Try searching Google. You’ll see what we mean.

It’s important to get deployment right. An otherwise good model can fail and, more importantly, do serious harm if the deployment is not handled well. Here are just a few issues to consider when deploying an early intervention system (EIS):

Area: Governance
Example issues:
  • What policies should be in place for managing the model and the people who interact with it?
  • Who’s responsible?
  • Who’s authorized?
Potential consequences:
  • The EIS doesn’t get used, so we fail to prevent adverse incidents.
  • The EIS gets used incorrectly, so we fail to prevent adverse incidents.
  • The EIS stops working well, but it gets used anyway.

Area: Trust
Example issues:
  • The black-box nature of the models
  • Incorrect interpretation of the outputs
  • The model fails to meet impossibly high expectations
Potential consequences:
  • Supervisors won’t use the model if they don’t trust it, even if the model is quite good, wasting resources and failing to prevent adverse incidents.

Area: Cost to Use
Example issues:
  • Technical costs, akin to the Netflix Prize
  • Human costs, such as a steep learning curve or a terrible interface
Potential consequences:
  • The department might stop supporting the model, even if it can help prevent adverse incidents.
  • The supervisors might stop using the model, so opportunities for effective intervention are missed.

Area: Accuracy
Example issues:
  • Models lose accuracy
  • Models are more error prone for protected classes (bias)
Potential consequences:
  • The department wastes resources intervening on the wrong officers.
  • The department misses opportunities to intervene on the highest-risk officers.

Successful organizations already develop policies, assign responsibilities and authority, build and depend on trust, make cost-sensitive decisions, and monitor performance when it comes to their employees. But these things happen too rarely with machine learning. A deployed machine learning system is essentially a living thing and should be treated as such: it needs constant care and attention.
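To make that ongoing care concrete, here is a minimal sketch (our illustration, not part of the deployments described here) of one kind of routine monitoring: periodically checking how many of the highest-risk officers the EIS flags actually go on to have an adverse incident. The file names and column names are hypothetical placeholders.

```python
# A minimal monitoring sketch: compare the officers the model flags as
# highest risk against observed outcomes from the follow-up period.
# File and column names below are hypothetical placeholders.
import pandas as pd

def precision_at_top_k(scores: pd.DataFrame, outcomes: pd.DataFrame, k: int) -> float:
    """Of the k highest-risk officers flagged by the model, what fraction
    actually had an adverse incident in the follow-up period?"""
    flagged = scores.nlargest(k, "risk_score")["officer_id"]
    had_incident = outcomes.loc[outcomes["adverse_incident"] == 1, "officer_id"]
    return flagged.isin(had_incident).mean()

if __name__ == "__main__":
    # Recompute the metric on a regular schedule; a sharp drop is a sign
    # the deployed model may be losing accuracy and needs attention.
    scores = pd.read_csv("monthly_risk_scores.csv")    # officer_id, risk_score
    outcomes = pd.read_csv("observed_incidents.csv")   # officer_id, adverse_incident
    current = precision_at_top_k(scores, outcomes, k=100)
    print(f"Precision among top 100 flagged officers: {current:.2%}")
```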

Given how little has been written about deploying machine learning models, we decided to write about our experience deploying the first data-driven early intervention system for police officers. We have posted additional blog entries outlining our other findings, covering the technical and human aspects of such deployments. We hope you find them useful, and we look forward to hearing how others build on them. Please check back regularly to catch the latest.