Tech Lessons Learned Implementing Early Intervention Systems in Charlotte and Nashville
This is the second in our three-part series “Lessons Learned Deploying Early Intervention Systems.” The first part (you can find it here) discussed the importance of data science deployments.
For the past two years, we have worked with multiple police departments to build and deploy the first data-driven Early Intervention System (EIS) for police officers. Our EIS identifies officers at high risk of having an adverse incident so the department can prevent those incidents with training, counseling, or other support. (Read our peer-reviewed articles about the project here, here, and here and our blog posts here, here, here, here, and here.) Metropolitan Nashville (MNPD) started using our EIS last fall, and Charlotte-Mecklenburg (CMPD) became the first to fully deploy it in November.
Surprisingly little has been written about deploying machine learning models, given all the talk about how machine learning is changing the world. We've certainly run into challenges deploying our own work that, as far as we know, no one has written about. This blog post covers some of the statistical and computational issues we've encountered so others can learn from our choices, both good and bad.
Output storage: For almost all projects, DSaPP stores model information and output in the same database format. One table stores information about what we call "model groups." A model group is a unique combination of model characteristics, such as the algorithm, hyperparameters, random seed, and features. Another table stores information about "models," each of which is a model group fit on a specific set of training data. We store all the predictions (each trained model applied to out-of-sample data) in one table and standard evaluation metrics (e.g., accuracy, precision, recall, ROC AUC, Brier score), along with training time, in another. The last two tables store feature importances: one set for each model as a whole and one for each individual prediction. Using the database to store output makes querying and analyzing results fast and easy: thanks to thoughtful table design and indexing, we can query billions of predictions and statistics in seconds.
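To make this concrete, here's a minimal sketch of what such a schema might look like. We use SQLite here for portability; the table and column names are illustrative rather than our exact schema.

```python
# Illustrative results schema, not our exact one. SQLite keeps the
# example self-contained; any relational database works the same way.
import sqlite3

conn = sqlite3.connect("results.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS model_groups (
    model_group_id   INTEGER PRIMARY KEY,
    algorithm        TEXT,     -- e.g. 'RandomForestClassifier'
    hyperparameters  TEXT,     -- JSON blob of hyperparameter settings
    feature_list     TEXT,     -- JSON list of feature names
    random_seed      INTEGER
);

CREATE TABLE IF NOT EXISTS models (
    model_id        INTEGER PRIMARY KEY,
    model_group_id  INTEGER REFERENCES model_groups (model_group_id),
    train_end_time  TEXT      -- last date covered by the training data
);

CREATE TABLE IF NOT EXISTS predictions (
    model_id   INTEGER REFERENCES models (model_id),
    entity_id  INTEGER,       -- the officer being scored
    as_of_date TEXT,
    score      REAL
);
-- Indexing like this is what keeps queries over billions of rows fast.
CREATE INDEX IF NOT EXISTS idx_predictions_model
    ON predictions (model_id, as_of_date);

CREATE TABLE IF NOT EXISTS evaluations (
    model_id      INTEGER REFERENCES models (model_id),
    metric        TEXT,       -- e.g. 'precision@100', 'roc_auc'
    value         REAL,
    evaluated_at  TEXT
);
""")
conn.commit()
```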
To allow future testing and development without impacting the live EIS, we created a separate "production" schema. It includes new tables for storing the models we decide to use, their predictions, and their feature importances, as well as a "time_delta" table that stores each officer's rank change over the past day, week, month, quarter, and year and a "review_audit" table that stores supervisor feedback on officer predictions. The production schema is linked to the testing and evaluation environment by model group ID, which ensures that only models that have been thoroughly tested against past data run in production and prevents accidental changes to the hyperparameters.
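In code, that guard looks something like the sketch below, which continues the toy schema above. The promote_model function and production_models table are hypothetical stand-ins, not our actual deployment code.

```python
# Sketch of a promotion guard: a model only enters the production tables
# if its model group has been evaluated against historical data.
# `promote_model` and `production_models` are illustrative names.
def promote_model(conn, model_id):
    """Copy a model into the production tables, but only if its model
    group has evaluations against past data."""
    # Empty copy of the models table, standing in for the production schema.
    conn.execute("""CREATE TABLE IF NOT EXISTS production_models AS
                    SELECT * FROM models WHERE 0""")
    n_evals = conn.execute("""
        SELECT COUNT(*)
        FROM models m
        JOIN evaluations e USING (model_id)
        WHERE m.model_group_id = (SELECT model_group_id
                                  FROM models WHERE model_id = ?)
    """, (model_id,)).fetchone()[0]
    if n_evals == 0:
        raise ValueError(
            f"model {model_id}: its model group has never been evaluated "
            "against past data; refusing to promote")
    conn.execute(
        "INSERT INTO production_models SELECT * FROM models WHERE model_id = ?",
        (model_id,))
    conn.commit()
```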
Production schema for department use
Using a standard format to store model outputs has an additional benefit: we built Tyra, a webapp that plots results for our projects. Tyra makes it easy for us and our partners to look at model performance without having to write database queries or read numbers from a table. It includes precision at k over time; precision, recall, and ROC curves; feature distributions for officers with and without adverse incidents; and more. You can read more and download the code from GitHub: https://github.com/dssg/tyra.
An example page from Tyra
Accuracy: Choosing a model based on its precision in a single time period is asking for trouble. A model may get lucky and generate accurate predictions in that one period but not in others (including the future). To avoid this, we want a model that generates consistently high precision in the top 100 across time periods. We also built checks that flag unusual changes in these metrics so someone can confirm the model is still performing as expected.
Extra trees and random forests tend to have the highest precision in the top 100 over time, while logistic regression is the most accurate for a couple of time periods. If we were to select a model based on one of those lucky periods, we would choose poorly.
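Here's a toy illustration of why selecting on a single period goes wrong, with made-up numbers standing in for the precision-at-100 values we would actually pull from the evaluations table:

```python
# Made-up precision@100 series for two hypothetical model groups; in
# practice these numbers come straight from the evaluations table.
import statistics

precision_at_100 = {
    "random_forest":       [0.61, 0.58, 0.63, 0.60],
    "logistic_regression": [0.71, 0.42, 0.45, 0.40],  # lucky in period 1
}

# Selecting on the first period alone picks logistic regression...
best_single = max(precision_at_100, key=lambda g: precision_at_100[g][0])

# ...but averaging across periods picks the consistently strong model.
best_overall = max(precision_at_100,
                   key=lambda g: statistics.mean(precision_at_100[g]))

print(best_single, best_overall)  # logistic_regression random_forest

# The same series feed a simple monitoring check: flag any period whose
# precision falls more than `threshold` below the model's own mean.
def flag_unusual(series, threshold=0.05):
    mean = statistics.mean(series)
    return [i for i, p in enumerate(series) if mean - p > threshold]

print(flag_unusual(precision_at_100["logistic_regression"]))  # [1, 3]
```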
One caveat: effective interventions will drive these metrics down. If the interventions are perfectly effective, the EIS will appear to have 0% precision even if its predictions were 100% correct, because the identified officers will receive interventions that prevent the adverse incidents from occurring.
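A quick back-of-the-envelope sketch of this effect, assuming interventions prevent some fraction of the adverse incidents among correctly flagged officers:

```python
# Toy numbers only: suppose interventions prevent a fraction
# `effectiveness` of adverse incidents among correctly flagged officers.
true_precision = 0.60  # hypothetical precision with no interventions

for effectiveness in (0.0, 0.5, 1.0):
    observed = true_precision * (1 - effectiveness)
    print(f"effectiveness={effectiveness:.0%} -> observed precision={observed:.0%}")
# With perfectly effective interventions, observed precision drops to 0%
# even though every flag was correct.
```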