Human Lessons Learned Implementing Early Intervention Systems in Charlotte and Nashville

Hareem Naveed, Klaus Ackermann, Joe Walsh, Adolfo De Unánue, Rob Mitchum, Lauren Haynes, Rayid Ghani


This is the third in our three-part series “Lessons Learned Deploying Early Intervention Systems.” The first part (you can read it here) discussed the importance of data science deployments, while the second blog post in the series discussed the technical challenges related to the implementation. This final part is about the other (and typically more important) considerations in the implementation and deployment of Data Science products: Humans.

Background

For the past two years, we have worked with multiple police departments to build the first data-driven Early Intervention System (EIS) for police officers. Our EIS identifies officers at high risk of having an adverse incident so the department can provide the officer with training, counseling, or other support to prevent those bad outcomes. (We describe this work in more detail in our peer-reviewed articles and earlier blog posts.) Metropolitan Nashville (MNPD) became the first police department to deploy our EIS last fall, and Charlotte-Mecklenburg (CMPD) became the second this month.

Surprisingly little has been written about deploying machine learning models, given all the talk about how machine learning is changing the world. We’ve certainly run into challenges deploying our own work that no one has discussed, as far as we know. This blog post discusses some of the human issues we’ve encountered so others can learn from our choices, both good and bad.

Human problems relate to issues that come from building a system that people can use effectively and trust in their daily work. We data scientists tend to focus on technical issues because it’s what we’re trained to do in school, but the human aspect is more important in whether, and how, the system is used going forward.

People are critical to the success of an EIS because we’re ultimately trying to change human behavior by implementing these systems. If the system isn’t built for them, they won’t use it, and high-risk officers won’t receive helpful interventions. Here are some ways we’ve thought about and addressed this challenge.

Provide the department what they need: Data scientists should build and select models that optimize the partner’s task. For example, MNPD has been experimenting with how often it should produce lists, how long those lists should be, and what types of problems the EIS should focus on. We began by estimating an officer’s risk of getting any adverse finding, whether an unjustified use of force or a minor uniform violation. But the department soon realized that the system flagged too many officers who went on to have minor violations and not enough who went on to have major violations, so we adjusted the system to learn patterns that predict incidents severe enough to result in terminations, multi-day suspensions, and similar outcomes. MNPD also started by flagging 100 officers a couple of times a year but is moving toward flagging fewer officers every week as it grows more comfortable with the system. We select the model that performs best on the specific combination of requirements MNPD has at that moment.

CMPD’s needs have also changed over time. CMPD used to send every flag that met its thresholds to supervisors for review, no matter how many there were. At one point CMPD’s EIS was flagging nearly two-thirds of its officers a year, which created an unmanageable administrative burden. CMPD modified its policy for the new EIS: all flags initially go to a supervisor at headquarters, who conducts a preliminary review before deciding whether to forward the flag to the officer’s direct supervisor. With all flags going through a single person, CMPD has decided to start by reviewing 5% of officers each month but may adjust that number. Therefore, we focused on precision and recall for the top 5% of the list.
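To make "precision and recall for the top 5% of the list" concrete, here is a minimal sketch in Python. The function name, the argument names, and the toy data are our own illustrations, not part of either department's system; it assumes each officer has one risk score and a 0/1 label for whether an adverse incident followed.

```python
import math


def precision_recall_at_top(scores, outcomes, fraction=0.05):
    """Precision and recall among the top `fraction` of officers by risk score.

    scores:   one risk score per officer (higher means riskier)
    outcomes: 1 if the officer later had an adverse incident, else 0
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if not scores:
        raise ValueError("need at least one officer")

    # Number of officers the department can review (at least one).
    k = max(1, math.ceil(len(scores) * fraction))

    # Officer indices ordered from highest to lowest risk score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = order[:k]

    true_positives = sum(outcomes[i] for i in flagged)
    total_positives = sum(outcomes)

    precision = true_positives / k
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Toy data: 20 officers, two of whom go on to have an adverse incident.
scores = [i / 20 for i in range(20)]
outcomes = [0] * 20
outcomes[19] = 1  # the highest-scored officer
outcomes[5] = 1   # a low-scored officer the list misses
print(precision_recall_at_top(scores, outcomes, fraction=0.05))
```

Precision answers "how many of the officers we review actually needed attention?", while recall answers "how many of the officers who needed attention did we review?". Fixing the list length at what the department can review keeps both numbers tied to its real capacity.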

As we prepared to deploy our system, CMPD expressed interest in officers who have unusually large jumps, say from the bottom third of the list to the top third (or vice versa), which could indicate a developing issue. We created a database function that takes the number of large changes the department can review as input and returns a list of those changes so the department can choose how many officers they want to review from the top of the list and how many they want to review from large changes.
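The idea behind that function can be sketched in a few lines. The version below is Python rather than a database function, and the name largest_rank_changes, its arguments, and the sample ranks are hypothetical; it only illustrates the logic of returning the biggest movers, up to the number the department can review.

```python
def largest_rank_changes(previous_ranks, current_ranks, max_results):
    """Return up to `max_results` officers with the largest absolute rank change.

    previous_ranks, current_ranks: dicts mapping officer id -> rank (1 = highest risk)
    Each result is (officer_id, previous_rank, current_rank, change).
    """
    changes = []
    for officer_id, current in current_ranks.items():
        previous = previous_ranks.get(officer_id)
        if previous is None:
            continue  # no earlier rank to compare against
        changes.append((officer_id, previous, current, current - previous))

    # Largest jumps first, in either direction.
    changes.sort(key=lambda row: abs(row[3]), reverse=True)
    return changes[:max_results]


# Toy example: officer "c" moves from rank 90 to rank 30.
previous = {"a": 1, "b": 50, "c": 90}
current = {"a": 2, "b": 10, "c": 30, "d": 5}
for row in largest_rank_changes(previous, current, max_results=2):
    print(row)
```

A negative change means the officer moved toward the top of the list (higher risk), and a positive change means the officer moved away from it.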

Responsibilities and authorizations: One of the most important things a department can do is to assign responsibilities for the EIS. If no one is in charge of the EIS and how it’s used, the EIS may stop working correctly and no one will notice. (It has happened before.)

To avoid those problems, CMPD and MNPD have delegated technical responsibilities to members of their IT staff and administrative responsibilities to sworn supervisors. They have also authorized members of their department to use and maintain the system. IT staff are running the EIS on the department’s servers, while Professional Standards or Human Resources staff are monitoring the results and requesting system improvements. Supervisors are required to review EIS flags and provide feedback. We’re not responsible for either system, but we continue to provide technical support to both departments as they learn to use the EIS.

These clearly defined roles will help ensure that the EIS helps the department as much as possible.

Make the system easy to use: CMPD built the EIS to look like its other interfaces, so supervisors will feel comfortable using it. (See Rayid’s research to learn more about the benefits of using a familiar interface.) We also wrote code to translate our variable names (e.g. “ir_id_1y_interventionsoftype_counseling_sum”) into English (“number of counseling interventions the officer has received in the last year”).
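As a rough illustration of that translation step, here is a small Python sketch. It assumes, purely for the example, that variable names have six underscore-separated parts (source, entity, time window, metric, category, aggregate); the lookup tables and the describe_feature function are hypothetical and not our actual code.

```python
# Hypothetical lookup tables; a real system would cover every window and metric.
WINDOWS = {
    "1m": "in the last month",
    "6m": "in the last six months",
    "1y": "in the last year",
}
TEMPLATES = {
    ("interventionsoftype", "sum"):
        "number of {category} interventions the officer has received",
}


def describe_feature(name):
    """Translate a variable name into plain English, or return it unchanged."""
    parts = name.split("_")
    if len(parts) != 6:
        return name  # does not follow the assumed naming convention
    _source, _entity, window, metric, category, aggregate = parts

    template = TEMPLATES.get((metric, aggregate))
    window_text = WINDOWS.get(window)
    if template is None or window_text is None:
        return name  # unknown pieces: show the raw name rather than guess

    return f"{template.format(category=category)} {window_text}"


print(describe_feature("ir_id_1y_interventionsoftype_counseling_sum"))
```

Falling back to the raw name for anything unrecognized keeps the interface honest: a supervisor sees an ugly label instead of a wrong one.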

We originally provided risk scores but settled on providing ranks. Ranks make more sense because our models are optimized for relative risk (officers at the top of the list are more likely to have an adverse incident than other officers) rather than absolute risk (e.g. an officer has a 50% chance of having an adverse incident). Not only are risk scores potentially misleading (or incorrectly interpreted as probabilities), but an officer’s risk score may change significantly even if his/her rank in the department does not.
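A toy example shows why ranks are more stable than scores. The Python sketch below uses made-up officer ids and scores, and the risk_ranks helper is our own illustration: every score drops between the two runs, yet the ordering of officers stays the same.

```python
def risk_ranks(scores):
    """Map officer id -> rank, where rank 1 is the highest risk score.

    Ties are broken by officer id so the ranking is deterministic.
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {
        officer_id: position
        for position, (officer_id, _score) in enumerate(ordered, start=1)
    }


# Made-up scores from two model runs: the values shift, the ordering does not.
first_run = {"a": 0.90, "b": 0.50, "c": 0.10}
second_run = {"a": 0.60, "b": 0.30, "c": 0.05}

print(risk_ranks(first_run))
print(risk_ranks(second_run))
```

A supervisor who saw officer "a" go from 0.90 to 0.60 might conclude the risk had fallen sharply, when the officer is still at the top of the list.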

An early version of CMPD’s EIS interface showed an officer’s EIS risk history scaled by the officer’s rank range. For example, the y axis ranged from 1 to 10 for an officer who was consistently in the top 10, which gives the impression that the rank changes are huge even though they’re not. The y axis should have a minimum range (e.g. 1 to 100) so small changes don’t look bigger than they are.
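One way to enforce a minimum range is to compute the axis limits before plotting. The Python sketch below is only an illustration under our own assumptions (the function name, the default span of 99 ranks, and the padding rule are hypothetical); the resulting limits could be passed to whatever charting library the interface uses.

```python
def rank_axis_limits(ranks, min_span=99, best_rank=1):
    """Return (low, high) axis limits covering at least `min_span` ranks.

    ranks:     the officer's historical ranks (1 = highest risk)
    min_span:  minimum distance between the axis ends (99 gives 1 to 100)
    best_rank: the axis never extends past this rank, since there is no rank 0
    """
    low, high = min(ranks), max(ranks)
    missing = min_span - (high - low)
    if missing <= 0:
        return low, high  # the history already spans enough ranks

    # Pad both ends of the range, but never go past the best possible rank.
    low_limit = max(best_rank, low - missing // 2)
    high_limit = low_limit + min_span
    return low_limit, high_limit


print(rank_axis_limits([3, 7, 10, 1]))  # an officer consistently in the top 10
print(rank_axis_limits([500, 510]))     # a mid-list officer with small changes
```

With limits like these, a move from rank 3 to rank 7 looks as small on the chart as it is in practice.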

We needed to decide what to show supervisors as an officer’s risk changes. For example, if the EIS stops flagging an officer before the supervisor review, the EIS stops showing the flag because the officer no longer presents the same risk. Similarly, if an officer’s risk scores change before the supervisor review, the EIS should present the supervisor with the most recent information.

Value supervisor expertise: Police supervisors know a lot about their officers and policing, and they play a central role in the EIS process. Yet we’ve never seen an EIS that adjusts and learns from their feedback. We’re changing that by adding supervisor-feedback variables to the system. When the EIS flags an officer, the supervisor declines to intervene, and the officer does not have an adverse incident, the EIS gives the machine learning model less weight and that supervisor’s feedback more. Similarly, if the supervisor flags an officer in the system (i.e. the EIS does not) and that officer goes on to have an adverse incident, the EIS will give more weight to the supervisor’s feedback. Supervisors can also agree or disagree with the EIS’s risk factors and write a note about why they think the EIS is wrong (e.g. we’re missing important variables), which we can use to manually improve the model.
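To give a feel for how the weighting could shift, here is a deliberately simplified Python sketch. It is not how the EIS implements supervisor feedback (which enters the model as variables); the SupervisorRecord class, the smoothing rule, and the 0.5 starting weight are all assumptions made for illustration. It tracks, for one supervisor, who turned out to be right each time the supervisor and the model disagreed.

```python
from dataclasses import dataclass


@dataclass
class SupervisorRecord:
    """Running tally of disagreements between one supervisor and the model."""
    supervisor_right: int = 0
    model_right: int = 0

    def update(self, model_flagged, supervisor_flagged, adverse_incident):
        """Record one reviewed case once its outcome is known."""
        if model_flagged == supervisor_flagged:
            return  # agreement says nothing about whom to trust more
        if supervisor_flagged == adverse_incident:
            self.supervisor_right += 1
        else:
            self.model_right += 1

    @property
    def supervisor_weight(self):
        """Smoothed share of disagreements the supervisor got right.

        Starts at 0.5 (no history) and moves toward 1.0 or 0.0 with evidence.
        """
        total = self.supervisor_right + self.model_right
        return (self.supervisor_right + 1) / (total + 2)


record = SupervisorRecord()
# The model flags, the supervisor declines, and no incident follows.
record.update(model_flagged=True, supervisor_flagged=False, adverse_incident=False)
# The supervisor flags, the model does not, and an incident follows.
record.update(model_flagged=False, supervisor_flagged=True, adverse_incident=True)
print(record.supervisor_weight)
```

The same tally also shows when a supervisor's judgment has been less reliable than the model's, because the weight falls below 0.5.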

Example EIS interface showing the officer’s risk, risk factors, and history

Learning from supervisor feedback gives supervisors an incentive to use the EIS even if they dislike it, because their feedback reduces the number of false positives they have to deal with. Existing EISs offer no such incentive.

Tracking human feedback this way may also help department leadership, supervisors, and officers understand the value of the EIS. The feedback can help address a couple of common objections to machine learning systems. Humans may be overconfident in what they know about the problem, and they may question whether a computer can know something they don’t. They may even argue against the EIS because it’s not completely accurate. We will be able to show not only what the EIS gets wrong but also what the supervisors get wrong. It would help to know which types of errors the EIS can reduce and what insights into risks the EIS can provide. [1]

Communicating the EIS: This project almost ended before it began. We started the work as part of the Obama Administration’s Police Data Initiative (PDI), which had two parts: making more police data available to the public and building a data-driven EIS. Unfortunately, the White House issued a press release that did not make it clear enough that the PDI’s open-data projects and EIS projects were separate. Many officers were understandably upset because they thought DSaPP would make sensitive data such as their home addresses available to the public. They were also concerned about CMPD sending data off site without it being made anonymous. CMPD and DSaPP resolved these issues, but not before they had caused unnecessary anxiety.

DSaPP personnel traveled to Charlotte, went on ride-alongs with officers, and met with numerous CMPD officers and staff, including the department leadership, Internal Affairs investigators, and mid- and low-level supervisors. CMPD convened an officer focus group, which included officers who had expressed concerns about the project. DSaPP and the focus group discussed how the system is built and monitored and how it can be improved. That meeting proved to be one of the best parts of the project because it helped address misunderstandings and built trust. Participating officers suggested what proved to be among the best predictors of adverse incidents, including suicide calls.

Next Steps

In partnership with the Charlotte-Mecklenburg Police Department and Metropolitan Nashville Police Department, DSaPP has built the best early intervention system for police officers. But we still have improvements to make for both the technical and user considerations:

Individual feature importances. We developed a way to extract officer-level risk factors, but we’re planning to test alternative methods to figure out which result in the best outcomes.
We will do more to study and recommend interventions. We plan to estimate intervention effectiveness using several approaches, including regression discontinuity designs (e.g. compare officers just in the top 5% to officers just outside).
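The regression discontinuity idea can be sketched with a toy calculation. The Python below is the crudest version, a difference in mean outcomes within a narrow band on either side of the cutoff; a real analysis would typically fit local regressions and test different bandwidths. The function name, arguments, and data are hypothetical.

```python
def rd_local_difference(ranks, outcomes, cutoff_rank, bandwidth):
    """Mean outcome just inside the cutoff minus mean outcome just outside.

    ranks:       risk rank per officer (1 = highest risk)
    outcomes:    1 if the officer later had an adverse incident, else 0
    cutoff_rank: last rank that receives the intervention (e.g. the top 5%)
    bandwidth:   how many ranks on each side of the cutoff to compare
    """
    inside = [y for r, y in zip(ranks, outcomes)
              if cutoff_rank - bandwidth < r <= cutoff_rank]
    outside = [y for r, y in zip(ranks, outcomes)
               if cutoff_rank < r <= cutoff_rank + bandwidth]
    if not inside or not outside:
        raise ValueError("no officers on one side of the cutoff")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Toy data: 20 officers, with the top 10 receiving an intervention.
ranks = list(range(1, 21))
outcomes = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1,   # ranks 1-10 (intervention)
            1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # ranks 11-20 (no intervention)
print(rd_local_difference(ranks, outcomes, cutoff_rank=10, bandwidth=3))
```

The comparison is informative because officers just inside and just outside the cutoff should be similar in risk, so a gap in their later outcomes is plausibly due to the intervention rather than to who was flagged. A negative value means fewer adverse incidents among the officers who received it.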

Intervention effectiveness probably varies across officers and departments. For example, some officers probably respond more to interventions than others, and some types of interventions probably get a bigger response than others. We plan to study which interventions work for which situations and which officers.

Finally, we plan to look more at relatively low-risk officers — officers who have relatively low risk ranks despite their assignments and activities. Departments may be able to learn from those officers and try to model their behavior.

Footnotes:

[1] Of course, supervisors might be far more accurate than the EIS, in which case the department should probably stop using the EIS and start asking why their supervisors aren’t addressing problems they know about.

If we had to guess, we think the supervisors will be more accurate at the very top and bottom of the list, while the EIS will be more accurate in the middle of the list. Machine learning models often outperform human experts as uncertainty increases, such as predicting which students will drop out and predicting how the Supreme Court will decide cases.