Data Science for Social Good

This year’s Knight News Challenge asks, “How might we make data work for individuals and communities?” Data Science for Social Good and its parent organization, the Center for Data Science and Public Policy, pursue answers to this question every day. We have submitted four of our projects to this year’s challenge so that we can develop them further and release them to the public.

We would appreciate your support and feedback. Our four projects are summarized below. You can read the full proposals, comment on them, and give them “applause” (the little hearts that appear on each submission) at the Knight News Challenge website.

ART: Automated Redaction Tool for Fast Processing of FOIA Requests

The Freedom of Information Act (FOIA) is a critical tool for transparency and accountability because it enables journalists to obtain public records from the government. Government agencies at all levels are required to release requested documents within a reasonable time frame while ensuring that private data are not released. Most agencies rely on manual redaction to fulfill FOIA requests, but that approach is costly, slow, and error-prone. Computer-assisted redaction can help.

This project will create the Automated Redaction Tool (ART), software that helps government agencies improve their FOIA processes. Here’s how it works: a government FOIA specialist chooses how aggressive the automatic redaction should be, and the software flags potentially sensitive information for review. The specialist then confirms and edits the results, which is faster and easier than doing the entire process by hand. ART also learns from the specialist’s decisions, so its suggestions improve over time.
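
To make that workflow concrete, here is a minimal Python sketch of how an aggressiveness setting, automatic flagging, and specialist feedback might fit together. It is not ART’s implementation: the regex detectors, the three aggressiveness levels and their thresholds, the base scores, and the simple per-category score nudge that stands in for learning are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical detectors: category -> (pattern, base confidence score).
PATTERNS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.9),
    "email": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 0.8),
    "phone": (re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"), 0.6),
}

# Hypothetical aggressiveness levels: a category is flagged only if its
# score meets the threshold, so lower thresholds flag more categories.
THRESHOLDS = {"conservative": 0.85, "moderate": 0.7, "aggressive": 0.5}


@dataclass
class Flag:
    category: str
    start: int
    end: int
    text: str
    score: float


class RedactionFlagger:
    def __init__(self, aggressiveness: str = "moderate"):
        self.threshold = THRESHOLDS[aggressiveness]
        # Per-category score adjustments accumulated from specialist feedback.
        self.adjust = {category: 0.0 for category in PATTERNS}

    def score(self, category: str) -> float:
        base = PATTERNS[category][1]
        return min(1.0, max(0.0, base + self.adjust[category]))

    def flag(self, text: str) -> list[Flag]:
        """Return candidate redactions, ordered by position, for review."""
        flags = []
        for category, (pattern, _) in PATTERNS.items():
            score = self.score(category)
            if score < self.threshold:
                continue  # category too uncertain at this aggressiveness level
            for m in pattern.finditer(text):
                flags.append(Flag(category, m.start(), m.end(), m.group(), score))
        return sorted(flags, key=lambda f: f.start)

    def record_feedback(self, flag: Flag, accepted: bool, step: float = 0.05) -> None:
        # Crude stand-in for learning: raise a category's score when the
        # specialist keeps its flags and lower it when they reject them.
        self.adjust[flag.category] += step if accepted else -step


def redact(text: str, confirmed: list[Flag]) -> str:
    # Replace spans from the end of the text so earlier offsets stay valid.
    for f in sorted(confirmed, key=lambda f: f.start, reverse=True):
        text = text[: f.start] + "[REDACTED]" + text[f.end :]
    return text
```

For example, on a sentence containing an email address, a phone number, and a Social Security number, the "aggressive" setting flags all three, while "conservative" flags only the Social Security number until the specialist has accepted a couple of email flags.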

By providing faster FOIA responses with less human input, ART can increase government transparency, leading to more accountable and democratic governance.

Data Fellows for Stronger Neighborhoods

Using data for good requires talent. Data Fellows for Stronger Neighborhoods proposes to deploy fellows to small neighborhood-based nonprofits to help them use data to advance the goals of their communities.

In cooperation with the Local Initiatives Support Corporation (LISC Chicago), we will deploy data fellows to ten community-based organizations serving low-income Chicago neighborhoods and residents. Each organization will host a fellow who will help align its data and evaluation strategies, recommend and implement improved data systems in priority areas, and help the neighborhood collect better data and analyze it. Fellows will also make sure the results of that analysis are sustainable and can be put to work advancing the community’s goals.

This project offers several potential benefits. First, the neighborhood organizations will learn better ways to articulate their impact, manage performance, increase efficiency, and access funding. Second, the data fellows will gain real-world experience and develop peer relationships. Third, the project will help build a data-friendly culture and demonstrate how both simple and not-so-simple investments can make a real difference.

LID: Legislative Influence Detector

Researchers and concerned citizens would like to know who is writing legislative bills, but reading those bills, let alone tracing their sources, is tedious and time-consuming. This is especially true at the state and local levels, where, arguably, many of the most consequential policy decisions are made every day. Our tool helps watchdogs stretch their limited resources further by flagging reused legislative text within seconds.

The Legislative Influence Detector (LID) searches more than 550,000 state legislative bills and 2,400 pieces of model legislation written by lobbyists, looking for textual similarities that reveal the true source of state legislation. This summer, we tested LID on an anti-abortion bill signed into law by Wisconsin Governor Scott Walker and found 73 other bills from around the country that matched it nearly word for word, as well as its original source.
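
One common way to detect reused text is to break each document into overlapping word sequences (n-gram “shingles”) and measure how many of them two documents share. The Python sketch below scores bills by the Jaccard similarity of their five-word shingles. It illustrates textual-similarity search in general and is not necessarily how LID itself matches bills; the function names, shingle length, threshold, and sample bill texts are hypothetical.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < n:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_matches(
    query: str, corpus: dict[str, str], n: int = 5, threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs scoring at or above threshold, best first."""
    query_shingles = shingles(query, n)
    scored = []
    for doc_id, text in corpus.items():
        score = jaccard(query_shingles, shingles(text, n))
        if score >= threshold:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical bill texts, for illustration only.
    corpus = {
        "model_bill": "A person may not operate a drone over a state correctional "
        "facility without written permission from the warden.",
        "state_bill_b": "The department shall publish an annual report on highway "
        "maintenance spending.",
    }
    query = "Section 1. A person may not operate a drone over a state correctional facility without written permission from the warden."
    for doc_id, score in find_matches(query, corpus, threshold=0.3):
        print(f"{doc_id}: {score:.2f}")
```

This sketch compares the query against every document, which is fine for a toy corpus. Searching hundreds of thousands of bills quickly would call for an index, such as MinHash with locality-sensitive hashing or a full-text search engine, to narrow the candidates first.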

We would like to ready LID for public use. This will require building two things: (1) a user-friendly website where users can enter a bill in a text box and get potential matches in return, and (2) a notification system that checks every state bill introduced each day and emails potential matches to users. These enhancements will make LID research more accessible, allowing Americans not only to investigate bills of interest but also to find legislative influence in bills they might otherwise miss.

ReBUILDD: Real-time Business Indicators and Labor Dynamics Database

Many job seekers, workforce-development providers, and policymakers would like to know what’s happening in their local labor markets, but most labor market data offer only national snapshots, and those arrive months late.

The Real-time Business Indicators and Labor Dynamics Database (ReBUILDD) gets local labor data to the people who need it, when they need it. ReBUILDD combines public and private data sources into a new open-source, standardized data resource that provides real-time, locally relevant information for job seekers and for the agencies and organizations that train and serve them.

When complete, ReBUILDD will generate new information on labor market dynamics (for example, ZIP-code-level skills gaps), giving a finer-grained and more actionable picture of the labor market and helping people “get the skills to pay the bills.”

Once again, you can support these projects by clicking the heart icon on their pages at the Knight News Challenge website. We appreciate your help!