What Makes a Good Data Science for Social Good Project?
Data Science for Social Good is a summer program that requires year-round preparation. A successful summer requires a mix of good people and projects, and we spend a lot of time trying to find projects to solve and the people to solve them. In addition to reading over 1000 applications from aspiring fellows, mentors, and project managers, we’ve spent numerous hours researching, pursuing, and scoping projects: exploring datasets, speaking with representatives, and wrangling with attorneys over legal agreements, HIPAA, FERPA, and many other acronyms. Well over a hundred projects will cross our emails, phones, and eyes before we find the ones to do next summer.
We vet potential projects as best we can because it is costly to make a mistake. We have limited funds and time to work on projects. The fellows need to hit the ground running and to make progress throughout the summer, neither of which can happen if we don’t choose the projects well. For various reasons, we’ve had to “kill” a project or two once the fellowship was underway and switch the teams to a different project, which disrupts the environment and makes it difficult for the affected team of fellows to be as effective as they would’ve been otherwise.
Many people have asked what we look for when choosing projects. There is no simple formula. We consider many things, some of which relate to the goals of the fellowship and some of which are practical. We decided to write this post to give everyone a better idea of what we look at when we evaluate a project’s potential.
Several things are required for a project to succeed. They do not guarantee success on their own, but success is nearly impossible without them. We have found some requirements easier to satisfy than others, so we have ordered them from easiest to most difficult:
- A solvable problem. Some problems are too big or too difficult to solve in a summer. If an organization were to ask us to solve world poverty, we would have to decline because it’s far too complex a problem for a summer fellowship. Few potential partners have pitched unsolvable problems, but when they do, we can usually get around it by focusing on one aspect of the problem. DSSG cannot solve poverty in three months, but we can help alleviate it by reducing unemployment, homelessness, maternal mortality, lead poisoning, and school dropout rates and increasing smart urban investments, home-visitation rates, insurance rates, and social services interactions, among other things.
- A challenging problem. We look for projects that will challenge three or four data science fellows for the duration of the fellowship. Anything less squanders the fellows’ time and denies them an opportunity to learn. Challenging problems encourage teamwork, spawn creative solutions, and play a key role in DSSG’s ability to “produce data scientists with strong skills in solving real-world problems and an understanding, excitement, and passion for solving problems with social impact.” For example, our World Bank team worked with another fellow who has search-engine expertise to find links between corrupt applicants online, and our Chicago Public Schools team worked with fellows who had strong D3 experience to build an interactive map showing where kids go to school. We once tried to give several smaller projects instead of one big project to a team of fellows but found that it did not work as well.
- An important problem with social impact. We make a substantial investment in each project, not only financially (typically over $100,000) but also in opportunity cost (when we choose to do a project, we choose not to do another project). We’d like to dedicate our limited resources to substantial problems. Each project must meet an operational need for the partner organization and must have a tangible connection to “social good.” We’d decline a hedge fund that asked us to help get bigger returns or an NGO that asked us purely to analyze historical data that have little relevance or actionable impact today. All else being equal, we value projects that help more people over fewer people and that solve chronic problems over temporary problems. Past projects have been in areas such as public health, education, economic development, disaster response, and the environment, but other projects qualify.
- A motivated, capable, and committed partner. No project can succeed without a fully invested project partner. Project partners understand the problem, they have subject-matter expertise, and they ultimately decide how our work is used. Being practitioners, our partners often look at the problem differently than we do, which is important for solving tough problems. We need them to provide insight into the problem and to guide us as we develop a solution. This demands a lot from partners. It often requires partners to stretch themselves and ask themselves hard questions. It also requires time. We look for partners who will help scope the project before the fellowship, give a presentation about their work in the second week of the fellowship, chat at least once a week with the team throughout the fellowship, and use our work after the fellowship. In our experience, this level of engagement usually requires an individual within the partner organization to dedicate about 20% of her time over the summer to supporting the team. That is not a small ask, especially for non-profits and governments with resource constraints or small staffs.
- Appropriate, relevant data. Getting the data we need is almost always the biggest challenge. Important things go unmeasured or unrecorded or, more commonly, cannot be shared. Many of our projects involve medical, educational, and other sensitive information. Getting lawyers to agree on data and code sharing can take months. We try to be flexible while maintaining a spirit of openness: partners have anonymized data (while keeping it useful at an individual level), conducted background checks, hired our fellows as (unpaid) interns, and required us to do our analyses on their internal computer systems (remotely). We have released code, but not data, for the Nurse-Family Partnership, Chicago Public Schools, Montgomery County Public Schools, and the Chicago Alliance to End Homelessness. We expect our partners to provide us with all the relevant data they have so we can build a solution that’s appropriate, effective, and easily deployed.
While the above requirements help determine whether a project can succeed on its own, other factors help determine whether a project helps DSSG succeed:
- Diversity of projects. One of DSSG’s goals is to inspire social gooders to adopt data-driven solutions. By choosing a diverse set of projects, we can demonstrate the value of analytics to the whole sector. A diverse set of projects also draws a richer pool of fellows, helps keep people interested, and leads to the cross-pollination of ideas.
- Long-term relationships. It is easier to continue a relationship with a project partner than to start a new one. By the end of our first summer together, we have greased the wheels for cooperation: we have worked out a legal agreement, established expectations, and built personal relationships. In addition, long-term relationships tend to result in more implementation and impact from our work. All else being equal, we prefer to continue relationships with previous partners, but we are always looking for strong new partners.
Why work with us?
We try to focus on the burgeoning field of data science rather than trying to compete with the many qualified people working in more established fields such as website design and observational impact evaluations. While many organizations correctly recognize the value of those endeavors, we hope to evangelize the utility of a broader set of data-driven tasks, including prediction, classification, and clustering, among governments and non-profits. For many partners, this means viewing data less as a tool for justification and more as a tool for program improvement.
Many organizations have also suggested we work together to figure out what data they should be collecting. That can be a worthwhile discussion for our year-round center, the Center for Data Science and Public Policy, but it’s not the type of project that suits our summer fellowship. We want the fellows to get their hands dirty with data; figuring out which data to collect and then collecting those data means less time using those data.
How to Get Involved
We’re always looking for new ways to apply our skills for the social good. If you have a project for us to work on, please let us know. We look forward to hearing from you!