Are you hearing a lot of data-themed buzzwords lately? Are you wondering whether the use of data science can help a project on which you are working?
Using data can help you demonstrate the efficacy of a project and inform your decisions. Put simply, data science is the process of taking all the data you have lying around and turning it into actions.
Who are we? We’re a team of three Eric and Wendy Schmidt Data Science for Social Good fellows at the University of Chicago. This summer, we’re spending 14 weeks helping the World Bank Integrity Vice Presidency (INT) use data to detect collusion, corruption and fraud in development projects. We’ll use examples from our work so far with INT to demonstrate the important steps needed to assemble an effective data science project.
Even if you are not a data scientist, you are probably wondering about the tools and approaches needed to start using data science. Below, we outline four steps (and how-tos) for getting started with data science.
Step 1: Set A Goal
As with any other project, setting a goal is key for a data science project.
Your goals may vary. As a project manager, what are you hoping to accomplish by looking at your data? Do you want to showcase the impact of your project to funding partners? Do you need to decide what to do next? Do you want to improve existing practices?
The World Bank Group provides loans to developing countries so that they can finance investments in infrastructure, health, and environment sectors while boosting economic and social opportunity for the poor. In carrying out projects financed by these loans, client countries post RFPs (requests for proposals) to solicit and receive bids from contractors.
Occasionally, during the bidding and billing process, companies may engage in corrupt behavior. Misconduct is revealed via whistleblowers’ complaints and/or the proactive work of contract supervisors and investigators who identify red flags. With the help of the Eric and Wendy Schmidt Data Science for Social Good Summer Fellows, the Group would like to take a more data-driven approach to achieve two goals:
- Develop a methodology to prioritize complaints filed by whistleblowers based on data science applied to historical records of past World Bank-financed contracts and investigation outcomes.
- Support the proactive work by World Bank staff in identifying red flags in high-risk projects.
To begin working toward these kinds of goals, a data science team will first examine the current processes and get a clear picture of the data landscape. For example, how many contracts are typically awarded, and how are the contract amounts distributed (that is, how many contracts are awarded for $100,000, how many for $200,000, etc.)? Further, do the amounts and numbers of contracts vary based on geographic location? What are the salient data features of successful investigations?
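To make these first exploratory questions concrete, here is a minimal Python sketch of profiling a contract table. It assumes each contract is a record with hypothetical `region` and `amount` fields, groups amounts into $100,000 bands, and runs on a handful of made-up toy records rather than any real INT data. A real team would more likely reach for a library such as pandas; the standard library is used here only to keep the example self-contained.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: field names and values are illustrative only,
# not real World Bank contract data.
CONTRACTS = [
    {"region": "Africa", "amount": 85_000},
    {"region": "Africa", "amount": 140_000},
    {"region": "South Asia", "amount": 210_000},
    {"region": "South Asia", "amount": 190_000},
    {"region": "Latin America", "amount": 120_000},
]

BIN_WIDTH = 100_000  # bucket size in dollars (an assumption for this sketch)


def amount_bucket(amount, width=BIN_WIDTH):
    """Return the lower edge of the band an amount falls into, e.g. 140_000 -> 100_000."""
    return int(amount // width) * width


def profile(contracts):
    """Return the contract count, a histogram of amount bands, and per-region stats."""
    total = len(contracts)
    # How many contracts fall into each $100,000 band?
    buckets = Counter(amount_bucket(c["amount"]) for c in contracts)
    # Do counts and amounts vary by geographic location?
    by_region = defaultdict(lambda: {"count": 0, "total_amount": 0})
    for c in contracts:
        stats = by_region[c["region"]]
        stats["count"] += 1
        stats["total_amount"] += c["amount"]
    return total, buckets, dict(by_region)


if __name__ == "__main__":
    total, buckets, by_region = profile(CONTRACTS)
    print(f"Contracts awarded: {total}")
    for lower in sorted(buckets):
        print(f"${lower:,}-${lower + BIN_WIDTH - 1:,}: {buckets[lower]}")
    for region, stats in sorted(by_region.items()):
        print(f"{region}: {stats['count']} contracts, ${stats['total_amount']:,} total")
```

Fixed-width bands mirror the "$100,000, $200,000" framing above; on real data the band width would be a judgment call, since contract amounts are often heavily skewed.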