DSSG Sunlight Team


On Monday, Wisconsin governor and 2016 presidential candidate Scott Walker signed into law a bill banning non-emergency abortions past the 19th week of pregnancy. Unsurprisingly, Walker’s move garnered support from one side, derision from the other, and media attention from both. However, journalists face a big hurdle when trying to provide context for a story such as this: it is time-consuming to figure out how many states have introduced similar legislation and where it originated.

Automated detection of copied legislation can help. Data Science for Social Good fellows Matt Burgess, Eugenia Giraudy, and Julian Katz-Samuels, technical mentor Joe Walsh, and project manager Lauren Haynes are working with the Sunlight Foundation to make it easier to find re-used text. Using Sunlight’s corpus of state legislation, our computational tools uncover textual similarities.

In its current state, our tool lets the user enter the text of a bill and returns documents that potentially match. It highlights similar sections in those documents so the user can quickly evaluate the similarities. Using each matching bill’s number, state, and legislative session, we then determined which bills have passed, which have died, and which are still under consideration.
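To make the "enter a bill, get candidate matches" step concrete, here is a minimal sketch in Python. It is not the tool's actual algorithm: it swaps in a simple word-shingle Jaccard similarity, and the function names, the shingle length `k`, and the `corpus` dictionary of bill IDs to text are all assumptions made for illustration.

```python
def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in a text."""
    words = text.lower().split()
    # Texts shorter than k words yield a single, shorter shingle.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(query, corpus, k=5):
    """Score every bill in a hypothetical {bill_id: text} corpus against the query.

    Returns (score, bill_id) pairs, strongest match first.
    """
    query_shingles = shingles(query, k)
    scored = [
        (jaccard(query_shingles, shingles(text, k)), bill_id)
        for bill_id, text in corpus.items()
    ]
    return sorted(scored, reverse=True)
```

Comparing the query against every document this way would be slow on a corpus the size of Sunlight's; a real system would likely narrow the candidates with a search index first.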


The above screenshot shows the tool’s highest-rated match for Walker’s bill. The left-hand side displays text from Wisconsin Senate Bill 179 (2015), and the right-hand side displays text from Louisiana Senate Bill 593 (2012). The highlighting shows that these sections match almost perfectly. Where differences exist, they are usually a matter of numerals versus spelled-out numbers (e.g. “16” versus “sixteen”) or of section identifiers (e.g. “(b)” versus “(9)”).
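The kinds of differences described above can be smoothed over before comparison. The sketch below is one possible approach and not the tool's implementation: it uses Python's standard `difflib.SequenceMatcher`, and the `NUMBER_WORDS` table, the `normalize` and `shared_passages` helpers, and the `min_tokens` threshold are hypothetical names chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny lookup of spelled-out numbers.
NUMBER_WORDS = {"sixteen": "16", "nineteen": "19", "twenty": "20"}


def normalize(text):
    """Lowercase, drop short parenthesized identifiers such as "(b)" or "(9)",
    and map spelled-out numbers to digits. Returns a list of tokens."""
    # Crude assumption: any 1-3 character parenthesized token is a section label.
    text = re.sub(r"\(\w{1,3}\)", " ", text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    return [NUMBER_WORDS.get(token, token) for token in tokens]


def shared_passages(left, right, min_tokens=4):
    """Return normalized runs of at least min_tokens words found in both texts."""
    a, b = normalize(left), normalize(right)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_tokens
    ]
```

With two made-up sentences (not quotations from either bill), `shared_passages("(b) The unborn child is sixteen weeks old.", "(9) The unborn child is 16 weeks old.")` treats the two as a single matching passage, because the section label is dropped and "sixteen" is mapped to "16".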

The tool provides a list of documents that possibly match, as well as scores that show the strength of the match. We found the 100 best matches and further narrowed the list to 73 by only keeping the documents containing the word ‘pain’ (one of the distinguishing terms for this legislation). Here are several:
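The narrowing step can be sketched in a few lines of Python. The record layout (dictionaries with `score` and `text` keys) and the function name are assumptions for illustration, as is the choice to match 'pain' as a whole word regardless of case; the tool's real output format may differ.

```python
import re


def narrow_matches(matches, keyword="pain", top_n=100):
    """Keep the top_n highest-scoring matches, then only those containing keyword.

    `matches` is a hypothetical list of dicts with "score" and "text" keys.
    """
    best = sorted(matches, key=lambda m: m["score"], reverse=True)[:top_n]
    # Whole-word, case-insensitive match, so "painting" does not count as "pain".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [m for m in best if pattern.search(m["text"])]
```

Filtering after ranking, as the text describes, means the keyword check can only shrink the top 100; it never pulls in lower-scoring documents.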

Similar bills that passed:

Examples of similar bills under consideration:

Examples of similar bills that have died:

The map video below shows in red the bills that passed both the lower and upper chambers, and in purple the bills that were introduced but did not pass. The first bill was introduced in 2010 in South Carolina but failed to pass. In 2011, the bill was introduced in 14 states, three of which passed it: Idaho, Kansas, and Oklahoma. In 2012, six states introduced the bill, but only Georgia approved it. In 2013, eight state legislatures introduced the bill, with Kansas, Texas, and Arkansas passing it. Last year, West Virginia passed the bill while five states rejected it. As of today, seven states have similar bills under consideration: Oregon, Montana, Iowa, Illinois, Kentucky, Virginia, and Maryland.

Many pieces of legislation begin on a lobbyist’s desk. We are collecting pieces of model legislation so we can trace legislative text back to its source. We have collected more than 1,500 pieces of model legislation, but most of those come from only a handful of lobbying groups. The source of the text reused in Scott Walker’s abortion bill, however, was not in our collection. We searched Google for some of the copied text and found the original at Doctors on Fetal Pain, a website created to promote the view that fetuses can feel pain at 20 weeks.

The tool is not yet available for public use. We’re still improving the performance of the algorithm, increasing its robustness to website traffic, incorporating model legislation into our corpus, and automating the legislative searches. However, as this analysis shows, the tool can already provide significant results. We believe this tool will be a valuable resource for journalists and scholars to shed light on state-level politics.