Crowd Counting Consortium Crowd Data and Shiny Dashboard

[This article was first published on R Views, and kindly contributed to R-bloggers.]

Jay Ulfelder, PhD, serves as Program Manager for the Nonviolent Action Lab, part of the Carr Center for Human Rights Policy at the Harvard Kennedy School. He has used R to work at the intersection of social science and data science for nearly two decades.

Where are people in the United States protesting in 2020, and what are they protesting about? How large have those crowds been? How many protesters have been arrested or injured? And how does this year’s groundswell of protest activity compare to the past several years, which had already produced some of the largest single-day gatherings in U.S. history?

These are the kinds of questions the Crowd Counting Consortium (CCC) Crowd Dataset helps answer. Begun after the 2017 Women’s March by Professors Erica Chenoweth (Harvard University) and Jeremy Pressman (University of Connecticut), the CCC’s database on political crowds has grown into one of the most comprehensive open sources of near-real-time information on protests, marches, demonstrations, strikes, and similar political gatherings in the contemporary United States. At the start of August 2020, the database included nearly 50,000 events. These data have been used in numerous academic and press pieces, including a recent New York Times story on the historic scale of this year’s Black Lives Matter uprising.

As rich as the data are, they have been a challenge to use. The CCC shares its data on political crowds via a stack of monthly Google Sheets with formats that can vary from sheet to sheet in small but confounding ways. Column names don’t always match, and certain columns have been added or dropped over time. Some sheets include separate tabs for specific macro-events or campaigns (e.g., a coordinated climate strike), while others group everything in a single sheet. And, of course, typos happen.
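
To give a sense of what compiling sheets like these involves, here is a minimal base R sketch. It is not the Lab's actual pipeline: the two toy sheets, the column names, and the helper functions standardize_names() and bind_sheets() are all hypothetical, chosen only to illustrate mismatched headers and columns that appear in one month but not another.

```r
# Two hypothetical monthly sheets whose column names differ slightly.
sheet_jan <- data.frame(
  Date = c("2020-01-18", "2020-01-20"),
  City = c("Boston", "Austin"),
  EstimateLow = c(100, 250),
  stringsAsFactors = FALSE
)
sheet_feb <- data.frame(
  date = "2020-02-01",
  CityTown = "Denver",
  EstimateLow = 40,
  Source1 = "https://example.com/story",
  stringsAsFactors = FALSE
)

# Assumed lookup from variant column names to one standard name.
name_map <- c(Date = "date", City = "city", CityTown = "city",
              EstimateLow = "size_low", Source1 = "source")

# Rename any column that appears in the lookup; leave the rest alone.
standardize_names <- function(df, map) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

# Stack sheets, filling columns that a given sheet lacks with NA.
bind_sheets <- function(sheets) {
  all_cols <- unique(unlist(lapply(sheets, names)))
  padded <- lapply(sheets, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[, all_cols, drop = FALSE]
  })
  do.call(rbind, padded)
}

sheets <- lapply(list(sheet_jan, sheet_feb), standardize_names, map = name_map)
events <- bind_sheets(sheets)
events$date <- as.Date(events$date)
```

Keeping the name lookup in one place means a newly discovered header variant only requires one more entry in name_map rather than a change to the stacking logic.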

To make this tremendous resource more accessible to researchers, activists, journalists, and data scientists, the Nonviolent Action Lab at Harvard’s Carr Center for Human Rights Policy—a new venture started by CCC co-founder Chenoweth—has created a GitHub repository to host a compiled, cleaned, and augmented version of the CCC’s database.

In addition to all the information contained in the monthly sheets, the compiled version adds two big feature sets.

To make the CCC data more accessible to a wider audience, the Lab has also built a Shiny dashboard that lets users filter events in various ways and then map and plot the results. Users can filter by date range, year, or campaign as well as issue and political valence (pro-Trump, anti-Trump, or neither).
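
The kind of filtering described above can be sketched in a few lines of base R. This is an illustration only, not the dashboard's code: the toy data, the column names (date, issue, valence), the valence labels, and the filter_events() helper are assumptions made for the example.

```r
# Toy event data; the columns and labels are hypothetical.
events <- data.frame(
  date = as.Date(c("2020-05-30", "2020-06-06", "2020-07-04", "2019-09-20")),
  issue = c("racism; policing", "racism", "patriotism", "climate"),
  valence = c("anti-Trump", "anti-Trump", "pro-Trump", "neither"),
  stringsAsFactors = FALSE
)

# Keep events inside a date range, optionally narrowing by an issue
# substring and by one or more political valence labels.
filter_events <- function(df, start, end, issue_pattern = NULL, valences = NULL) {
  keep <- df$date >= as.Date(start) & df$date <= as.Date(end)
  if (!is.null(issue_pattern)) {
    keep <- keep & grepl(issue_pattern, df$issue, fixed = TRUE)
  }
  if (!is.null(valences)) {
    keep <- keep & df$valence %in% valences
  }
  df[keep, , drop = FALSE]
}

selected <- filter_events(events, "2020-01-01", "2020-12-31",
                          issue_pattern = "racism", valences = "anti-Trump")
```

In a Shiny app, a function like this would typically sit inside a reactive expression so that the map and plots update whenever a filter input changes.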

The dashboard has two main tabs. The first uses the leaflet package to map the events with markers that contain summary details and links to the source(s) the human coders used to research them.
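
As a rough sketch of how a map tab like this might be assembled, the example below builds popup text and passes it to leaflet. The events, coordinates, column names, and example.com links are placeholders, and the dashboard's own markers may be constructed differently.

```r
library(leaflet)

# Placeholder events; coordinates are approximate city centers.
events <- data.frame(
  city = c("Boston, MA", "Austin, TX"),
  lat = c(42.3601, 30.2672),
  lon = c(-71.0589, -97.7431),
  claims = c("against police brutality", "for climate action"),
  source = c("https://example.com/a", "https://example.com/b"),
  stringsAsFactors = FALSE
)

# Build an HTML popup with summary details and a link to the source.
events$popup <- sprintf(
  "<b>%s</b><br>%s<br><a href='%s' target='_blank'>source</a>",
  events$city, events$claims, events$source
)

# One marker per event on a default OpenStreetMap tile layer.
event_map <- leaflet(events) %>%
  addTiles() %>%
  addMarkers(lng = ~lon, lat = ~lat, popup = ~popup)
```

Printing event_map in an interactive session renders the map; inside Shiny it would be returned from renderLeaflet().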

The second tab uses plotly and the streamgraph HTML widget package to render interactive plots of trends over time in the occurrence of the selected events, the number of participants in them, and the political issues associated with them.
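
A trend plot of this sort starts with aggregating events by time period. The sketch below counts hypothetical events and participants per month in base R and draws the counts with plotly; it does not use streamgraph, and the data and column names are invented for illustration.

```r
library(plotly)

# Hypothetical events with a crowd-size estimate for each.
events <- data.frame(
  date = as.Date(c("2020-05-30", "2020-06-01", "2020-06-06", "2020-07-04")),
  size = c(500, 1200, 3000, 150)
)

# Collapse each date to the first day of its month, then sum per month.
events$month <- as.Date(format(events$date, "%Y-%m-01"))
events$n_events <- 1
monthly <- aggregate(cbind(n_events, size) ~ month, data = events, FUN = sum)

# Interactive line chart of monthly event counts.
trend_plot <- plot_ly(monthly, x = ~month, y = ~n_events,
                      type = "scatter", mode = "lines+markers") %>%
  layout(yaxis = list(title = "Events per month"))
```

Swapping y = ~n_events for y = ~size would plot total estimated participants instead of event counts.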

The point of the Nonviolent Action Lab’s repository and dashboard is to make the Crowd Counting Consortium’s data more accessible and more useful to as wide an audience as possible. If you use either of these resources and find bugs or errors, or have suggestions on how to improve them, please let us know.
