Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
“I’d rather do data than date”. I overheard this while eavesdropping on a conversation among three female data journalists while waiting for an elevator at the IRE-CAR (Investigative Reporters and Editors – Computer-Assisted Reporting) conference last month. I would like to think the remark was overloaded with hyperbole, but maybe not. Most of the attendees as this conference were motivated, tenacious, and highly skilled data hounds, the kind of investigative journalists who pry information from government databases through persistent requests, legal leverage, and SQL expertise.
This was my first CAR conference, and I was very impressed by the mission-driven enthusiasm with which the speakers, panelists, and attendees focused on data as an essential tool for the pursuit of the truth. I was impressed, but not surprised, to find this passion for data. Journalists have been sifting through data to find the truth since at least the early twentieth century when social work, academic social science, and journalism were all in the same primeval soup1. The modern tradition of computer-assisted data journalism dates at least as far back as Philip Meyer’s coverage of the Detroit Riots, his 1973 book Precision Journalism, and subsequent collaboration with Donald Bartlet and James Steele examining patterns in 1970’s Philadelphia criminal conviction sentences2. I mention all this to emphasize that data journalism is not just a trendy offshoot of data science. In fact, it might be the other way around! Data scientists probably owe as much to data journalists as they do to statisticians.
The number of packed workshops and talks on data wrangling and visualization far exceeded my expectations. I did expect some R content. (I saw the recent BBC post, and New York Times R-based visualizations are a daily part of my news consumption.) But, there were over 15 R-related sessions on the schedule, along with at least as many sessions devoted to Python, SQL, JavaScript, D3, and other programming tools. Moreover, the fact that several featured technical workshops were repeated over multiple days indicated that the conference organizers expected the data journalists to want to dig into the details of all these technologies.
You can get a flavor for the technology presentations by looking into the tip sheets that are available for many of these sessions. There is a wealth of information buried here, well worth a couple of hours of exploration. For example, see Peter Aldhous’ talk Text mining in R with tidytext,
Andrew Ba Tran’s Mapping with R, and the panel discussion How and why to make your data analysis reproducible, and Sharon Machlis’ video series: Do More with R.
A session on statistical inference that I very much enjoyed was Jevin West’s talk Calling bullshit: Data reasoning in a digital world. There is no tip sheet for the talk, but the website for the course that he teaches at the University of Washington with his colleague Carl Bergstrom contains voluminous material. Something like this course ought to be included in every statistics and data science syllabus. In addition to discussing the standard topics, such as attributing cause to correlation and deconstructing misleading visualizations, it also presents several up-to-date cautionary tales: for example, have a look at the case study Criminal Machine Learning.
Some additional tip sheets that I found illuminating for what they reveal about the types of data sources that data journalists seek out are: Tracking dark money tips, 2020 Census Reporting Mistakes, Tips on Finding Nonprofit Data, and Before you ever begin your analysis.
If you are a data journalist, or a data journalist in training, or really anyone new to R, and are looking for a fast on-ramp to becoming productive at data wrangling and creating visualizations, I highly recommend Sharon Machilis book Practical R for Mass Communication and Journalism and Andrew Ba Tran’s tutorial, R For Journalists. Both of these resources are unusual in that they provide up-to-date, tidyverse-based, GitHub-aware introductions to R, stressing data acquisition, manipulation, reporting, and graphing without the burden of having to simultaneously take an introductory course in statistics.
Finally, thanks to Andrew, here is a list of R-fluent journalists whom you may want to follow on Twitter:
- Peter Aldhouse
- Aaron Kessler
- Sharon Machlis
- David Montgomery
- Hannah Recht
- Andrew Ba Tran
- MaryJo Webster
- Christine Zhang
1C.W. Anderson. Apostoles of Certainty: Oxford University Press 2018. Chapter 2
2 Data Journalism, Wikipedia
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.