Scraping Web Pages With R

One of the things I tend to avoid doing in R, partly because there are better tools elsewhere, is screenscraping. With the release of the new rvest package, I thought I’d have a go at what amounts to one of the simplest webscraping activites – grabbing HTML tables out of webpages. The … Continue reading

What’s the Point of an API?

Trying to clear my head of code on a dog walk after a couple of days tinkering with the nomis API and I started to ponder what an API is good for. Chris Gutteridge and Alex Duttion’s open data excuses bingo card and Owen Boswarva’s Open Data Publishing Decision Tree both suggest that … Continue reading

So What Can Text Analysis Do for You?

Despite believing we can treat anything we can represent in digital form as “data”, I’m still pretty flakey on understanding what sorts of analysis we can easily do with different sorts of data. Time series analysis is one area – the pandas Python library has all manner of … Continue reading

Calculating Churn in Seasonal Leagues

One of the things I wanted to explore in the production of the Wrangling F1 Data With R book was the extent to which I could draw on published academic papers for inspiration in exploring the the various results and timing datasets. In a chapter published earlier this week, I explored the notion of … Continue reading