Webscraping with R using a Raspberry Pi
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Setting up the Raspberry Pi
The basic setup was:
- buy a Raspberry Pi Starter Kit
- flash the SD card with Raspbian
- run raspi-config
- install R with sudo apt-get install r-base, which installed R 3.1.1

After that, I started to install the R packages I usually need for my cron-job tasks (mostly webscraping). I ran into problems with the rvest package because several of its dependencies could not be installed. Maybe there is a more efficient way, but these are the steps I took:
Install packages for webscraping
To install the XML-related R packages that rvest depends on, I needed libxml2 on the system. Although libxml2 is available through apt-get, I ended up building it manually from source:
```bash
wget ftp://xmlsoft.org/libxml2/libxml2-2.9.2.tar.gz
tar -xzvf libxml2-2.9.2.tar.gz
cd libxml2-2.9.2/
```
To compile libxml2, I also needed python-dev:
```bash
sudo apt-get update
sudo apt-get install python-dev
```
Then I built and installed libxml2:
```bash
./configure --prefix=/usr --disable-static --with-history && make
sudo make install
```
I also had problems with the curl package. The failed installation suggested installing libcurl4-openssl-dev, so:
```bash
sudo apt-get install libcurl4-openssl-dev
```
The last problem was the openssl package. Again, I followed the suggestion from the failed R package installation and installed libssl-dev:
```bash
sudo apt-get install libssl-dev
```
After that, rvest installed nicely, although it took the Pi quite a while to install all of its dependencies.
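To confirm that the R side of the installation actually worked, you can try loading each package. The sketch below is a hypothetical helper, missing_packages(); the default package list (xml2, curl, openssl, rvest) is my assumption about which R packages map to the system libraries above, so adjust it to what your scripts use.

```r
# Hypothetical helper: return the packages that cannot be loaded.
# The default package list is an assumption; adjust it to your scripts.
missing_packages <- function(pkgs = c("xml2", "curl", "openssl", "rvest")) {
  # requireNamespace() returns FALSE (instead of throwing an error)
  # when a package is not installed or fails to load
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

todo <- missing_packages()
if (length(todo) > 0) {
  message("Still missing: ", paste(todo, collapse = ", "))
  # install.packages(todo)  # uncomment to retry the installation
} else {
  message("All webscraping packages load fine.")
}
```

Running this after each apt-get step shows which package still lacks its system library, without waiting for a full reinstall of rvest.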
Webscraping Example – A simple frost warning for my plants
One simple task my Raspberry Pi does for me is emailing me a frost warning if, at 6 pm, the weather forecast for the night drops below 3 °C. For this I got an API key at openweathermap.org. Note that openweathermap.org does not like frequent requests: keep it to at most one request per 10 minutes. At the beginning I got blocked.
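The daily cron job stays far below that limit, but while developing and re-running the script by hand it is easy to exceed it. A minimal sketch, assuming you cache the parsed response in a file: the hypothetical fetch_cached() only calls the API again when the cached copy is older than 10 minutes. The cache path and the helper name are illustrative assumptions.

```r
# Hypothetical helper: reuse a cached result if it is younger than
# `max_age_min` minutes, otherwise call `fetch()` and cache the result.
# The default cache file location is an assumption.
fetch_cached <- function(fetch, cache_file = "~/owm_cache.rds",
                         max_age_min = 10) {
  if (file.exists(cache_file)) {
    age_min <- as.numeric(difftime(Sys.time(), file.mtime(cache_file),
                                   units = "mins"))
    if (age_min < max_age_min) {
      return(readRDS(cache_file))  # fresh enough: no new API request
    }
  }
  result <- fetch()                # the actual (rate-limited) request
  saveRDS(result, cache_file)
  result
}
```

Usage: wrap the request in a function, e.g. wd_json <- fetch_cached(function() fromJSON(url)), where url is your forecast request URL.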
You can then request the forecast as JSON for your city ID, using your APPID (API key):
```r
library(jsonlite)
wd_json <- fromJSON("http://api.openweathermap.org/data/2.5/forecast/city?id=CITY_ID_GOES_HERE&APPID=YOUR_API_KEY_GOES_HERE")
```
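Since the script ends up running unattended from cron, it can help to keep the city ID and API key out of the script itself. A minimal sketch, assuming two environment variables named OWM_CITY_ID and OWM_APPID (names chosen for illustration); the hypothetical owm_forecast_url() builds the same request URL as above. Rscript reads ~/.Renviron at startup, so that file is a convenient place to define them.

```r
# Hypothetical helper: build the forecast URL from the city ID and API key.
# OWM_CITY_ID and OWM_APPID are assumed environment variable names,
# e.g. defined in ~/.Renviron.
owm_forecast_url <- function(city_id = Sys.getenv("OWM_CITY_ID"),
                             appid = Sys.getenv("OWM_APPID")) {
  if (!nzchar(city_id) || !nzchar(appid)) {
    stop("City ID or API key missing")
  }
  sprintf("http://api.openweathermap.org/data/2.5/forecast/city?id=%s&APPID=%s",
          city_id, appid)
}

# Usage (needs jsonlite, network access and a valid key):
# wd_json <- jsonlite::fromJSON(owm_forecast_url())
```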
Then tidy up and extract the values needed. Temperatures are returned in kelvin, so we convert them to degrees Celsius by subtracting 273.15. The date, a Unix timestamp, is converted to POSIXct.
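```r
wd <- wd_json$list
# dt is a Unix timestamp (seconds since 1970-01-01 UTC); convert it to local time, stored as text
wd$Datum <- as.character(as.POSIXct(wd$dt, origin="1970-01-01", tz="Europe/Berlin"))
# Kelvin to degrees Celsius
wd$Celsius_min <- wd$main$temp_min-273.15
wd$Celsius_max <- wd$main$temp_max-273.15
wd$Celsius_mean <- wd$main$temp-273.15
```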
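The two conversions can be checked in isolation on known values. The helper names below are hypothetical and only wrap the same expressions used above: 273.15 K is 0 °C, so the 3 °C frost threshold corresponds to 276.15 K, and timestamp 0 is midnight UTC on 1970-01-01, i.e. 01:00 in Berlin (CET, UTC+1).

```r
# Hypothetical helpers wrapping the conversions used above
kelvin_to_celsius <- function(k) k - 273.15  # T_C = T_K - 273.15

unix_to_berlin <- function(ts) {
  # ts: seconds since 1970-01-01 00:00:00 UTC
  as.POSIXct(ts, origin = "1970-01-01", tz = "Europe/Berlin")
}

kelvin_to_celsius(c(273.15, 276.15))  # 0 and 3 (up to floating-point rounding)
format(unix_to_berlin(0))             # "1970-01-01 01:00:00"
```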
Sending results via email
Now for the part that sends the email:
```r
library(sendmailR)
library(xtable)

# Keep only the forecasts for the next 24 hours
wd <- wd[as.POSIXct(Sys.time()+86400)>wd$Datum,]

if(any(wd$Celsius_min < 3)) {
  # HTML table of all forecast slots below 3 °C
  dispatch <- print(xtable(wd[wd$Celsius_min<3,c("Datum","Celsius_min","Celsius_mean","Celsius_max")]),type="html")
  msg <- mime_part(paste0('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>HTML demo</title>
<style type="text/css">
</style>
</head>
<body><h2>Frostwarnung</h2>', dispatch, '</body>
</html>'))

  ## Override content type.
  msg[["headers"]][["Content-Type"]] <- "text/html"
  # Sender address built from the Pi's host name (Sys.info()[4] is the nodename)
  from <- sprintf("<sendmailR@%s>", Sys.info()[4])
  to <- "<YOUR@EMAIL_GOES_HERE.COM>"
  subject <- paste("Frostwarnung",date())
  body <- list(msg)
  # Deliver via Google's mail server
  sendmail(from, to, subject, body,control=list(smtpServer="ASPMX.L.GOOGLE.COM"))
}
```
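To test the warning logic without sending real mail, it can be pulled out into a small function and fed made-up data. The sketch below is a variation on the filter above, not the original code: the hypothetical frost_rows() expects a POSIXct column named time (an assumed name) instead of the character Datum column, so the date comparison does not rely on R implicitly converting text to dates, and it also drops slots in the past.

```r
# Hypothetical helper: forecast rows within the next `hours` hours whose
# minimum temperature is below `threshold` (°C). Expects a POSIXct column
# `time` (assumed name) and a numeric column `Celsius_min`.
frost_rows <- function(wd, now = Sys.time(), hours = 24, threshold = 3) {
  upcoming <- wd[wd$time >= now & wd$time < now + hours * 3600, , drop = FALSE]
  upcoming[upcoming$Celsius_min < threshold, , drop = FALSE]
}

# Made-up data: forecast slots 3, 6 and 30 hours from "now"
now <- as.POSIXct("2015-10-20 18:00:00", tz = "Europe/Berlin")
fc <- data.frame(
  time = now + c(3, 6, 30) * 3600,
  Celsius_min = c(5.2, 1.8, -0.5)
)
frost_rows(fc, now = now)  # only the 1.8 °C slot; the 30-hour slot is outside the window
```

In the script, the mail would then be sent only if nrow(frost_rows(...)) > 0, and the time column converted to character just before building the xtable.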
Finally, we tell the Raspberry Pi to run the script daily in the early evening. Save the .R file and add it to your crontab:
```bash
crontab -e
```
The first time you run crontab -e, you are asked to choose an editor; the easiest one (at least for me) is nano.
Add the following line:
```bash
00 18 * * * Rscript ~/path_to_your/script.R
```
This schedules the script to run at 18:00 every day: minute 00, hour 18, and * (any) for day of month, month and day of week.