Webscraping with R using a Raspberry Pi
Setting up the Raspberry Pi
After the basic setup, i.e.
- bought a Raspberry Pi Starter Kit
- flashed the SD Card with Raspbian
- ran raspi-config
- installed R with apt-get install r-base, which installed R 3.1.1
I started to install the R packages I usually need for my cron-job tasks (mostly webscraping). I ran into problems with the rvest package because several of its dependencies could not be installed. There may be a more efficient way, but these are the steps I took:
Install packages for webscraping
To install the XML and related R packages (such as rvest), I needed libxml2 on the system. Although apt-get had it, I installed it manually:
```bash
wget ftp://xmlsoft.org/libxml2/libxml2-2.9.2.tar.gz
tar -xzvf libxml2-2.9.2.tar.gz
cd libxml2-2.9.2/
```
I also needed python-dev to make libxml2 compile.
```bash
sudo apt-get update
sudo apt-get install python-dev
```
Then I built libxml2:
```bash
./configure --prefix=/usr --disable-static --with-history && make
sudo make install
```
I also had problems with the curl package. The failed installation suggested installing libcurl4-openssl-dev, so:
```bash
sudo apt-get install libcurl4-openssl-dev
```
The last problem was the openssl package. Again, I followed the suggestion from the failed R package installation and installed libssl-dev:
```bash
sudo apt-get install libssl-dev
```
After that, rvest installed nicely. However, it took quite a while for the Pi to install all dependencies.
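Because each failed installation only shows up one package at a time, it can help to check afterwards which packages are actually available. The following is a minimal sketch in base R; the helper name `missing_packages` is made up for illustration, and the package list is my assumption about what rvest pulled in on this setup.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE instead of throwing an error
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed list of the packages that caused trouble, plus rvest itself
todo <- missing_packages(c("xml2", "curl", "openssl", "rvest"))
if (length(todo) > 0) {
  message("Still missing: ", paste(todo, collapse = ", "))
} else {
  message("All packages are available.")
}
```

Anything reported as missing can then be retried with `install.packages()` once the matching system library is in place.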
Webscraping Example – A simple frost warning for my plants
A simple task my Raspberry Pi does for me is to send a frost warning to my email address if, at 6 pm, the weather forecast for the night drops below 3 °C. For this I got an API key at openweathermap.org. Mind that openweathermap.org does not like frequent requests: keep it to less than one request per 10 minutes. At the beginning I got blocked.
You can then request some JSON for your city ID using your APPID (API Key):
```r
library(jsonlite)
wd_json <- fromJSON("http://api.openweathermap.org/data/2.5/forecast/city?id=CITY_ID_GOES_HERE&APPID=YOUR_API_KEY_GOES_HERE")
```
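One way to stay under the request limit while testing is to cache the last response on disk and reuse it while it is still fresh. This is only a sketch under assumptions: the helper `cached_fetch`, the cache file name, and the 600-second default are mine, not part of the original script or of the openweathermap.org API.

```r
# Hypothetical helper: reuse a cached result if it is younger than max_age
# seconds, otherwise call fetch() and store the new result on disk.
cached_fetch <- function(fetch, cache_file, max_age = 600) {
  if (file.exists(cache_file)) {
    modified <- file.info(cache_file)$mtime
    age <- as.numeric(difftime(Sys.time(), modified, units = "secs"))
    if (age < max_age) {
      return(readRDS(cache_file))
    }
  }
  result <- fetch()
  saveRDS(result, cache_file)
  result
}

# Usage with the request from above (commented out, it needs a real API key):
# wd_json <- cached_fetch(
#   function() jsonlite::fromJSON("http://api.openweathermap.org/data/2.5/forecast/city?id=CITY_ID_GOES_HERE&APPID=YOUR_API_KEY_GOES_HERE"),
#   cache_file = "forecast_cache.rds"
# )
```

Passing the request as a function keeps the helper independent of jsonlite, so it can be tried out with a dummy function before touching the real API.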
Then tidy the data and extract the values needed. Temperatures are in kelvin, so we need to convert them to degrees Celsius. I convert the Unix timestamp to a POSIX date-time.
```r
wd <- wd_json$list
wd$Datum <- as.character(as.POSIXct(wd$dt, origin="1970-01-01", tz="Europe/Berlin"))
wd$Celsius_min <- wd$main$temp_min-273.15
wd$Celsius_max <- wd$main$temp_max-273.15
wd$Celsius_mean <- wd$main$temp-273.15
```
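To see the two conversions in isolation, here is a small worked example on made-up numbers. The data frame `demo` and the helper `kelvin_to_celsius` are assumptions for illustration and not real API output; only the subtraction of 273.15 and the `as.POSIXct()` call mirror the code above.

```r
# Hypothetical helper wrapping the kelvin-to-Celsius offset used above
kelvin_to_celsius <- function(kelvin) kelvin - 273.15

# Made-up values shaped like the forecast data (not real API output)
demo <- data.frame(
  dt = c(1420070400, 1420081200),   # Unix timestamps in seconds
  temp_min = c(271.15, 278.15)      # temperatures in kelvin
)

# Unix timestamp -> POSIXct date-time in the local time zone of the plants
demo$Datum <- as.POSIXct(demo$dt, origin = "1970-01-01", tz = "Europe/Berlin")
demo$Celsius_min <- kelvin_to_celsius(demo$temp_min)

print(demo)
```

The two sample temperatures come out at roughly -2 °C and 5 °C. Unlike the script above, this sketch keeps `Datum` as POSIXct rather than converting it to character, which is a design choice that makes later time comparisons explicit.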
Sending results via email
Now for the part that sends the mail:
```r
library(sendmailR)
library(xtable)

# Keep only the forecasts for the next 24 hours
wd <- wd[as.POSIXct(Sys.time()+86400)>wd$Datum,]

# Send a mail only if any forecast minimum is below 3 °C
if(any(wd$Celsius_min < 3)) {
  # Render the frost rows as an HTML table
  dispatch <- print(xtable(wd[wd$Celsius_min<3,c("Datum","Celsius_min","Celsius_mean","Celsius_max")]),type="html")
  msg <- mime_part(paste0('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0
                          Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
                          <html xmlns="http://www.w3.org/1999/xhtml">
                          <head>
                          <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
                          <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
                          <title>HTML demo</title>
                          <style type="text/css">
                          </style>
                          </head>
                          <body><h2>Frostwarnung</h2>',
                          dispatch,
                          '</body>
                          </html>'))

  ## Override content type.
  msg[["headers"]][["Content-Type"]] <- "text/html"

  # Sender address built from the machine's node name
  from <- sprintf("<sendmailR@%s>", Sys.info()[4])
  to <- "<YOUR@EMAIL_GOES_HERE.COM>"
  subject <- paste("Frostwarnung",date())
  body    <- list(msg)
  sendmail(from, to, subject, body,control=list(smtpServer="ASPMX.L.GOOGLE.COM"))
}
```
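The decision of whether to send a mail can be tried out without a mail server. The sketch below pulls the filter into a function and runs it on made-up data; the function name `frost_rows`, its arguments, and the sample values are assumptions for illustration, and `Datum` is assumed to be POSIXct here.

```r
# Hypothetical helper: rows within the next `horizon` seconds whose
# minimum temperature is below `threshold` degrees Celsius.
frost_rows <- function(wd, now = Sys.time(), threshold = 3, horizon = 86400) {
  in_window <- wd$Datum < now + horizon
  wd[in_window & wd$Celsius_min < threshold, , drop = FALSE]
}

# Made-up forecast: 3, 9 and 30 hours after a fixed "now"
now <- as.POSIXct("2015-01-01 18:00:00", tz = "Europe/Berlin")
demo <- data.frame(
  Datum = now + c(3, 9, 30) * 3600,
  Celsius_min = c(4.0, 1.5, -2.0)
)

warn <- frost_rows(demo, now = now)
if (nrow(warn) > 0) {
  message("Frost warning for ", nrow(warn), " forecast step(s)")
}
```

Only the 1.5 °C entry qualifies in this example: the first row is too warm and the last one lies beyond the 24-hour window. Passing `now` explicitly makes the check repeatable, which is handy when testing outside the cron schedule.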
Finally, we have to tell the Raspberry Pi to run the script daily in the early evening. Save the .R file and add it to your crontab:
```bash
crontab -e
```
The first time you use crontab, you are asked to choose an editor. The easiest one to use (at least for me) is nano.
Add the following line:
```
00 18 * * * Rscript ~/path_to_your/script.R
```
This adds the script to your cron jobs and schedules it at 18:00 every day. The five fields are minute, hour, day of month, month, and day of week; an asterisk means "every".
