Site icon R-bloggers

Recording Weather.gov History using R and the Windows Task Scheduler

[This article was first published on George J. Mount, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One of my most popular posts is on retrieving real-time weather into Excel.

A common question here is how to get recorded weather updates; that is, keep a log of the weather updates for a specific place.

For that I will write an R script to parse weather.gov’s XML feed for a location and then use the task scheduler to automate going to the site, pulling the weather info & appending to our historical records.

Let’s get started. First time using R? Check out my free course, “5 Things Excel Users Should Know About R.”

1. Read our XML site

We will use the xml2 library to read in our .xml feed from the web and parse the various strings we are interested in. This is similar to what we did with WEBSERVICE and FILTERXML in our Excel example.

First, let’s load the xml2 library and read in our site. I will be using the feed from the Akron airport for this example.

I am also going to create a blank data frame called “history.” You’ll see why in a minute:

library(xml2)

#first time set a blank history data frame
history <- c()

#read in the web file
weather <- read_xml("https://w1.weather.gov/xml/current_obs/KAKR.xml")

2. Parse our nodes of interest

Next we will parse our nodes of interest using the xml_find_first function. I will use XPath to select these nodes – if you remember from our previous post, these will always start with “.//” and then the tag.

After that, we use xml_text and related functions to extract the information. Because our temperature variable is numeric, I use xml_double instead.

#get the temperature  
temp_f <- xml_find_first(weather, ".//temp_f")
temp_f <- xml_double(temp_f)

#get the weather description
temp_desc <- xml_find_first(weather, ".//weather")
temp_desc <- xml_text(temp_desc)

#get the observation times
obs_time <- xml_find_first(weather, ".//observation_time_rfc822")
obs_time <- xml_text(obs_time)

3. Merge the real-time update with your historical records

Now you will see why we set up the blank data frame above. Each time we run this script we will want to append our new update to our historical records. Of course at this point we don’t have any history, so the data frame is blank. That will change after this, and we will modify our code accordingly.

For now, though, I will combine our three pieces of real-time information using the cbind function (which binds columns), then I’ll append this to our historical records using rbind (which binds rows).

At this point you can write the file to a .csv file if you’d like.

#put this info together
realtime <- cbind(temp_f, temp_desc, obs_time)

#merge the realtime and historical
historyupdated <- rbind(history, realtime)


#write to a csv if you want
write.csv(historyupdated, "C:/RFiles/historyupdate.csv",
          row.names = FALSE)

4. Save the R data and set it up to load next time 

We want to save this log as an R file to read up as our historical file next time. To do this I will save an RDS file (this is R’s internal data record file extension).

So next time instead of starting up with a blank historical data frame, we’ll read this log in instead.

#RESAVE THE FILE
saveRDS(historyupdated, "C:/RFiles/weatherlog.rds")

#AFTER THAT, load up the history
history <- readRDS("C:/RFiles/weatherlog.rds")

See the full code below to get a better sense of how this works.

One last thing…  Save your R Script somewhere convenient and simple. For example I have mine in C:\RScripts. This will make the next step easier.

5. Automate retrieval with the task scheduler

The above script is pretty cool but requires the user to run it at regular times, which would get tedious.

Instead we are going to use the Task Scheduler in Windows to automate running this code.

Find the Task Scheduler using your Windows search bar and head up to Action | Create Task.

Name the task what you’d like. I’d also suggest you set to “Run with highest privileges.” Remember that this script will only run when you are logged onto your computer (unless you say otherwise) and will certainly not run if the computer is off.

Next, go to Triggers. We are going to automate this script to run every hour, which is how often the weather is updated at this location. I will set the task to repeat every hour for an indefinite duration.

Almost there! This next part takes some practice. Go to Actions and create a new action.

Here you first select where your Rscript.exe file resides on your computer. For me it is C:\Program Files\R\R-3.4.3\bin\j386\Rscript.exe. Whew! Fortunately you can browse to this location.

Under Add arguments, type the name of the R script from above. Mine is named WeatherLog.R.

Now here is why I suggested you keep the file somewhere simple. Under Start in, you will put the folder where this file is located.

Now the weather comes to you.

So long as nothing in your system changes, you will get weather updates on the hour delivered to your CSV file. From here you could feed this into a workbook using Get & Transform. Bonus — from here you can use the Query Editor to transform the observation time column to your preferred format.

 

To leave a comment for the author, please follow the link and comment on their blog: George J. Mount.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.