
Climbing Mt. Whitney with web browser automation and R

[This article was first published on Sebastian Wolf blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
< !-- wp:heading {"level":3} -->

Mount Whitney is the tallest mountain in the contiguous United States, and you need a permit to climb it. These permits are limited, but from time to time somebody returns their permit, and it then shows up again on the permit website recreation.gov. I wanted to get one of those, and this post tells you how.

< !-- /wp:heading --> < !-- wp:paragraph -->

A friend of mine had a time window of two weeks to get a permit for Mt. Whitney. I did not really know about this mountain until he came up with the trip. Mt. Whitney is located in California and rises 14,505 ft (4,421 m) above sea level. Because a lot of people want to go there every year, the USDA Forest Service limits the number of permits to hike the mountain. To get a permit, you simply go to this website and check whether your date is available for your number of hikers.

< !-- /wp:paragraph --> < !-- wp:image -->
Mt. Whitney permit website (2019 Oct 3rd)
< !-- /wp:image --> < !-- wp:paragraph -->

You will notice pretty fast that, unless you are an early bird, all permits for your desired date are gone. Now you have three choices. One, give up and don’t hike the mountain. Two, check the website yourself every day to see if new or returned permits are available. Three, put a bot or browser automation to work that checks the permits for you. My friend asked me for the third option. And as I have some experience with RSelenium (I just recently presented at EARLconf), I wanted to try this approach.

< !-- /wp:paragraph --> < !-- wp:heading {"level":3} -->

The knowledge you need to follow this tutorial

< !-- /wp:heading --> < !-- wp:list --> < !-- /wp:list --> < !-- wp:heading {"level":3} -->

Getting started with RSelenium and docker

< !-- /wp:heading --> < !-- wp:paragraph -->

First, I wanted to run my bot in the cloud. Moreover, I wanted a reproducible environment. So I followed the approach of the RSelenium vignette, which runs Selenium inside a docker container. My first task was therefore to spin up two docker containers: the first one runs Selenium, the second one runs R and python to access it.

< !-- /wp:paragraph --> < !-- wp:paragraph -->

Spinning up the Selenium container is simple:

< !-- /wp:paragraph --> < !-- wp:preformatted -->
docker run -d -p 4445:4444 --name seleniumcontainer --net mynet selenium/standalone-chrome
< !-- /wp:preformatted --> < !-- wp:paragraph -->

I used a shared network between the docker containers called mynet, which has to exist before the first container joins it (docker network create mynet). This allows the two docker containers to find each other in the network, even by their names.
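Because the Selenium port is published as 4445 on the host but stays 4444 inside mynet, the R side needs different connection settings depending on where it runs. The following is a minimal sketch under that assumption. selenium_settings() and make_driver() are hypothetical helpers for illustration, not part of RSelenium or the original repository; only the remoteDriver() call and its arguments come from the package.

```r
# Hypothetical helper (not part of RSelenium): pick host and port of the
# Selenium server depending on where the R session runs.
selenium_settings <- function(inside_mynet = TRUE) {
  if (inside_mynet) {
    # Docker's network DNS resolves the container name inside "mynet";
    # the server listens on the container port 4444 there
    list(host = "seleniumcontainer", port = 4444L)
  } else {
    # From the docker host, use the port published with -p 4445:4444
    list(host = "localhost", port = 4445L)
  }
}

# Hypothetical helper: create (but do not open) an RSelenium driver
make_driver <- function(inside_mynet = TRUE) {
  s <- selenium_settings(inside_mynet)
  RSelenium::remoteDriver(remoteServerAddr = s$host, port = s$port,
                          browserName = "chrome")
}
```

Inside the crawler container, make_driver() would be equivalent to the remoteDriver() call used later in this post, while make_driver(FALSE) targets the published port when experimenting from the host.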

< !-- /wp:paragraph --> < !-- wp:paragraph -->

The second docker container is built from three files:

< !-- /wp:paragraph --> < !-- wp:list {"ordered":true} -->
  1. run_tests.R to execute my RSelenium calls
  2. sendmail.py to send emails from python
  3. Dockerfile to build a docker container
< !-- /wp:list --> < !-- wp:paragraph -->

The Dockerfile needs to look like this:

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code {"language":"bash"} -->
# R 3.6.0 with tidyverse, RSelenium, a python mail helper and cron
FROM rocker/tidyverse:3.6.0
LABEL maintainer="zappingseb <sebastian@mail-wolf.de>"

RUN R -e "install.packages(c('RSelenium'), repos='https://cran.rstudio.com/') "

RUN apt-get update -qq \
  && apt-get install -y \
  python-pip \
  vim

RUN pip install pytest-shutil
RUN pip install --upgrade numpy secure-smtplib email

COPY run_tests.R /tmp/run_tests.R
COPY sendmail.py /tmp/sendmail.py

RUN apt-get update && apt-get -y install cron
RUN echo "0 */12 * * * root Rscript /tmp/run_tests.R" >> /etc/crontab
CMD ["cron", "-f"]
< !-- /wp:syntaxhighlighter/code --> < !-- wp:paragraph -->

I used the tidyverse docker image and installed the RSelenium package. Additionally, I installed the python packages secure-smtplib and email. I also added a cron job to the image that runs the web crawler every twelve hours (at 00:00 and 12:00). Because a RUN instruction only executes while the image is built, cron cannot be started there; instead, it runs in the foreground as the container's main process:

< !-- /wp:paragraph --> < !-- wp:preformatted -->
RUN apt-get update && apt-get -y install cron
RUN echo "0 */12 * * * root Rscript /tmp/run_tests.R" >> /etc/crontab
CMD ["cron", "-f"]
< !-- /wp:preformatted --> < !-- wp:paragraph -->

Now I would like to spin up the docker container, but the sendmail.py and run_tests.R files are still missing. Let’s create them.

< !-- /wp:paragraph --> < !-- wp:heading {"level":3} -->

Using RSelenium to crawl permits

< !-- /wp:heading --> < !-- wp:paragraph -->

To use RSelenium, you first need to connect to the Selenium server, which runs in the other docker container. To connect to it, run:

< !-- /wp:paragraph --> < !-- wp:preformatted -->
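library(RSelenium)
# Inside mynet, the Selenium server listens on its container port 4444
remDr <- remoteDriver(remoteServerAddr = "seleniumcontainer", port = 4444L, browserName = "chrome")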
< !-- /wp:preformatted --> < !-- wp:paragraph -->

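Docker resolves the name seleniumcontainer automatically as long as both containers run inside mynet. Inside that network, R talks to the container port 4444 directly; the published port 4445 only matters when connecting from the host. Two steps lead to the Mt. Whitney permit website, opening a browser and navigating to the website: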

< !-- /wp:paragraph --> < !-- wp:preformatted -->
remDr$open()
remDr$navigate("https://www.recreation.gov/permits/233260/")
< !-- /wp:preformatted --> < !-- wp:heading {"level":4} -->

Work with the permit form

< !-- /wp:heading --> < !-- wp:paragraph -->

The harder part is finding the elements to click on. First, I noticed that I needed to select the option “All Routes”, which was the third one in the dropdown menu:

< !-- /wp:paragraph --> < !-- wp:image -->
Mt Whitney dropdown menu HTML code
< !-- /wp:image --> < !-- wp:paragraph -->

The dropdown itself can be accessed by its id, which is division-selection. Clicking the element with this id opens the dropdown. Once it is open, you click the third option element on the page. With these four lines of code, RSelenium does exactly that:

< !-- /wp:paragraph --> < !-- wp:preformatted -->
el_1 <- remDr$findElements("id", "division-selection")
< !-- /wp:preformatted --> < !-- wp:preformatted -->
el_1[[1]]$clickElement()
< !-- /wp:preformatted --> < !-- wp:preformatted -->
el_2 <- remDr$findElements("css selector", "option")
< !-- /wp:preformatted --> < !-- wp:preformatted -->
el_2[[3]]$clickElement()
< !-- /wp:preformatted --> < !-- wp:paragraph -->

As you can see, findElements returns a list of webElement objects that match the locator (an empty list if nothing matches). clickElement is a method of such a webElement and clicks it.
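Picking the third option by position breaks as soon as the order of the dropdown changes. A minimal sketch of an alternative, assuming the options are plain option elements whose visible text is “All Routes”: the hypothetical helper click_option_by_text() reads the text of every option and clicks the matching one. It only relies on findElements, getElementText and clickElement, the same calls used above.

```r
# Hypothetical helper: click the first <option> whose visible text equals
# `label`, instead of relying on its position in the dropdown.
click_option_by_text <- function(remDr, label) {
  opts <- remDr$findElements("css selector", "option")
  # getElementText() returns a list holding the element's text
  texts <- vapply(opts, function(el) el$getElementText()[[1]], character(1))
  hit <- which(trimws(texts) == label)
  if (length(hit) == 0) {
    stop("No option with text '", label, "' found")
  }
  opts[[hit[1]]]$clickElement()
  invisible(hit[1])  # position of the clicked option
}
```

After opening the dropdown, click_option_by_text(remDr, "All Routes") would take the place of el_2[[3]]$clickElement().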

< !-- /wp:paragraph --> < !-- wp:paragraph -->

This was the easiest of the browser automation steps. Entering the number of hikers is much harder. The safest way to change it is not only to type into the text field but also to set its value with JavaScript. The target is the field with the id number-input-.

< !-- /wp:paragraph --> < !-- wp:image -->
Mt Whitney numeric input
< !-- /wp:image --> < !-- wp:paragraph -->

To change the value I used the following code:

< !-- /wp:paragraph --> < !-- wp:preformatted -->
el_3 <- remDr$findElements("id", "number-input-")
< !-- /wp:preformatted --> < !-- wp:preformatted -->
# executing a javascript piece to update the field value
remDr$executeScript("arguments[0].setAttribute('value','1');", list(el_3[[1]]))
< !-- /wp:preformatted --> < !-- wp:preformatted -->
# clearing the element and entering 1 participant
el_3[[1]]$clearElement()
< !-- /wp:preformatted --> < !-- wp:preformatted -->
el_3[[1]]$sendKeysToElement(list("1"))
< !-- /wp:preformatted --> < !-- wp:paragraph -->

You can see that I wanted a single permit for the mountain. The JavaScript snippet runs on the webElement stored in el_3[[1]], which executeScript passes in as arguments[0]. With RSelenium, I prefer finding elements with the remDr$findElements method and then taking the first entry when I am sure there is just a single matching element. The methods clearElement and sendKeysToElement remove the old value and enter the new one. The API of sendKeysToElement is a bit unusual, as it requires a list of keys instead of a string. But once you have used it, it is easy to remember.
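Taking the first entry silently assumes there is exactly one match, and on a page rendered by JavaScript the element may not exist yet. A minimal sketch with hypothetical helper names: find_single() retries a few times and stops with a clear error unless exactly one element matches, and set_number_input() wraps the JavaScript, clearElement and sendKeysToElement steps shown above. The retry count and wait time are assumptions, not values from the original script.

```r
# Hypothetical helper: return the single element matching a locator,
# retrying while the page may still be rendering.
find_single <- function(remDr, using, value, tries = 5, wait = 1) {
  stopifnot(tries >= 1)
  for (i in seq_len(tries)) {
    els <- remDr$findElements(using, value)
    if (length(els) == 1) return(els[[1]])
    if (i < tries) Sys.sleep(wait)
  }
  # Zero or several matches usually mean the page layout changed
  stop(sprintf("Expected 1 element for %s '%s', found %d",
               using, value, length(els)))
}

# Hypothetical helper: set a numeric input via JavaScript and by typing,
# following the same steps used for the hikers field.
set_number_input <- function(remDr, id, n) {
  el <- find_single(remDr, "id", id)
  remDr$executeScript(
    sprintf("arguments[0].setAttribute('value','%d');", as.integer(n)),
    list(el)
  )
  el$clearElement()
  el$sendKeysToElement(list(as.character(n)))
  invisible(el)
}
```

With these helpers, the hikers step would shrink to set_number_input(remDr, "number-input-", 1).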

< !-- /wp:paragraph --> < !-- wp:heading {"level":4} -->

Interact with the permit calendar

< !-- /wp:heading --> < !-- wp:paragraph -->

After these steps, the calendar with permits becomes active. I wanted to get a permit in October 2019, so I needed to click on “NEXT” until October showed up.

< !-- /wp:paragraph --> < !-- wp:image -->
Mt Whitney next button
< !-- /wp:image --> < !-- wp:paragraph -->

I built a while loop to perform this task:

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code {"language":"r"} -->
# Get the initial month shown
month_elem <- remDr$findElements("css selector", ".CalendarMonth_caption strong")
month <- month_elem[[1]]$getElementText()
# Loop until the October calendar is shown
while(!grepl("October", month)) {

  # The "NEXT" button of the calendar
  el_4 <- remDr$findElements("css selector",
    ".sarsa-day-picker-range-controller-month-navigation-button.right")
  el_4[[1]]$clickElement()
  # Give the calendar a moment to render the next month
  Sys.sleep(1)
  month_elem <- remDr$findElements("css selector",
    ".CalendarMonth_caption")
  # The previous caption stays in the page, so the new month is the second one
  month <- month_elem[[2]]$getElementText()
}
< !-- /wp:syntaxhighlighter/code --> < !-- wp:paragraph -->

The element containing the month had the class CalendarMonth_caption, with the month name inside a strong tag. I accessed it with a CSS selector. Clicking the next button, which had a specific CSS class, makes a new calendar caption show up. It took me a while to find out that the old calendar month does not disappear from the page, so the second caption element has to be checked for the name. That is why I overwrite the month variable with the text of the newly shown calendar heading.
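The while loop runs forever if October never appears, for example after a site update. A minimal sketch of the same navigation with an upper bound on clicks: navigate_to_month() is a hypothetical helper that reuses the selectors and the “second caption” observation from above, which may no longer match the live site. For simplicity it also reads the initial month from the .CalendarMonth_caption element, assuming it carries the same text as its strong child.

```r
# Hypothetical helper: click the calendar's "next" button until the shown
# month contains `target`, giving up after `max_clicks` clicks.
navigate_to_month <- function(remDr, target, max_clicks = 12, wait = 1,
                              caption_index = 2) {
  next_btn <- ".sarsa-day-picker-range-controller-month-navigation-button.right"
  caption <- function(idx) {
    caps <- remDr$findElements("css selector", ".CalendarMonth_caption")
    if (length(caps) < idx) return("")
    caps[[idx]]$getElementText()[[1]]
  }
  month <- caption(1)  # month shown before any click
  clicks <- 0
  while (!grepl(target, month, fixed = TRUE)) {
    if (clicks >= max_clicks) {
      stop("Month '", target, "' not reached after ", max_clicks, " clicks")
    }
    remDr$findElements("css selector", next_btn)[[1]]$clickElement()
    clicks <- clicks + 1
    Sys.sleep(wait)  # let the calendar render the next month
    # The previous caption stays in the page, so read the one at caption_index
    month <- caption(caption_index)
  }
  invisible(clicks)
}
```

navigate_to_month(remDr, "October") returns the number of clicks it needed, or stops with an error message that ends up in the cron log instead of hanging the job.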

< !-- /wp:paragraph --> < !-- wp:heading {"level":4} -->

Derive the first available date from the calendar

< !-- /wp:heading --> < !-- wp:paragraph -->

Finding an available day as a human is simple. Just look at the calendar and search for blue boxes with an A inside:

< !-- /wp:paragraph --> < !-- wp:image -->
Calendar with available day at Mt Whitney
< !-- /wp:image --> < !-- wp:paragraph -->

For a computer, it is not that easy. In my case, I just had one question to answer. What is the first date in October 2019 to climb Mt. Whitney?

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code {"language":"r"} -->
day_elem <- remDr$findElements("css selector", ".rec-available-day")
if (length(day_elem) < 1) {
  earliest_day <- "NODAY"
} else {
  earliest_day <- strsplit(
    day_elem[[1]]$getElementText()[[1]],
    split = "\n")[[1]][1]
}
< !-- /wp:syntaxhighlighter/code --> < !-- wp:paragraph -->

So I searched for all entries with the class rec-available-day. If there was at least one, I took the text of the first one and kept everything before the first line break, which is the day of the month. If there was none, earliest_day is set to "NODAY". Now, wrap this up and send an email with the date:

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code {"language":"r"} -->
fileConn <- file("/tmp/output.txt")
# Pass the connection, otherwise writeLines prints to the console instead of the file
writeLines(paste0("The earliest day for Mt. Whitney in ", month[[1]], " is: ", earliest_day, "th of October 2019.\n\n-------------------\n"), con = fileConn)
close(fileConn)
# Write an email from output.txt with python
system("python /tmp/sendmail.py")
< !-- /wp:syntaxhighlighter/code --> < !-- wp:heading {"level":3} -->
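The extraction and message steps above can also be written as small pure functions that are testable without a browser. A minimal sketch with hypothetical names, parse_day() and build_message(), assuming the text of an available day cell starts with the day number followed by a line break, as described above. Unlike the original message, it also produces a readable sentence when no day is available.

```r
# Hypothetical helper: day number from the texts of .rec-available-day cells
parse_day <- function(cell_texts) {
  if (length(cell_texts) < 1) return("NODAY")
  # Keep everything before the first line break of the first cell
  strsplit(cell_texts[[1]], split = "\n", fixed = TRUE)[[1]][1]
}

# Hypothetical helper: body of the email written to /tmp/output.txt
build_message <- function(month, day) {
  if (identical(day, "NODAY")) {
    return(paste0("No available day for Mt. Whitney in ", month, ".\n"))
  }
  paste0("The earliest day for Mt. Whitney in ", month, " is day ", day, ".\n")
}
```

In run_tests.R, the cell texts could be collected with vapply(day_elem, function(el) el$getElementText()[[1]], character(1)).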

Activating the docker container

< !-- /wp:heading --> < !-- wp:paragraph -->

Once the script was finished, I put all files into my GitHub repository (zappingseb/mtwhitney). Together, they send me an email every 12 hours. I then went back to my docker server and cloned the repository with git. I could then build the image by running:

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code -->
docker build -t mtwhitney -f Dockerfile .
< !-- /wp:syntaxhighlighter/code --> < !-- wp:paragraph -->

and test the script once by starting a container from the image inside mynet with an interactive shell:

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code -->
docker run -it --net mynet --name mtwhitney mtwhitney /bin/bash
< !-- /wp:syntaxhighlighter/code --> < !-- wp:paragraph -->

and running the script with:

< !-- /wp:paragraph --> < !-- wp:syntaxhighlighter/code -->
Rscript /tmp/run_tests.R
< !-- /wp:syntaxhighlighter/code --> < !-- wp:paragraph -->

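I got an email. After receiving it, I was sure the script would run. Since this container was started with /bin/bash instead of its default command, cron has to be running for the twelve-hourly job (service cron start). I then disconnected using Ctrl + p followed by Ctrl + q, which detaches without stopping the container.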

< !-- /wp:paragraph --> < !-- wp:heading {"level":3} -->

Learnings

< !-- /wp:heading --> < !-- wp:paragraph -->

Scripting this piece really got me an email with a free slot and a permit to climb Mt Whitney:

< !-- /wp:paragraph --> < !-- wp:image -->
email from Browser Automation
< !-- /wp:image --> < !-- wp:paragraph -->

Browser automation can be helpful for use cases like this one. It helped me to get a permit. I did not overengineer it by scraping the website every few seconds or checking for specific dates; it was more of a fun project. You can think of a lot of ways to make it better. For example, it could send an email only if a date is available. But I wanted to get the log files every 12 hours to see if something went wrong.
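The improvement mentioned above, mailing only when a date is available, could look like the following minimal sketch. It trades away the regular twelve-hourly log mail the post deliberately kept. notify_if_available() is a hypothetical helper; by default it calls the same system("python /tmp/sendmail.py") as the script, and the send argument exists only so that the logic can be tried without a mail server.

```r
# Hypothetical helper: trigger the mail script only when a day was found.
notify_if_available <- function(earliest_day,
                                send = function() system("python /tmp/sendmail.py")) {
  if (identical(earliest_day, "NODAY")) {
    return(invisible(FALSE))  # nothing to report, no email
  }
  send()
  invisible(TRUE)
}
```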

< !-- /wp:paragraph --> < !-- wp:paragraph -->

While my scraper was running, the website got updated once, so I received an error message from my scraper. I changed the script to the one presented in this blog post. It may no longer work if you want to scrape the Mt. Whitney pages tomorrow, because Recreation.gov might have changed the website already.
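One way to make such breakages visible is to catch the error and put it into the report instead of letting the cron run die. A minimal sketch using base R tryCatch(); run_and_report() and its scrape argument are hypothetical, and this is not necessarily how the original script reported the error.

```r
# Hypothetical helper: run one scraping step and always return report text
run_and_report <- function(scrape) {
  tryCatch(
    paste0("Result: ", scrape()),
    # A missing element after a site update ends up here instead of
    # aborting the cron run without any report
    error = function(e) paste0("Scraper failed: ", conditionMessage(e))
  )
}
```

The returned text could then be written to /tmp/output.txt before sendmail.py runs, so a failing selector shows up in the next email.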

< !-- /wp:paragraph --> < !-- wp:paragraph -->

I use browser tests to make my R Shiny apps safer. I work in a regulated environment, where these tests save me a lot of time that I would otherwise spend click-testing without great tools such as RSelenium or shinytest. Try it out and enjoy your browser doing the job for you.

< !-- /wp:paragraph -->

To leave a comment for the author, please follow the link and comment on their blog: Sebastian Wolf blog.
