Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Shikokuchuo
Use case
Whenever you need to programmatically drive a web browser.
Most often:
- to scrape information behind a login screen
- when the HTTP server does not return a simple HTML document
Initial setup
Prerequisites: a JRE or JDK and Mozilla Firefox installed on your system.
- Install the RSelenium package from CRAN:
install.packages("RSelenium")
- Download selenium-server-standalone-4.0.0-alpha-2.jar (or whatever is the latest ‘selenium-server-standalone’ file).
- Download the latest Mozilla geckodriver release and place it in the same directory as the jar file.
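Before starting the server, it can help to confirm from R that the pieces are in place. The sketch below is only an illustration: the folder `~/selenium` and the helper name `check_selenium_setup` are assumptions, not part of RSelenium. It checks only that a `java` executable is on the PATH and that files with the expected names exist in the folder.

```r
# Hypothetical folder holding the jar and geckodriver; change to your own location
selenium_dir <- path.expand("~/selenium")

# Report which prerequisites appear to be in place (hypothetical helper)
check_selenium_setup <- function(dir) {
  files <- list.files(dir)  # empty if the folder does not exist
  c(
    java        = unname(nzchar(Sys.which("java"))),
    server_jar  = any(grepl("^selenium-server.*\\.jar$", files)),
    geckodriver = any(grepl("^geckodriver", files))
  )
}

print(check_selenium_setup(selenium_dir))
```

Any `FALSE` in the result points to the step that still needs attention. The check does not verify that Firefox is installed or that the downloaded files are valid.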
Running Selenium Webdriver
At the terminal, first cd to the directory where your two new files are saved, then run:
java -jar selenium-server-standalone-4.0.0-alpha-2.jar
The Selenium server must be up and running before you execute the R code below.
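One way to confirm that the server is reachable before opening a session is to try a plain TCP connection to its port. This is a minimal sketch using base R only; the helper name `selenium_is_up` is hypothetical, and port 4444 is the one used in the quickstart code. It shows only that something is listening on the port, not that it is a working Selenium server.

```r
# Return TRUE if something accepts TCP connections on host:port (hypothetical helper).
# This only shows that the port is open, not that it is a Selenium server.
selenium_is_up <- function(host = "localhost", port = 4444L, timeout = 2) {
  tryCatch({
    con <- suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    )
    close(con)
    TRUE
  }, error = function(e) FALSE)  # a refused or timed-out connection raises an error
}

if (!selenium_is_up()) {
  message("Nothing is listening on localhost:4444; start the Selenium server first.")
}
```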
RSelenium quickstart code
library(RSelenium)
library(keyring)
library(rvest)
library(magrittr)

# Start Selenium Session
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444L,
  browserName = "firefox"
)
remDr$open()

# Navigate to login page
remDr$navigate("https://website.com/login")
Sys.sleep(5) # Give page time to load

# Find 'username' element and send 'saved_user' as input
webElem1 <- remDr$findElement(using = "xpath", "//input[@name = 'username']")
webElem1$sendKeysToElement(list(key_get("saved_user")))

# Find 'password' element and send 'saved_pass' and 'enter' keystroke as input
webElem2 <- remDr$findElement(using = "xpath", "//input[@name = 'password']")
webElem2$sendKeysToElement(list(key_get("saved_pass"), key = "enter"))
Sys.sleep(5) # Give page time to load

# Navigate to desired page and download source
remDr$navigate("https://website.com/somepage")
Sys.sleep(5) # Give page time to load
html <- remDr$getPageSource()[[1]] %>% read_html()

# Use further rvest commands to extract required data
# ...

# End Selenium Session
remDr$close()
Customise the URLs as required.
Customise the XPath expressions so that they locate the input fields as they are actually named on your site.
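The XPath expressions in the quickstart all have the shape `//tag[@attribute = 'value']`, so adapting them usually means swapping the attribute or its value. The small helper below is a sketch of that pattern; the function name `xpath_by_attr` and the `email` and `submit` values are hypothetical and stand in for whatever your site's HTML actually uses.

```r
# Build an XPath that matches a tag by one attribute value,
# in the same shape as the expressions used in the quickstart code.
# Assumes the value contains no single quotes.
xpath_by_attr <- function(value, attr = "name", tag = "input") {
  sprintf("//%s[@%s = '%s']", tag, attr, value)
}

xpath_by_attr("username")                                # same locator as in the quickstart
xpath_by_attr("email", attr = "id")                      # hypothetical field identified by id
xpath_by_attr("submit", attr = "type", tag = "button")   # hypothetical submit button

# With a running session this would be used as (not run here):
# webElem <- remDr$findElement(using = "xpath", xpath_by_attr("email", attr = "id"))
```

Inspecting the login form in the browser's developer tools shows which attributes are available to match on.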
‘saved_user’ and ‘saved_pass’ are the names of values already stored using the keyring package and retrieved here using the ‘key_get’ command. It is never a good idea to store plain-text credentials in an R script.
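For completeness, here is a sketch of how those two values might be stored in the first place. It assumes the keyring package is installed and a credential store is available on your system; the wrapper `store_credentials` is a hypothetical convenience, while `key_set` is keyring's interactive counterpart to `key_get`.

```r
# One-off setup: prompt for each secret and save it in the system credential
# store under the service names that the quickstart code later reads with key_get().
store_credentials <- function(services = c("saved_user", "saved_pass")) {
  for (service in services) {
    message("Enter the value to store as '", service, "'")
    keyring::key_set(service)  # prompts for the secret; nothing is written to the script
  }
  invisible(services)
}

# Only prompt when running interactively
if (interactive()) store_credentials()
```

This needs to be run only once per machine; afterwards the script can retrieve the values without ever containing them.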
Reference
Basic vignette: https://docs.ropensci.org/RSelenium/articles/basics.html