[This article was first published on Category: R | Huidong Tian's Blog, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today I stumbled upon a webpage listing several job positions that I am qualified for; unfortunately, all of them had expired. How many chances do we lose this way?! So I decided to do something to limit this kind of loss, using our smart R, of course!
The idea is simple: check the job vacancy webpages regularly, and if new positions appear, open the webpage and/or send a notice to my email.
Let's take the vacancy page of the Department of Biosciences, UiO, as an example. The page lists only the positions that have not yet expired. For this kind of page, we can use the following code:
Download Webpage

    workspace <- "C:/Users"
    file_outdate <- paste(workspace, "outdate.html", sep = "/")
    file_updated <- paste(workspace, "updated.html", sep = "/")
    URL <- "http://www.mn.uio.no/ibv/english/about/vacancies/"

    if (file.exists(file_outdate)) {
      download.file(URL, file_updated)
    } else {
      download.file(URL, file_outdate)
      download.file(URL, file_updated)
    }

    html_outdate <- readLines(file_outdate)
    html_updated <- readLines(file_updated)
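The download step above stops with an error if the site is unreachable, and on the first run it fetches the same page twice. A minimal sketch of one way to handle both cases follows. The helper name `fetch_snapshot`, its `fetch` argument, and the stand-in `fake_fetch` are my own assumptions for illustration, not part of the original script; `fake_fetch` only exists so the sketch runs without network access.

```r
# Hypothetical helper: download the page once, report failure instead of
# stopping, and seed the "old" snapshot by copying on the first run.
fetch_snapshot <- function(url, file_old, file_new, fetch = download.file) {
  status <- tryCatch(
    fetch(url, file_new),          # download.file() returns 0 on success
    error = function(e) 1L,
    warning = function(w) 1L
  )
  if (!isTRUE(status == 0)) {
    return(FALSE)
  }
  if (!file.exists(file_old)) {
    file.copy(file_new, file_old)  # first run: nothing to compare against yet
  }
  TRUE
}

# Stand-in for download.file() so the example works offline (assumption).
fake_fetch <- function(url, destfile) {
  writeLines("<html>snapshot</html>", destfile)
  0L
}

dir_tmp <- tempfile("vacancies")
dir.create(dir_tmp)
file_old <- file.path(dir_tmp, "outdate.html")
file_new <- file.path(dir_tmp, "updated.html")

ok <- fetch_snapshot("http://example.invalid/", file_old, file_new,
                     fetch = fake_fetch)
failed <- fetch_snapshot("http://example.invalid/", file_old, file_new,
                         fetch = function(url, destfile) stop("offline"))
```

In real use, the `fetch` argument would simply be left at its default, `download.file`, and the rest of the script would run only when the function returns `TRUE`.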
Extract Position Titles

    Items <- function(str = html_outdate) {
      # Regular expression;
      ptn <- "item-title.+?>(.+?)</a>"
      HTML_Date <- grep(ptn, str, value = TRUE)
      # First time to use sapply by setting FUN as "[", cool!
      sapply(regmatches(HTML_Date, regexec(ptn, HTML_Date)), "[", 2)
    }

    # New position available or not;
    boo <- any(!Items(str = html_updated) %in% Items(str = html_outdate))

    # Remove the html file out of date;
    file.remove(file_outdate)
    file.rename(file_updated, file_outdate)
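To see what the regular expression captures, here is a small self-contained sketch that applies the same pattern to a few made-up HTML lines. The sample markup and the names `extract_titles`, `html_old` and `html_new` are assumptions for illustration; the real page's markup may differ. It also uses `setdiff()` to list the new titles themselves rather than only a TRUE/FALSE flag.

```r
# Hypothetical sample lines standing in for the two downloaded snapshots.
html_old <- c(
  '<div class="vacancy">',
  '<a class="item-title" href="/jobs/1">PhD fellowship in ecology</a>',
  '</div>'
)
html_new <- c(
  '<a class="item-title" href="/jobs/1">PhD fellowship in ecology</a>',
  '<a class="item-title" href="/jobs/2">Postdoc in bioinformatics</a>'
)

# Same pattern as in the post: the parentheses capture the link text.
ptn <- "item-title.+?>(.+?)</a>"

extract_titles <- function(lines, pattern = ptn) {
  hits <- grep(pattern, lines, value = TRUE)
  matches <- regmatches(hits, regexec(pattern, hits))
  # Element 1 of each match is the full match, element 2 the captured title.
  vapply(matches, function(m) m[2], character(1))
}

titles_old <- extract_titles(html_old)
titles_new <- extract_titles(html_new)

# Titles present in the new snapshot but not in the old one.
new_titles <- setdiff(titles_new, titles_old)
print(new_titles)
```

Using `vapply()` instead of `sapply()` is a design choice here: it still returns a character vector when no line matches, which keeps the later comparison well defined.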
Display and Send Email

    if (boo) {
      browseURL(URL)
      library(mail) # Need to install this package first;
      sendmail("you@gmail.com", subject = "Vacancy", message = URL)
    }
The difficult part is assembling the regular expression, and I have written a tutorial on that topic. The last step is to run the above code in batch mode.
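For batch mode, one option is to save the code as a script and let a scheduler (cron, or the Windows Task Scheduler) call it with `Rscript`. Since nobody watches the console in that setting, it can help to append each result to a log file. The sketch below assumes a hypothetical script name `check_vacancies.R` and a hypothetical helper `log_check`; the titles passed in are made-up values standing in for the output of `Items()`.

```r
# check_vacancies.R (hypothetical file name)
# Could be run non-interactively with:  Rscript check_vacancies.R

# Hypothetical helper: record the outcome of one scheduled check.
log_check <- function(old_titles, new_titles, log_file) {
  fresh <- setdiff(new_titles, old_titles)
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  status <- if (length(fresh) > 0) {
    paste("new:", paste(fresh, collapse = "; "))
  } else {
    "no new positions"
  }
  # One line per run, appended so earlier runs are kept.
  cat(stamp, " ", status, "\n", file = log_file, append = TRUE, sep = "")
  fresh
}

# Made-up titles; a temporary file is used so the example leaves no trace.
log_file <- tempfile(fileext = ".log")
fresh <- log_check(c("PhD fellowship"), c("PhD fellowship", "Postdoc"), log_file)
```

In a real setup, `log_file` would point to a fixed path next to the saved HTML snapshots.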