Job posting analysis
Recently, there was a post on Medium about using Natural Language Processing (NLP) to study a job posting for keywords. I found that the article was very similar to an R Shiny App that I created a while ago. 1
Introduction
Technology has changed the job application process, making it easier and quicker to apply to jobs. As a result, the average job posting will receive around 250 resumes. 2 So how do hiring managers handle looking through that many resumes for a single posting? That’s easy, they cheat.
Hiring managers no longer look at individual resumes; instead, they use automated software called an applicant tracking system (ATS). These programs filter resumes by a set of keywords, reducing the pile to a more manageable number. So how can a job applicant make sure their resume is looked at? Well, they should cheat too.
The Medium article I mentioned uses Python and NLP to skim the job posting for the most common words. This is useful information, but not necessarily the keywords used by the ATS software. I propose using an R Shiny App to filter a job posting by a list of common ATS keywords.
An R Shiny App is an interactive, web-based application that runs R code. The syntax for a Shiny App is a little different from ordinary R and requires some additional understanding. The product will be a basic, interactive program that can be hosted online. One free Shiny App hosting site that I recommend is shinyapps.io.
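To give a sense of that syntax, here is a minimal, self-contained sketch of a Shiny App (a toy example, not the job-posting app itself): every app pairs a ui object, which lays out the inputs and outputs, with a server function that computes the outputs.

library(shiny)

# The ui object defines the frontend: one text input and one text output
ui <- fluidPage(
  textInput("name", "Enter a name"),
  textOutput("greeting")
)

# The server function defines the backend: it builds the output from the input
server <- function(input, output) {
  output$greeting <- renderText(paste("Hello,", input$name))
}

shinyApp(ui = ui, server = server)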
Initialization
The Shiny App requires the following libraries.
library(shiny)
library(wordcloud2)
library(tidyverse)
library(XML)
library(rvest)
library(tidytext)
The Shiny App uses a CSV file that contains a set of keywords that an ATS will look for. This list was found online, but I have modified it by adding additional keywords as I see fit. The file can be downloaded here from my GitHub site. Here is a sample of some keywords:
Keywords <- read_csv("Keywords.csv")
Keywords$Keys %>% head()
[1] ".NET"                "account management"  "accounting"
[4] "accounts payable"    "accounts receivable" "acquisition"
App Structure
One issue I found when developing this application was handling keywords that are a combination of multiple words. This creates some complications: a naive word-by-word search would match only the first word of a multi-word keyword and lose the context.
This challenge was met by breaking the website down into ngrams. An oversimplification of an ngram is a group of n words. Wikipedia has a very good page that better explains ngrams.3 The website can then be split into ngrams of different lengths and the keywords of the matching length searched for (a short code sketch follows the example below).
As an example, the phrase:
The quick brown fox
for an ngram of length 1 would return:
(The) (quick) (brown) (fox)
for an ngram of length 2 would return:
(The quick) (quick brown) (brown fox)
and for an ngram of length 3 would return:
(The quick brown) (quick brown fox)
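For readers who want to see this in code, here is a minimal sketch of how the tidytext package's unnest_tokens() produces these ngrams; the example phrase and column names are just for illustration.

library(tidytext)
library(dplyr)

example <- data.frame(text = "The quick brown fox")

# Split the phrase into ngrams of length 2 (bigrams);
# note that unnest_tokens lower-cases the text by default
example %>%
  unnest_tokens(word, text, token = "ngrams", n = 2)
# Returns one row per bigram: "the quick", "quick brown", "brown fox"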
Application Coding
shinyApp(
  # This is the standard format for a Shiny App
  # The ui function controls all the frontend for the app
  ui = fluidPage(
    titlePanel("Job Posting Word Cloud"),
    sidebarLayout(
      sidebarPanel(
        # The user is asked for a url
        textInput("url", "input URL", value = "https://www.google.com/")
      ),
      mainPanel(
        # The word cloud plot is displayed
        h4("Key-Word Cloud"),
        wordcloud2Output("plot")
      )
    )
  ),
  # The server function controls the backend for the app
  server = function(input, output){
    # The keywords are loaded and an index of how many words per keyword is created
    Keywords <- read_csv("Keywords.csv")
    Keywords$Keys <- str_to_lower(Keywords$Keys)
    index <- Keywords$Keys %>% str_count(" ")

    # The {} brackets create a reactive expression that re-runs whenever the input changes
    data <- reactive({
      # The input variable is how the server side receives data from the ui side
      url <- input$url

      # The text is read from the url provided by the user (using rvest, loaded above)
      text <- read_html(url) %>% html_text()
      data <- text %>% data.frame(text = .)

      # Since there are ngrams of length 1-3, three searches are concatenated together
      rbind(
        data %>%
          # unnest_tokens from the tidytext library is used to create the ngrams
          unnest_tokens(word, text, token = "ngrams", n = 1) %>%
          # A count is performed on each ngram in the website to find the most common ngrams
          count(word, name = 'freq', sort = TRUE) %>%
          # The ngram count is then filtered by the keywords of the same ngram length
          filter(word %in% Keywords$Keys[index == 0]),
        # The steps are repeated for bigrams (ngrams of length 2) and trigrams (ngrams of length 3)
        data %>%
          unnest_tokens(word, text, token = "ngrams", n = 2) %>%
          count(word, name = 'freq', sort = TRUE) %>%
          filter(word %in% Keywords$Keys[index == 1]),
        data %>%
          unnest_tokens(word, text, token = "ngrams", n = 3) %>%
          count(word, name = 'freq', sort = TRUE) %>%
          filter(word %in% Keywords$Keys[index == 2])
      )
    })

    # The plot/wordcloud needs to be saved as an output value
    # The output variable is how the server sends data back to the UI
    output$plot <- renderWordcloud2({
      # One quirk of Shiny syntax: since the data is reactive and changes with the
      # user input, it is a function and needs to be called as data()
      wordcloud2(data())
    })
  },
  options = list(height = 500)
)
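To host the finished app on shinyapps.io, one option is the rsconnect package; this is only a sketch, and the account name, token, secret, and app name below are placeholders for your own values from the shinyapps.io dashboard.

# install.packages("rsconnect")  # if not already installed
library(rsconnect)

# Authorize once with the credentials shown in your shinyapps.io dashboard
# (the values below are placeholders)
setAccountInfo(name = "your-account",
               token = "YOUR_TOKEN",
               secret = "YOUR_SECRET")

# Deploy the folder containing the app script and Keywords.csv
deployApp(appDir = ".", appName = "job-posting-word-cloud")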
Shiny App
Footnotes