[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Occupations classification is an important step in tasks such as labour market analysis, epidemiological studies and official statistics. To assist research on the labour market, ESCO has defined a taxonomy for occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO). Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
labourR
is a new package that performs occupations coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.The initial motivation was to retrieve the work experience history from a Curriculum Vitae generated from the Europass online CV editor. Document vectorization is performed using the ESCO model and a fuzzy match is allowed with various string distance metrics.
The
labourR
package:- Allows classifying multilingual free-text using the ESCO-ISCO hierarchy of occupations.
- Computations are fully vectorized and memory efficient.
- Includes facilities to assist research in information mining of labour market data.
Installation
You can install the released version of labourR from CRAN with,install.packages("labourR")
library(labourR) corpus <- data.frame( id = 1:3, text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's") )
num_leaves
specifies the number of ESCO occupations used to perform a plurality vote,classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5) #> id iscoGroup preferredLabel #> 1: 1 251 Software and applications developers and analysts #> 2: 2 214 Engineering professionals (excluding electrotechnology) #> 3: 3 523 Cashiers and ticket clerksFor further information browse the vignettes.
labourR 1.0.0: Automatic Coding of Occupation Titles was first posted on July 24, 2020 at 7:07 am.
©2020 "R-posts.com". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement. Please contact me at < !--email_off-->tal.galili@gmail.com< !--/email_off-->
To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.