Hunspell 2.0: High-Performance Stemmer, Tokenizer, and Spell Checker for R

Jeroen Ooms

6 years ago

[This article was first published on rOpenSci Blog - R, and kindly contributed to R-bloggers].


A new version of the rOpenSci hunspell package has been released to CRAN. Hunspell is the spell checker library used by LibreOffice, OpenOffice, Mozilla Firefox, Google Chrome, Mac OS X, InDesign, Opera, RStudio and many others. It provides a system for tokenizing, stemming and spell checking in almost any language or alphabet. The R package exposes both the high-level spell checker and the low-level stemmers and tokenizers, which analyze or extract individual words from various formats (text, HTML, XML, LaTeX).
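As a rough illustration of the three layers mentioned above, the following minimal sketch checks a few words, asks for suggestions, stems some word forms and tokenizes a small HTML fragment. It assumes the default en_US dictionary that ships with the package; the sample words and the HTML string are made up for the example.

```r
library(hunspell)

# Spell checker: returns one logical value per word (TRUE = spelled correctly)
words <- c("beer", "wiskey", "wine")
correct <- hunspell_check(words)
print(correct)

# Suggestions for the words that failed the check (one list element per word)
suggestions <- hunspell_suggest(words[!correct])
print(suggestions)

# Stemmer: returns a list with the candidate stems for each input word
stems <- hunspell_stem(c("love", "loving", "loved"))
print(stems)

# Tokenizer: extracts the words from a document, here parsed as HTML
tokens <- hunspell_parse("The <b>quick</b> brown fox", format = "html")
print(tokens)
```

Each function is vectorized over its input, so the same calls work on a whole character vector of words or documents.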

New Vignette

This version includes a new vignette which gives an overview of the main functionality to get you started. It demonstrates the tokenizer, stemmer and spell checker, and has an example of how to use the stemmer and tokenizer to create a word cloud from a large body of text.
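The data preparation behind such a word cloud can be sketched roughly as follows. This is not the vignette's code: the two sample sentences are invented, and the choice to keep only the last candidate stem per word is an assumption made for this example. The resulting frequency table is the kind of input a word cloud package would typically take.

```r
library(hunspell)

# Hypothetical sample text; in practice this would be a large body of text
text <- c("The cats were chasing other cats.",
          "A cat chases mice and the mice run away.")

# Tokenize every document and flatten the result into one vector of words
tokens <- unlist(hunspell_parse(text))

# Stem each word; words unknown to the dictionary yield no stems
stems <- hunspell_stem(tokens)

# Assumption for this sketch: keep the last candidate stem for each word,
# and silently drop words for which no stem was found
stem <- unlist(lapply(stems, function(x) tail(x, 1)))

# Count how often each stem occurs, most frequent first
freq <- sort(table(stem), decreasing = TRUE)
print(head(freq))
```

Counting stems rather than raw tokens groups inflected forms of the same word together, which usually gives a more meaningful picture of a text.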

Installing and Updating

The package is most easily installed from CRAN:

install.packages("hunspell")

Or, to get the latest version from GitHub:

devtools::install_github("ropensci/hunspell")

This package does not require any system dependencies (libhunspell is now bundled with the package).
