[This article was first published on Valerio Gherardi, and kindly contributed to R-bloggers.]
Summary
Version 0.1.2 of my R package kgrams has just been accepted by CRAN. The package provides tools for training and evaluating k-gram language models in R, with support for several probability smoothing techniques, perplexity computations, random text generation and more.
Short demo
```r
library(kgrams)

# Get k-gram frequency counts from Shakespeare's "Much Ado About Nothing"
freqs <- kgram_freqs(kgrams::much_ado, N = 4)

# Build modified Kneser-Ney 4-gram model, with discount parameters D1, D2, D3.
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

# Sample sentences from the language model at different temperatures
set.seed(840)
sample_sentences(model = mkn, n = 3, max_length = 10, t = 1)
#> [1] "i have studied eight or nine truly by your office [...] (truncated output)"
#> [2] "ere you go : <EOS>"
#> [3] "don pedro welcome signior : <EOS>"

sample_sentences(model = mkn, n = 3, max_length = 10, t = 0.1)
#> [1] "i will not be sworn but love may transform me [...] (truncated output)"
#> [2] "i will not fail . <EOS>"
#> [3] "i will go to benedick and counsel him to fight [...] (truncated output)"

sample_sentences(model = mkn, n = 3, max_length = 10, t = 10)
#> [1] "july cham's incite start ancientry effect torture tore pains endings [...] (truncated output)"
#> [2] "lastly gallants happiness publish margaret what by spots commodity wake [...] (truncated output)"
#> [3] "born all's 'fool' nest praise hurt messina build afar dancing [...] (truncated output)"
```
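The three calls in the demo differ only in the temperature t, and the samples go from repetitive (t = 0.1) to nearly random (t = 10). The base R sketch below illustrates one common definition of temperature, in which each probability is raised to the power 1/t and the result is renormalised. This definition is an assumption made for illustration and is not taken from the package's source. The helper apply_temperature() and the three-word distribution are hypothetical.

```r
# Rescale a probability vector p at temperature t:
# p_i^(1/t) / sum_j p_j^(1/t)   (assumed definition, see the text above)
apply_temperature <- function(p, t) {
  stopifnot(t > 0, all(p >= 0), abs(sum(p) - 1) < 1e-8)
  w <- p^(1 / t)
  w / sum(w)
}

# Hypothetical next-word distribution, not produced by any model
p <- c(the = 0.6, a = 0.3, benedick = 0.1)

cold <- apply_temperature(p, t = 0.1)  # mass concentrates on the likeliest word
same <- apply_temperature(p, t = 1)    # distribution is left unchanged
hot  <- apply_temperature(p, t = 10)   # distribution moves towards uniform

round(rbind(cold, same, hot), 3)
```

Under this definition, low temperatures favour the few most probable continuations, while high temperatures give rare words almost the same chance as frequent ones.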
NEWS
Overall Software Improvements
- The package’s test suite has been greatly extended.
- Improved error/warning conditions for wrong arguments.
- Re-enabled compiler diagnostics as per CRAN policy (#19).
API Changes
verbose
arguments now default toFALSE
.probability()
,perplexity()
andsample_sentences()
are restricted to accept onlylanguage_model
class objects as theirmodel
argument.
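As a minimal sketch of what the second change means in practice, the k-gram frequency table is first turned into a language_model object, and that object is what gets passed as model. The smoother and discount values are copied from the demo above. The query strings and the use of the bundled midsummer text as evaluation data are illustrative choices, not something the release notes prescribe.

```r
library(kgrams)

# Frequency counts first, then an explicit language model built on top of them
freqs <- kgram_freqs(much_ado, N = 4)
mkn <- language_model(freqs, smoother = "mkn", D1 = 0.25, D2 = 0.5, D3 = 0.75)

# Conditional probability of a word given the preceding context
p_word <- probability("me" %|% "love may transform", model = mkn)

# Probability of a whole sentence (illustrative input)
p_sent <- probability("i will not fail .", model = mkn)

# Perplexity of the model on a different text shipped with the package
ppl <- perplexity(midsummer, model = mkn)

c(p_word = p_word, p_sent = p_sent, perplexity = ppl)
```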
New features
- `as_dictionary(NULL)` now returns an empty `dictionary`.
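A short sketch of the new behaviour, assuming that length() reports the number of words in a dictionary. The two-word vocabulary is made up for comparison.

```r
library(kgrams)

# Coercing NULL gives a dictionary with no words in it
empty <- as_dictionary(NULL)

# For comparison, a dictionary built from a (hypothetical) two-word vocabulary
small <- as_dictionary(c("benedick", "beatrice"))

length(empty)
length(small)
```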
Bug Fixes
- Fixed bug causing `.preprocess` and `.tknz_sent` arguments to be ignored in `process_sentences()`.
- Fixed previously wrong defaults for `max_lines` and `batch_size` arguments in `kgram_freqs.connection()`.
- Added print method for class `dictionary`.
- Fixed bug causing invalid results in `dictionary()` with batch processing and non-trivial size constraints on vocabulary size.
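The sketch below exercises two of the code paths touched by these fixes: passing .preprocess and .tknz_sent explicitly to process_sentences(), and building a size-constrained dictionary. It uses in-memory character vectors rather than connections, so it does not reproduce the batch-processing case itself. The word "love", the bigram order and the size limit of 100 are arbitrary illustrative choices.

```r
library(kgrams)

# Bigram counts from one text, using the package's own preprocessing helpers
freqs <- kgram_freqs(much_ado, N = 2,
                     .preprocess = preprocess, .tknz_sent = tknz_sent)
before <- query(freqs, "love")

# Add counts from a second text, passing the same two arguments explicitly
freqs <- process_sentences(midsummer, freqs,
                           .preprocess = preprocess, .tknz_sent = tknz_sent)
after <- query(freqs, "love")

# The count can only grow once more text has been processed
c(before = before, after = after)

# Dictionary restricted to at most 100 words
dict <- dictionary(much_ado, .preprocess = preprocess, size = 100)
length(dict)
```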
Other
- Maintainer’s email updated.