[This article was first published on R-Chart, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Just had a new package published on CRAN…rtrie :).Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The rtrie package allows you to quickly create Tries from a list of strings. Example use cases for the package are word searching in games like Boggle or Scrabble or implementation of applications that autocomplete or autocorrect text. Incidentally, besides being great for statistics, R is an excellent language for building algorithms and data structures from a pedagogical standpoint. The language’s Scheme influence makes the expression of algorithms straightforward and the plotting capabilities make it easy to visualize the data structures and processing involved. The remainder of this post is intended to illustrate that fact as well as introduce rtrie’s capabilities.
To create a new trie from a vector of words:
> library(rtrie)
> trie <- char_tree(c(‘able’, ‘act’, ‘acts’, ‘across’, ‘act’, ‘bat’, ‘babble’, ‘bobble’), ‘X’)
The new trie is a list of lists… with each node being a letter. So the list of lists above can be plotted using the data.tree package. This package has a wide range of tree related processing functions… including functionality related to all of the plots shown in this post.
> library(data.tree)
> data_trie <– FromListSimple(trie)
> plot(data_trie)
The trie can also be displayed in text form which can be read left to right.
data_trie
> data_trie
levelName
1 Root
2 ¦–a
3 ¦ ¦–b
4 ¦ ¦ °–l
5 ¦ ¦ °–e
6 ¦ °–c
7 ¦ ¦–r
8 ¦ ¦ °–o
9 ¦ ¦ °–s
10 ¦ ¦ °–s
11 ¦ °–t
12 ¦ °–s
13 °–b
14 ¦–a
15 ¦ ¦–b
16 ¦ ¦ °–b
17 ¦ ¦ °–l
18 ¦ ¦ °–e
19 ¦ °–t
20 °–o
21 °–b
22 °–b
23 °–l
24 °–e
To check if a given word is in the tree, call is_word which returns a boolean indicating whether or not the word was found in the trie.
> is_word(‘across’, trie)
[1] TRUE
The following diagram illustrates where the match was made in the trie.
To obtain a list of suggested words based on a prefix, call matching_words which returns a vector of matching words.
> matching_words(‘ac’, trie)
r.o.s.s t t.s
“across” “act” “acts”
Three words are returned, corresponding to the section of the tree where the prefix exists. The section of the tree identified by the function is shown below. Although not clear in the diagram, a termination character exists that allowed the function to return both “act” and “acts”.
The highlighted graphs above were created by using data.tree’s SetGraphStyle, SetNodeStyle, and SetEdgeStyle functions.
If you are interested in more information, the code is available on Github and the vignette includes additional discussion.
To leave a comment for the author, please follow the link and comment on their blog: R-Chart.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.