[This article was first published on Quantifying Memory, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
< !-- saved from url=(0014)about:internet --> Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Wordclouds such as Wordle are pretty rubbish, so I thought I’d try to make a better one, one that actually produces (statistically) meaningful results. I was so happy with the outcome I decided to make it interactive, so go on, have a play!
Compare any two
The sample image included to the left shows differences between my undergraduate thesis about Richard Pipes as a figure or ridicule in Rusian media (on the left) and my mphil theses about Katyn in Polish and Russian media (on the right). I think the plot makes the differences in emphasis pretty obvious. The words in light blue in the middle are terms featuring strongly in both texts and which are not significantly more present in one or the other.
I’ve presented the code and the logic behind the application elsewhere, so here I include only basic instructions: select two files to compare. Comparisons work best for medium sized files – too small and there will be no differences, too large and processing time will become a bottleneck. If trying to do anything big I strongly recommend compiling the R script locally.
Any language should work, but you may need to find your own stoplist (and stem it!) to get meaningful results. My Russian stop list may be downloaded from here. UPDATE: the Russian stoplist has been hardcoded into the app. Native support for English and I think German also exists, but for other languages you will need to recompile the programme with a custom made stoplist.
I’ve embedded the app below, but a more userfriendly version can be acccessed here
UPDATE: file upload is not working at the moment, so text needs to be pasted in. This will only work for small to medium size documents.
To leave a comment for the author, please follow the link and comment on their blog: Quantifying Memory.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.