Text Mining to Word Cloud App with R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Here is a simple application, Text Mining to WordCloud, that transforms text into a word cloud. Its purpose is to find the most frequent words in a given text. The app is built with R, and the source code is attached at the end of the post.
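The core idea, finding the most frequent words in a text, can be sketched in a few lines of base R without any text-mining package. This is a minimal illustration, not the app's own code: the helper name `top_words()` and its splitting rule (any run of characters other than the letters a to z separates words) are assumptions made for the example.

```r
# Hypothetical helper (not part of the app): count word frequencies in base R.
top_words <- function(text, n = 5) {
  # Lowercase, then split on any run of characters that are not letters a-z
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words <- words[nzchar(words)]  # drop empty strings left by leading separators
  counts <- table(words)
  # Named integer vector of counts, most frequent first
  freq <- sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
  head(freq, n)
}

top_words("Let freedom ring. Let freedom ring from every hill!", n = 3)
```

The app itself does more cleaning than this (for example, it removes English stopwords with the `tm` package), but the counting step is the same idea.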
For example, suppose we want the word frequencies of the “I Have a Dream” speech by Martin Luther King, Jr.:
And so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream.
I have a dream that one day this nation will rise up and live out the true meaning of its creed:
We hold these truths to be self-evident, that all men are created equal.
I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.
I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.
I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.
I have a dream today!
I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification — one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.
I have a dream today!
I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight; and the glory of the Lord shall be revealed and all flesh shall see it together.
This is our hope, and this is the faith that I go back to the South with.
With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day.
And this will be the day — this will be the day when all of God’s children will be able to sing with new meaning:
My country ’tis of thee, sweet land of liberty, of thee I sing.
Land where my fathers died, land of the Pilgrim’s pride,
From every mountainside, let freedom ring!
And if America is to be a great nation, this must become true.
And so let freedom ring from the prodigious hilltops of New Hampshire.
Let freedom ring from the mighty mountains of New York.
Let freedom ring from the heightening Alleghenies of Pennsylvania.
Let freedom ring from the snow-capped Rockies of Colorado.
Let freedom ring from the curvaceous slopes of California.
But not only that:
Let freedom ring from Stone Mountain of Georgia.
Let freedom ring from Lookout Mountain of Tennessee.
Let freedom ring from every hill and molehill of Mississippi.
From every mountainside, let freedom ring.
And when this happens, when we allow freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God’s children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual:
Free at last! Free at last!
Thank God Almighty, we are free at last!
Copy the text above (it is also the app's default text) into the text area and click “Execute”.
The app then draws a word cloud of the text.
You can see that “freedom” is the most frequent word!
However, the app has a few restrictions:
- Double quotation marks (") are not allowed.
- Only Latin-alphabet (i.e. English) text is supported.
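The double-quote restriction is consistent with the backend code, which places the submitted text between double quotes when it assigns `oritext`; an unescaped `"` in the input would end that string literal early. One possible workaround, sketched here under the assumption that the text could be escaped before it is spliced into the script, is base R's `deparse()`, which turns a string into a valid, escaped R string literal. The helper name `to_r_literal()` is hypothetical.

```r
# Hypothetical helper (not part of the app): turn arbitrary text into a valid
# R string literal; deparse() escapes double quotes, backslashes and newlines.
to_r_literal <- function(text) {
  paste(deparse(text), collapse = "")
}

user_text <- 'He said "Free at last!"'
code <- paste0("oritext <- ", to_r_literal(user_text))
cat(code, "\n")  # oritext <- "He said \"Free at last!\""

# Evaluating the generated assignment recovers the original text exactly
env <- new.env()
eval(parse(text = code), envir = env)
identical(env$oritext, user_text)
```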
Source code:
A. Backend (Code)
[—hidestart—]
# Packages used below: tm (text cleaning), wordcloud (plotting),
# RColorBrewer (colour palette)
library(tm)
library(wordcloud)
library(RColorBrewer)
oritext <- "[—||TEXT||—]"  # placeholder filled in by the app with the submitted text
[—hideend—]
plot(1)
[—hidestart—]
# Remove @mentions (e.g. Twitter-style @names) from the text
RemoveAtPeople <- function(text2wc) {
  gsub("@\\w+", "", text2wc)
}
text2wcs <- as.vector(sapply(oritext, RemoveAtPeople))
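# Build a cleaned corpus: strip punctuation, lowercase, and drop English
# stopwords plus any extra words passed in my.stopwords
generateCorpus <- function(df, my.stopwords = c()) {
  text2.corpus <- Corpus(VectorSource(df))
  text2.corpus <- tm_map(text2.corpus, removePunctuation)
  # tolower is a base R function, so wrap it in content_transformer()
  text2.corpus <- tm_map(text2.corpus, content_transformer(tolower))
  text2.corpus <- tm_map(text2.corpus, removeWords, stopwords("english"))
  text2.corpus <- tm_map(text2.corpus, removeWords, my.stopwords)
  text2.corpus
}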
pal2 <- brewer.pal(8, "Dark2")
wordcloud.generate <- function(corpus, min.freq = 2) {
  # Keep words of any length (tm drops words shorter than 3 characters by default)
  doc.m <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
  dm <- as.matrix(doc.m)
  # Calculate the frequency of words, most frequent first
  v <- sort(rowSums(dm), decreasing = TRUE)
  d <- data.frame(word = names(v), freq = v)
  # Draw the word cloud
  wordcloud(d$word, d$freq, min.freq = min.freq, colors = pal2)
}
# Build the corpus (also dropping the extra stopword "dev8d") and draw the cloud
wordcloud.generate(generateCorpus(text2wcs, "dev8d"), 2)
[—hideend—]
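Inside `wordcloud.generate()`, the term-document matrix has one row per term and one column per document, so `rowSums()` gives each term's total count across all documents before sorting. The small matrix below is made up purely for illustration (it stands in for `as.matrix(doc.m)`) and shows how `v` and `d` are derived from it.

```r
# Toy term-document matrix with made-up counts: rows are terms, columns are documents
dm <- matrix(c(3, 1,
               5, 2,
               0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("dream", "freedom", "hill"), c("doc1", "doc2")))

# Total frequency of each term across documents, most frequent first
v <- sort(rowSums(dm), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
print(d)
```

Since the app passes a single text, its matrix has only one column and `rowSums()` simply returns that column's counts.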
B. Frontend (Application)