High frequency words in TOEFL
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In general, the TOEFL (Test of English as a Foreign Language) is not an easy test for Chinese students, including me. Relatively speaking, the reading section is a little easier than the other sections (listening, speaking, writing). Interestingly, while preparing for the TOEFL, I noticed that some important words appeared frequently in the mock examinations. So tonight I ran a simple experiment, just out of curiosity. First, I collected some relevant materials from the Internet (found through Google). Then I applied some basic transformations: converting the files to plain-text documents, stripping extra whitespace, converting to lower case, removing stop words, and so on. All of this is easy to do in R with the tm package, an excellent package for text manipulation. After this step, the wordcloud package lets us plot a word cloud with very little effort. The result is as follows:
The main code is shown below:
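library(tm)
library(wordcloud)

# Directory holding the plain-text TOEFL materials
txt <- "E:\\TOEFL"
b <- Corpus(DirSource(txt), readerControl = list(language = "eng"))

# Basic clean-up: whitespace, punctuation, case
b <- tm_map(b, stripWhitespace)
b <- tm_map(b, removePunctuation)
# Recent tm versions need content_transformer() around base functions such as tolower
b <- tm_map(b, content_transformer(tolower))

# Remove a few extra common words, then the standard English stop words
b <- tm_map(b, removeWords, c("and", "the", "may", "can", "also", "often", "one"))
b <- tm_map(b, removeWords, stopwords("english"))

# Total frequency of each term across all documents, most frequent first
tdm <- TermDocumentMatrix(b)
m1 <- as.matrix(tdm)
v1 <- sort(rowSums(m1), decreasing = TRUE)
d1 <- data.frame(word = names(v1), freq = v1)

# Draw the word cloud
par(bg = "lightyellow")
set.seed(10)
wordcloud(d1$word, d1$freq, scale = c(4, 0.8), min.freq = 6, max.words = 100,
          colors = rainbow(length(d1$freq)), font = 2)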
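Since the code above depends on a local folder of documents, here is a minimal base-R sketch of the same counting idea that runs on its own. The two sentences in `docs` and the short `stop_words` list are made up for illustration; they are not the materials or the stop-word list used in the post.

```r
# Toy input standing in for the TOEFL documents (hypothetical text).
docs <- c("The species adapt to the environment.",
          "Species in this environment may also adapt.")

# Small hand-written stop-word list (an assumption for this sketch;
# tm's stopwords("english") is much longer).
stop_words <- c("the", "to", "in", "this", "may", "also")

# Lower-case, strip punctuation, and split on whitespace.
clean <- gsub("[[:punct:]]", "", tolower(docs))
words <- unlist(strsplit(clean, "[[:space:]]+"))

# Drop empty strings and stop words, then count and sort.
words <- words[nzchar(words) & !(words %in% stop_words)]
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

This is roughly what the term-document matrix plus `rowSums` step produces: one total count per remaining word, sorted from most to least frequent.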
By the way, this article is just for fun. Please do not rely on it when you prepare for your test. The result is also not fully satisfactory, because I skipped some more advanced processing, such as merging different tenses and singular/plural forms of the same word. Finally, I hope all the students who are dying to study abroad get a satisfactory TOEFL score.
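One possible way to handle tense and singular/plural forms is stemming. The sketch below is not part of the original experiment: it assumes the SnowballC package is installed, and the small data frame `d` is a made-up stand-in for a word/frequency table. Inside a tm pipeline, the usual route would be `tm_map(b, stemDocument)`, which also relies on SnowballC.

```r
library(SnowballC)  # assumed to be installed; provides wordStem()

# Hypothetical word counts, standing in for a word/frequency table.
d <- data.frame(word = c("tests", "tested", "testing", "score", "scores"),
                freq = c(4, 2, 3, 5, 1),
                stringsAsFactors = FALSE)

# Map each word to its stem with the English Snowball stemmer.
d$stem <- wordStem(d$word, language = "english")

# Sum the frequencies of words that share a stem, most frequent first.
merged <- aggregate(freq ~ stem, data = d, FUN = sum)
merged <- merged[order(-merged$freq), ]
print(merged)
```

Keep in mind that stems are not always real words, so for a word cloud it may be nicer to display the most frequent original form of each stem instead.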