Character Functions (Advanced)
This set of exercises will help you improve your skills with character functions in R. Most of the exercises are related to text mining, a technique that analyses text using statistics. If you find them interesting, I would suggest checking out the tm library, which includes functions designed for this task. Text mining has many applications; a popular one is associating a text with its author, which is how J.K. Rowling (author of Harry Potter) was caught publishing a new novel series under an alias. Before proceeding, it might be helpful to look over the help pages for nchar, tolower, toupper, grep, sub and strsplit. Also take a look at the stringr library and the functions it includes, such as str_sub.
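As a quick warm-up, here is a minimal sketch of what the base functions mentioned above do. The sentence `s` and the variable names are made up for illustration and are not part of the tm corpus used in the exercises.

```r
# Toy sentence (an assumption for illustration only)
s <- "The quick brown Fox, jumps over the lazy Dog."

n_chars  <- nchar(s)                  # number of characters, spaces included
lower    <- tolower(s)                # whole string in lower case
upper    <- toupper(s)                # whole string in upper case
replaced <- sub("Fox", "Cat", s)      # replaces the first match only
words    <- strsplit(s, " ")[[1]]     # split on single spaces into a vector
caps     <- grep("[A-Z]", words, value = TRUE)  # words with a capital letter
first3   <- substr(s, 1, 3)           # base counterpart of stringr::str_sub(s, 1, 3)

print(words)
print(caps)
```

Note that strsplit returns a list with one element per input string, hence the `[[1]]`, and that punctuation stays attached to the words until you remove it yourself.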
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Before starting the exercises, run the following lines of code:
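# Install tm only if it is missing, then load it
if (!'tm' %in% rownames(installed.packages())) install.packages('tm')
library(tm)
# Path to the sample Latin texts shipped with tm
txt = system.file("texts", "txt", package = "tm")
ovid = VCorpus(DirSource(txt, encoding = "UTF-8"),
               readerControl = list(language = "lat"))
# TEXT must be created before OVID, which is built from it
TEXT = lapply(ovid[1:5], as.character)
OVID = c(data.frame(text = unlist(TEXT), stringsAsFactors = FALSE))
TEXT1 = TEXT[[4]]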
Exercise 1
Delete all the punctuation marks from TEXT1
Exercise 2
How many letters does TEXT1 contain?
Exercise 3
How many words does TEXT1 contain?
Exercise 4
What is the most common word in TEXT1?
Exercise 5
Get an object that contains all the words with at least one capital letter (make sure the object contains each word only once).
Exercise 6
Which are the 5 most common letters in the object OVID?
Exercise 7
Which letters of the alphabet are not in the object OVID?
Exercise 8
In the OVID object, there is a character from the popular sitcom ‘Friends’. Who is he or she? There were six main characters (Chandler, Phoebe, Ross, Monica, Joey, Rachel).
Exercise 9
Find the line where this character is mentioned.
Exercise 10
How many words end with a vowel, and how many with a consonant?