We’re very pleased to announce the release of qdap 1.3.1
This is the latest installment of the qdap package available at CRAN. Several important updates have occurred since the 1.1.0 release, most notably the addition of two vignettes and some generic view methods.
The new vignettes include:
The former is a detailed HTML-based guide that gives an overview of the intended use of qdap functions. The latter explains how to move between qdap and tm package forms, as qdap moves to be more compatible with this seminal R text mining package.
To install use:
install.packages("qdap")
Some of the changes in versions 1.2.0-1.3.1 include:
Generic Methods
scores – generic method added to view scores from select qdap objects.
counts – generic method added to view counts from select qdap objects.
proportions – generic method added to view proportions from select qdap objects.
preprocessed – generic method added to view preprocessed data from select qdap objects.
These methods allow the user to grab particular parts of qdap objects in a consistent fashion. The majority of these methods also pick up a corresponding plot method. This adds to the qdap philosophy that data results should be easy to grab and easy to visualize. For instance:
(x <- question_type(DATA.SPLIT$state, DATA.SPLIT$person))

## methods
scores(x)
plot(scores(x))
counts(x)
plot(counts(x))
proportions(x)
plot(proportions(x))
truncdf(preprocessed(x), 15)
plot(preprocessed(x))
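To make the idea of "grabbing a part of an object in a consistent fashion" concrete, here is a minimal base R sketch of accessor generics. It is only an illustration of the general S3 pattern, not qdap's source code: the class name `demo_result` and the `get_scores()`/`get_counts()` names are hypothetical and were chosen so they do not clash with the qdap functions above.

```r
# Hypothetical result object: a list holding several views of one analysis.
# "demo_result", get_scores() and get_counts() are made-up names for this
# sketch; they are not qdap identifiers.
result <- structure(
  list(
    scores = data.frame(person = c("greg", "sam"), tot.quest = c(1, 0)),
    counts = data.frame(person = c("greg", "sam"), what = c(0, 0))
  ),
  class = "demo_result"
)

# One generic per view; UseMethod() dispatches on the class of x
get_scores <- function(x, ...) UseMethod("get_scores")
get_counts <- function(x, ...) UseMethod("get_counts")

# Methods that pull the matching element out of the object
get_scores.demo_result <- function(x, ...) x[["scores"]]
get_counts.demo_result <- function(x, ...) x[["counts"]]

get_scores(result)
get_counts(result)
```

Because every object type gets its own method behind the same generic, the calling code stays the same no matter which analysis produced the object.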
Demoing Some of the New Features
We’d like to take the time to highlight some of the development that has happened in qdap in the past several months:
Dispersion Plots
wrds <- freq_terms(pres_debates2012$dialogue, stopwords = Top200Words)

## Add leading/trailing spaces if desired
wrds2 <- spaste(wrds)

## Use `~~` to maintain spaces
wrds2 <- c(" governor~~romney ", wrds2[-c(3, 12)])

## Plot
with(pres_debates2012 , dispersion_plot(dialogue, wrds2, rm.vars = time,
    color="black", bg.color="white"))
with(rajSPLIT, dispersion_plot(dialogue, c("love", "night"),
    bg.color = "black", grouping.var = list(fam.aff, sex),
    color = "yellow", total.color = "white", horiz.color="grey20"))
Word Correlation
library(tm)
data("crude")
oil_cor1 <- apply_as_df(crude, word_cor, word = "oil", r=.7)
plot(oil_cor1)
oil_cor2 <- apply_as_df(crude, word_cor, word = qcv(texas, oil, money), r=.7)
plot(oil_cor2, ncol=2)
Easy Hash Table
A Small Example
lookup(1:5, data.frame(1:4, 11:14))

## [1] 11 12 13 14 NA

## Leave alone elements w/o a match
lookup(1:5, data.frame(1:4, 11:14), missing = NULL)

## [1] 11 12 13 14 5
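For readers curious about the idea behind a key-based lookup, the same two results can be reproduced with base R's `match()`. This is a minimal sketch of the general technique, not a description of how qdap implements `lookup`; the variable names are made up for the example.

```r
# Two-column key: first column is the key, second the replacement value
key <- data.frame(x = 1:4, y = 11:14)
terms <- 1:5

# match() returns the row of the key for each term, or NA when absent
matched <- key$y[match(terms, key$x)]
matched
# -> 11 12 13 14 NA

# Keep the original element wherever no match was found
filled <- ifelse(is.na(matched), terms, matched)
filled
# -> 11 12 13 14 5
```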
Scaled Up 3 Million Records
key <- data.frame(x=1:2, y=c("A", "B"))

##   x y
## 1 1 A
## 2 2 B

big.vec <- sample(1:2, 3000000, TRUE)
out <- lookup(big.vec, key)
out[1:20]

## On my system 3 million records in:
## Time difference of 24.5534 secs
Binary Operator Version
codes <- list(A=c(1, 2, 4), B = c(3, 5), C = 7, D = c(6, 8:10))

1:12 %l% codes

## [1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" NA NA

1:12 %l+% codes

## [1] "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" "11" "12"
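The named-list key used above can be understood as a flattened two-column key. The following base R sketch shows one way to get the same answers by hand; it is an illustration under that assumption, not qdap's implementation, and `flat_keys`, `flat_vals`, `strict` and `lenient` are hypothetical names.

```r
codes <- list(A = c(1, 2, 4), B = c(3, 5), C = 7, D = c(6, 8:10))

# Flatten the list: one key per element, labelled with its list name
flat_keys <- unlist(codes, use.names = FALSE)
flat_vals <- rep(names(codes), lengths(codes))

# Unmatched elements become NA, as with %l%
strict <- flat_vals[match(1:12, flat_keys)]
strict
# -> "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" NA NA

# Unmatched elements keep their original value (as text), as with %l+%
lenient <- ifelse(is.na(strict), as.character(1:12), strict)
lenient
# -> "A" "A" "B" "A" "B" "D" "C" "D" "D" "D" "11" "12"
```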
Simple-Quick Boolean Searches
We’ll be demoing this capability on the qdap data set DATA:
##        person                               state
## 1         sam       Computer is fun. Not too fun.
## 2        greg             No it's not, it's dumb.
## 3     teacher                  What should we do?
## 4         sam                You liar, it stinks!
## 5        greg             I am telling the truth!
## 6       sally              How can we be certain?
## 7        greg                    There is no way.
## 8         sam                     I distrust you.
## 9       sally         What are you talking about?
## 10 researcher        Shall we move on? Good then.
## 11       greg I'm hungry. Let's eat. You already?
First a brief explanation from the documentation:
terms – A character string(s) to search for. The terms are arranged in a single string with AND (use AND or && to connect terms together) and OR (use OR or || to allow for searches of either set of terms). Spaces may be used to control what is searched for. For example using " I " on c("I'm", "I want", "in") will result in FALSE TRUE FALSE whereas "I" will match all three (if case is ignored).
Let’s see how it works. We’ll start with " I ORliar&&stinks". This will find sentences that contain " I ", or that contain both "liar" and "stinks".
boolean_search(DATA$state, " I ORliar&&stinks")

## The following elements meet the criteria:
## [1] 4 5 8

boolean_search(DATA$state, " I &&.", values=TRUE)

## The following elements meet the criteria:
## [1] "I distrust you."

boolean_search(DATA$state, " I OR.", values=TRUE)

## The following elements meet the criteria:
## [1] "Computer is fun. Not too fun."
## [2] "No it's not, it's dumb."
## [3] "I am telling the truth!"
## [4] "There is no way."
## [5] "I distrust you."
## [6] "Shall we move on? Good then."
## [7] "I'm hungry. Let's eat. You already?"

boolean_search(DATA$state, " I &&.")

## The following elements meet the criteria:
## [1] 8
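The AND/OR logic described in the documentation excerpt can be sketched in a few lines of base R. This is a rough approximation for illustration only, not qdap's code: `simple_boolean()` is a hypothetical helper, and the choices to pad each element with a space on both sides and to ignore case are assumptions inferred from the results shown above.

```r
# The `state` column of the DATA set, copied from the listing above
state <- c("Computer is fun. Not too fun.", "No it's not, it's dumb.",
           "What should we do?", "You liar, it stinks!",
           "I am telling the truth!", "How can we be certain?",
           "There is no way.", "I distrust you.",
           "What are you talking about?", "Shall we move on? Good then.",
           "I'm hungry. Let's eat. You already?")

# Hypothetical helper: split on OR first, then on AND within each group
simple_boolean <- function(text, query) {
  # Assumption: pad with spaces so " I " can match at the string edges
  padded <- tolower(paste0(" ", text, " "))
  or_groups <- strsplit(query, "OR|\\|\\|")[[1]]
  group_hits <- lapply(or_groups, function(group) {
    and_terms <- strsplit(group, "AND|&&")[[1]]
    # Literal (fixed) matching, so "." means a period, not "any character"
    term_hits <- lapply(and_terms, function(term) {
      grepl(tolower(term), padded, fixed = TRUE)
    })
    Reduce(`&`, term_hits)   # every term in the group must be present
  })
  which(Reduce(`|`, group_hits))  # any one group is enough
}

simple_boolean(state, " I ORliar&&stinks")
# -> 4 5 8
simple_boolean(state, " I &&.")
# -> 8
```

Splitting on OR before AND means AND binds more tightly, which is why " I ORliar&&stinks" reads as " I " or ("liar" and "stinks").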
Exclusion as Well
boolean_search(DATA$state, " I ||.", values=TRUE)

## The following elements meet the criteria:
## [1] "Computer is fun. Not too fun."
## [2] "No it's not, it's dumb."
## [3] "I am telling the truth!"
## [4] "There is no way."
## [5] "I distrust you."
## [6] "Shall we move on? Good then."
## [7] "I'm hungry. Let's eat. You already?"

boolean_search(DATA$state, " I ||.", exclude = c("way", "truth"), values=TRUE)

## The following elements meet the criteria:
## [1] "Computer is fun. Not too fun."
## [2] "No it's not, it's dumb."
## [3] "I distrust you."
## [4] "Shall we move on? Good then."
## [5] "I'm hungry. Let's eat. You already?"
Binary Operator Version
dat <- data.frame(x = c("Doggy", "Hello", "Hi Dog", "Zebra"), y = 1:4)

##        x y
## 1  Doggy 1
## 2  Hello 2
## 3 Hi Dog 3
## 4  Zebra 4

z <- data.frame(z =c("Hello", "Dog"))

##       z
## 1 Hello
## 2   Dog

dat[dat$x %bs% paste(z$z, collapse = "OR"), ]
Polarity (Sentiment)
The polarity function is an extension of the work originally done by Jeffrey Breen, with some accompanying plotting methods. For more information see the Introduction to qdap Vignette.
poldat2 <- with(mraja1spl, polarity(dialogue, list(sex, fam.aff, died)))
colsplit2df(scores(poldat2))[, 1:7]

    sex fam.aff  died total.sentences total.words ave.polarity sd.polarity
1     f     cap FALSE             158        1810  0.076422846   0.2620359
2     f     cap  TRUE              24         221  0.042477906   0.2087159
3     f    mont  TRUE               4          29  0.079056942   0.3979112
4     m     cap FALSE              73         717  0.026496626   0.2558656
5     m     cap  TRUE              17         185 -0.159815603   0.3133931
6     m   escal FALSE               9         195 -0.152764808   0.3131176
7     m   escal  TRUE              27         646 -0.069421082   0.2556493
8     m    mont FALSE              70         952 -0.043809741   0.3837170
9     m    mont  TRUE             114        1273 -0.003653114   0.4090405
10    m    none FALSE               7          78  0.062243180   0.1067989
11 none    none FALSE               5          18 -0.281649658   0.4387579
The Accompanying Plotting Methods
plot(poldat2)
plot(scores(poldat2))
Question Type
dat <- c("Kate's got no appetite doesn't she?",
    "Wanna tell Daddy what you did today?",
    "You helped getting out a book?", "umm hum?",
    "Do you know what it is?", "What do you want?",
    "Who's there?", "Whose?", "Why do you want it?",
    "Want some?", "Where did it go?", "Was it fun?")

left_just(preprocessed(question_type(dat))[, c(2, 6)])

   raw.text                             q.type
1  Kate's got no appetite doesn't she?  doesnt
2  Wanna tell Daddy what you did today? what
3  You helped getting out a book?       implied_do/does/did
4  Umm hum?                             unknown
5  Do you know what it is?              do
6  What do you want?                    what
7  Who's there?                         who
8  Whose?                               whose
9  Why do you want it?                  why
10 Want some?                           unknown
11 Where did it go?                     where
12 Was it fun?                          was

x <- question_type(DATA.SPLIT$state, DATA.SPLIT$person)
scores(x)

      person tot.quest    what    how   shall implied_do/does/did
1       greg         1       0      0       0             1(100%)
2 researcher         1       0      0 1(100%)                   0
3      sally         2  1(50%) 1(50%)       0                   0
4    teacher         1 1(100%)      0       0                   0
5        sam         0       0      0       0                   0

plot(scores(x), high="orange")
These are a few of the more recent developments in qdap. We would encourage readers to dig into the new vignettes and start using qdap for various Natural Language Processing tasks. If you have suggestions or find a bug you are welcome to:
For a complete list of changes see qdap’s NEWS.md