Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
by Joseph Rickert
One of the most difficult things about R, a problem that is particularly vexing to beginners, is finding things. This is an unintended consequence of R's spectacular, but mostly uncoordinated, organic growth. The R core team does a superb job of maintaining the stability and growth of the R language itself, but the innovation engine for new functionality is largely in the hands of the global R community.
Several structures have been put in place to address various aspects of the finding-things problem. For example, Task Views represent a monumental effort to collect and classify R packages. The RSeek site is an effective tool for web searches. R-bloggers is a good place to go for R applications, and CRANberries lets you know what's new. But how do you find things that you didn't even know you were looking for? For this, the so-called "misc packages" can be very helpful. Whereas the majority of R packages are focused on a particular type of analysis, class of models, or special tool, misc packages tend to be collections of functions that facilitate common tasks. (See below for a partial list.)
DescTools is a new entry to the misc package scene that I think could become very popular. The description for the package begins:
DescTools contains a bunch of basic statistic functions and convenience wrappers for efficiently describing data, creating specific plots, doing reports using MS Word, Excel or PowerPoint. The package's intention is to offer a toolbox, which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R.
So far, of the 380 functions in this collection, the Desc function has caught my attention. This function provides very nice tabular and graphic summaries of the variables in a data frame, with output that is specific to the data type. The d.pizza data frame that comes with the package has a nice mix of data types:
head(d.pizza)
  index       date week weekday        area count rabate  price operator  driver delivery_min temperature wine_ordered wine_delivered
1     1 2014-03-01    9       6      Camden     5   TRUE 65.655   Rhonda  Taylor         20.0        53.0            0              0
2     2 2014-03-01    9       6 Westminster     2  FALSE 26.980   Rhonda Butcher         19.6        56.4            0              0
3     3 2014-03-01    9       6 Westminster     3  FALSE 40.970  Allanah Butcher         17.8        36.5            0              0
4     4 2014-03-01    9       6       Brent     2  FALSE 25.980  Allanah  Taylor         37.3          NA            0              0
5     5 2014-03-01    9       6       Brent     5   TRUE 57.555   Rhonda  Carter         21.8        50.0            0              0
6     6 2014-03-01    9       6      Camden     1  FALSE 13.990  Allanah  Taylor         48.7        27.0            0              0
  wrongpizza quality
1      FALSE  medium
2      FALSE    high
3      FALSE    <NA>
4      FALSE    <NA>
5      FALSE  medium
6      FALSE     low
Here is some of the voluminous output from the function. The data frame as a whole is summarized as follows:
'data.frame': 1209 obs. of 16 variables:
 1 $ index         : int  1 2 3 4 5 6 7 8 9 10 ...
 2 $ date          : Date, format: "2014-03-01" "2014-03-01" "2014-03-01" "2014-03-01" ...
 3 $ week          : num  9 9 9 9 9 9 9 9 9 9 ...
 4 $ weekday       : num  6 6 6 6 6 6 6 6 6 6 ...
 5 $ area          : Factor w/ 3 levels "Brent","Camden",..: 2 3 3 1 1 2 2 1 3 1 ...
 6 $ count         : int  5 2 3 2 5 1 4 NA 3 6 ...
 7 $ rabate        : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
 8 $ price         : num  65.7 27 41 26 57.6 ...
 9 $ operator      : Factor w/ 3 levels "Allanah","Maria",..: 3 3 1 1 3 1 3 1 1 3 ...
10 $ driver        : Factor w/ 7 levels "Butcher","Carpenter",..: 7 1 1 7 3 7 7 7 7 3 ...
11 $ delivery_min  : num  20 19.6 17.8 37.3 21.8 48.7 49.3 25.6 26.4 24.3 ...
12 $ temperature   : num  53 56.4 36.5 NA 50 27 33.9 54.8 48 54.4 ...
13 $ wine_ordered  : int  0 0 0 0 0 0 1 NA 0 1 ...
14 $ wine_delivered: int  0 0 0 0 0 0 1 NA 0 1 ...
15 $ wrongpizza    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
16 $ quality       : Ord.factor w/ 3 levels "low"<"medium"<..: 2 3 NA NA 2 1 1 3 3 2 ...
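For readers who want to try this themselves, a session along the following lines should produce output of this kind. This is only a sketch: it assumes DescTools is installed from CRAN and that the installed version still ships the d.pizza data set. The has_desctools flag is a name introduced here so that the snippet does nothing when the package is missing, and the exact layout of the output may differ between package versions.

```r
# Sketch only: assumes DescTools is installed, e.g. install.packages("DescTools").
# has_desctools is a helper flag introduced for this example.
has_desctools <- requireNamespace("DescTools", quietly = TRUE)

if (has_desctools) {
  library(DescTools)
  data(d.pizza, package = "DescTools")  # example data frame shipped with the package

  print(head(d.pizza))                  # first six rows
  str(d.pizza)                          # one line per variable: type and first values

  # Describe a single variable; the kind of summary depends on its type.
  print(Desc(d.pizza$driver))           # factor: frequency table
  print(Desc(d.pizza$delivery_min))     # numeric: moments, quantiles, extremes
}
```

Calling Desc on one column at a time, as here, keeps the output short; passing the whole data frame produces a section for every variable.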
The factor variable driver gets a table and a plot.
10 - driver (factor)

  length      n    NAs levels unique  dupes
   1'209  1'204      5      7      7      y

      level  freq  perc  cumfreq  cumperc
1 Carpenter   272  .226      272     .226
2    Carter   234  .194      506     .420
3    Taylor   204  .169      710     .590
4    Hunter   156  .130      866     .719
5    Miller   125  .104      991     .823
6    Farmer   117  .097     1108     .920
7   Butcher    96  .080     1204    1.000
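To make the columns of this table concrete, here is a minimal base R sketch that rebuilds them from the driver counts printed above. It is not how DescTools computes the table internally; freq_table is a hypothetical helper written for this illustration, and the driver vector is reconstructed from the printed frequencies (plus the 5 NAs) rather than read from d.pizza.

```r
# Hypothetical helper (not part of DescTools): frequency table for a
# categorical vector, sorted by decreasing frequency.
freq_table <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)  # table() drops NAs by default
  freq   <- as.integer(counts)
  data.frame(
    level   = names(counts),
    freq    = freq,
    perc    = round(freq / sum(freq), 3),          # share of non-missing values
    cumfreq = cumsum(freq),
    cumperc = round(cumsum(freq) / sum(freq), 3),
    stringsAsFactors = FALSE
  )
}

# Rebuild the driver variable from the counts shown in the output above.
driver_counts <- c(Carpenter = 272, Carter = 234, Taylor = 204, Hunter = 156,
                   Miller = 125, Farmer = 117, Butcher = 96)
driver <- c(rep(names(driver_counts), times = driver_counts), rep(NA, 5))

tab <- freq_table(driver)
print(tab)
```

Note that the percentages are taken over the 1204 non-missing values, not over the full length of 1209, which is why the cumulative percentage ends at 1.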
The numeric variable delivery_min also gets a table and a plot.
11 - delivery_min (numeric)

  length       n     NAs  unique      0s    mean  meanSE
   1'209   1'209       0     384       0  25.653   0.312

     .05     .10     .25  median     .75     .90     .95
  10.400  11.600  17.400  24.400  32.500  40.420  45.200

     rng      sd   vcoef     mad     IQR    skew    kurt
  56.800  10.843   0.423  11.268  15.100   0.611   0.095

lowest : 8.8 (3), 8.9, 9 (3), 9.1 (5), 9.2 (3)
highest: 61.9, 62.7, 62.9, 63.2, 65.6

Shapiro-Wilks normality test  p.value : 2.2725e-16
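Most of the entries in this summary correspond to standard base R functions. The sketch below shows one plausible way to compute them; it is not the DescTools implementation, num_summary is a hypothetical helper, and it uses the built-in mtcars data rather than d.pizza. The definitions meanSE = sd / sqrt(n) and vcoef = sd / mean are assumptions, although they agree with the figures printed above (10.843 / sqrt(1209) is about 0.312, and 10.843 / 25.653 is about 0.423). Skewness and kurtosis are left out because base R has no built-in functions for them.

```r
# Hypothetical helper (not part of DescTools): basic numeric summary in base R.
num_summary <- function(x) {
  n_total <- length(x)
  x <- x[!is.na(x)]          # all statistics use non-missing values only
  n <- length(x)
  list(
    length    = n_total,
    n         = n,
    NAs       = n_total - n,
    unique    = length(unique(x)),
    zeros     = sum(x == 0),
    mean      = mean(x),
    meanSE    = sd(x) / sqrt(n),            # assumed definition of meanSE
    quantiles = quantile(x, probs = c(.05, .10, .25, .50, .75, .90, .95)),
    rng       = diff(range(x)),             # max minus min
    sd        = sd(x),
    vcoef     = sd(x) / mean(x),            # assumed definition of vcoef
    mad       = mad(x),                     # scaled median absolute deviation
    IQR       = IQR(x),
    shapiro_p = shapiro.test(x)$p.value     # Shapiro-Wilk normality test
  )
}

s <- num_summary(mtcars$mpg)
str(s)
```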
Pretty nice for an automatic first look at the data.
For some more R treasure hunting, have a look at the following short list of misc packages.
Package | Description
 | Tools for manipulating data (No. 1 package downloaded for 2013)
 | Convenience wrappers for functions for manipulating strings
 | One of the most popular R packages of all time: functions for data analysis, graphics, utilities and much more
 | Package development tools
 | The “go to” package for machine learning, classification and regression training
 | Good SVM implementation and other machine learning algorithms
 | Tools for describing data and descriptive statistics
 | Tools for plotting decision trees
 | Functions for numerical analysis, linear algebra, optimization, differential equations and some special functions
 | Contains different high-level graphics functions for displaying large datasets
 | Relatively new package with various functions for survival data, extending the methods available in the survival package
 | New this year: miscellaneous R tools to simplify working with data types and formats, including functions for working with data frames and character strings
 | Some functions for Kalman filters
 | Misc 3D plots, including isosurfaces
 | New package with utilities for producing maps
 | Various programming tools like ASCIIfy() to convert characters to ASCII and checkRVersion() to see if a newer version of R is available
 | A grab bag of utilities including progress bars and function timers