Example 8.42: skewness and kurtosis and more moments (oh my!)
[This article was first published on SAS and R, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
While skewness and kurtosis are not calculated and reported as often as the mean and standard deviation, they can be useful at times. Skewness is the standardized 3rd moment around the mean, and characterizes whether the distribution is symmetric (skewness=0). Kurtosis is a function of the 4th central moment, and characterizes peakedness and tail weight: the normal distribution has a value of 3, and smaller values correspond to thinner tails (less peakedness).
Some packages (including SAS) subtract three from the kurtosis, so that the normal distribution has a kurtosis of 0 (this is sometimes called excess kurtosis).
R
library(moments)
library(lattice)
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds$gender = ifelse(ds$female==1, "female", "male")
densityplot(~ cesd, data=ds, groups=gender, auto.key=TRUE)
We see that the distribution of CESD scores is skewed with a long left tail, and appears somewhat less peaked than a normal distribution. This is confirmed by the actual statistics:
> with(ds, tapply(cesd, gender, skewness))
    female       male
-0.4906171 -0.2464390
> with(ds, tapply(cesd, gender, kurtosis))   # kurtosis
  female     male
2.748968 2.547061
> with(ds, tapply(cesd, gender, kurtosis))-3 # excess kurtosis
    female       male
-0.2510318 -0.4529394
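To make the definitions concrete, here is a minimal base-R sketch of the method-of-moments calculation: skewness as the 3rd central moment divided by the 1.5 power of the 2nd, and kurtosis as the 4th central moment divided by the square of the 2nd. The helper names (central_moment, mom_skewness, mom_kurtosis) and the toy vector are made up for illustration and are not part of the moments package.

```r
# Sketch only: method-of-moments skewness and kurtosis in base R.
# All function names here are hypothetical helpers, not package functions.

# k-th central moment, dividing by n (the "biased" version)
central_moment <- function(x, k) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^k)
}

# skewness: 3rd central moment scaled by the 2nd moment to the power 3/2
mom_skewness <- function(x) {
  central_moment(x, 3) / central_moment(x, 2)^(3/2)
}

# kurtosis: 4th central moment scaled by the squared 2nd moment
mom_kurtosis <- function(x) {
  central_moment(x, 4) / central_moment(x, 2)^2
}

# toy data (invented for this example): one large value gives a long right tail
x <- c(1, 2, 3, 4, 10)
mom_skewness(x)      # positive, reflecting the long right tail
mom_kurtosis(x)      # compare with 3 for the normal distribution
mom_kurtosis(x) - 3  # excess kurtosis
```

Applied to the cesd values within each gender, these helpers should agree with skewness() and kurtosis() from the moments package, assuming the same handling of missing values.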
SAS
SAS includes much detail on the moments and other statistics in the output from proc univariate. As usual, the quantity of output can be off-putting for new users and students. Here we extract the moments we need with the ODS system. We also generate kernel density estimates roughly analogous to the densityplot() results shown above.
ods output moments = cesdmoments;
proc univariate data="c:\book\help.sas7bdat";
  class female;
  var cesd;
  histogram cesd / kernel;
run;

proc print data=cesdmoments;
  where label1 = "Skewness";
  var female label1 nvalue1 label2 nvalue2;
run;
With the result:
Obs    FEMALE    Label1       nValue1    Label2       nValue2
  4      0       Skewness   -0.247513    Kurtosis   -0.442010
 10      1       Skewness   -0.497620    Kurtosis   -0.204928
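We note that the SAS default is to produce unbiased (REML) estimates, rather than the biased method of moments estimators produced by the skewness() and kurtosis() functions in R. This explains the small differences between the two sets of results. Note also that SAS presents the excess kurtosis.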
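The gap between the two sets of numbers can be explored with a short base-R sketch. It applies the commonly used sample-size adjustments (often written G1 and G2) to the method-of-moments estimates. That these adjustments correspond to the SAS default is an assumption here, not something checked against the SAS documentation, and the function names are made up for illustration.

```r
# Sketch only: sample-size-adjusted skewness and excess kurtosis in base R.
# Assumption: these adjusted forms are what SAS reports by default.
# adj_skewness and adj_excess_kurtosis are hypothetical helper names.

adj_skewness <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  g1 <- mean(d^3) / mean(d^2)^(3/2)   # method-of-moments skewness
  g1 * sqrt(n * (n - 1)) / (n - 2)    # adjusted skewness (needs n > 2)
}

adj_excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  g2 <- mean(d^4) / mean(d^2)^2 - 3   # method-of-moments excess kurtosis
  ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # needs n > 3
}

# toy data, invented for this example
x <- c(1, 2, 3, 4, 10)
adj_skewness(x)
adj_excess_kurtosis(x)
```

With the HELP data, something like with(ds, tapply(cesd, gender, adj_skewness)) should land close to the SAS values if the assumption about the SAS default holds. The adjustment factors approach 1 as n grows, so the two approaches differ little in large samples.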
To leave a comment for the author, please follow the link and comment on their blog: SAS and R.