How to Get the Frequency Table of a Categorical Variable as a Data Frame in R

[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers].

Introduction

One feature that I like about R is the ability to access and manipulate the outputs of many functions.  For example, you can extract the kernel density estimates from density() and scale them to ensure that the resulting density integrates to 1 over its support set.
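As a rough illustration of the density() example just mentioned, here is a minimal sketch. The choice of the mpg column and the trapezoidal rule for the area are assumptions made for this illustration, not something prescribed by density() itself.

```r
# Kernel density estimate of fuel economy (mpg) from the built-in mtcars data
d <- density(mtcars$mpg)

# The returned object stores the grid points in x and the estimates in y
head(d$x)
head(d$y)

# Approximate the area under the estimated curve with the trapezoidal rule
# (an assumption of this sketch; other numerical integration rules would also work)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Rescale the estimates so that they integrate to 1 over the evaluated grid
y_scaled <- d$y / area
area_scaled <- sum(diff(d$x) * (head(y_scaled, -1) + tail(y_scaled, -1)) / 2)
area_scaled
```

The point is only that d$x and d$y are ordinary numeric vectors, so they can be reused in later calculations.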

I recently needed a frequency table of a categorical variable in R, and I wanted the output as a data frame that I could access and manipulate.  This is a fairly simple and common task in statistics and data analysis, so I thought that there must be a function in base R that can easily generate this.  Sadly, I could not find such a function.  In this post, I will explain why the seemingly obvious table() function does not work, and I will demonstrate how the count() function in the ‘plyr’ package can achieve this goal.

 

The Example Data Set – mtcars

Let’s use the mtcars data set that is built into R as an example.  The categorical variable that I want to explore is “gear”, which denotes the number of forward gears in the car, so let’s view the first 6 observations of just the car model and the gear.  We can use the subset() function to restrict the data set to show just the row names and “gear”.

> head(subset(mtcars, select = 'gear'))
                     gear
Mazda RX4            4
Mazda RX4 Wag        4
Datsun 710           4
Hornet 4 Drive       3
Hornet Sportabout    3
Valiant              3

What are the possible values of “gear”?  Let’s use the factor() function to find out.

> factor(mtcars$gear)
 [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
Levels: 3 4 5
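If only the distinct values are needed, without printing the whole vector, a short alternative is sketched below. The variable names gear_values and gear_levels are made up for this illustration.

```r
# Distinct values of gear, sorted in increasing order (a numeric vector)
gear_values <- sort(unique(mtcars$gear))
gear_values

# The same information as factor levels (a character vector)
gear_levels <- levels(factor(mtcars$gear))
gear_levels
```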

The cars in this data set have either 3, 4 or 5 forward gears.  How many cars are there for each number of forward gears?

 

Why the table() function does not work well

The table() function in base R does give the counts of a categorical variable, but the output is not a data frame.  It is an object of class “table”, and its counts cannot be accessed by column name the way a data frame’s columns can.

> w = table(mtcars$gear)
> w
 3  4  5 
15 12  5 

> class(w)
[1] "table"
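For reference, the entries of a one-dimensional table can still be pulled out by label, much like a named vector. The sketch below shows a few ways to do so; it is meant as an illustration of how the object behaves, not as a replacement for a data frame.

```r
w <- table(mtcars$gear)

# The category labels are stored as character strings
names(w)

# Look up the count for a single category by its label
w[["4"]]

# Drop the labels and keep only the counts
as.vector(w)
```

What is missing is a column structure: there is no w$Freq or w$gear to refer to, which is why a data frame is more convenient for further manipulation.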

You can convert this to a data frame, but the result does not retain the variable name “gear” in the corresponding column name.

> t = as.data.frame(w)
> t
    Var1 Freq
1   3    15
2   4    12
3   5    5

You can correct this problem with the names() function.

> names(t)[1] = 'gear'
> t
    gear Freq
1   3    15
2   4    12
3   5    5

I finally have what I want, but that took several functions to accomplish.  Is there an easier way?
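Staying within base R, the steps above can be condensed somewhat. The sketch below names the argument passed to table() so that the name carries through to the data frame, and uses the responseName argument of as.data.frame() to rename the count column. The name gear_freq and the lowercase column name "freq" are choices made for this illustration.

```r
# Naming the argument ("gear = ...") labels the table's dimension,
# and as.data.frame() then uses that label as the column name
gear_freq <- as.data.frame(table(gear = mtcars$gear), responseName = "freq")
gear_freq

class(gear_freq)
```

Note that the gear column produced this way is a factor rather than a number, so it may need converting before any arithmetic.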

 

count() to the Rescue!  (With Compliments to the “plyr” Package)

Thankfully, there is an easier way – it’s the count() function in the “plyr” package.  If you don’t already have the “plyr” package, install it first – run the command

 install.packages('plyr')

Then, load the package with library(), and the count() function will be ready for use.

> library(plyr)
> count(mtcars, 'gear')
       gear      freq
1      3         15
2      4         12
3      5         5
> y = count(mtcars, 'gear')
> y
       gear      freq
1      3         15
2      4         12
3      5         5
> class(y)
[1] "data.frame"

As the class() function confirms, this output is indeed a data frame!
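Because the result is a data frame, it can be manipulated like any other. As a small sketch, assuming the ‘plyr’ package is installed as described above, the code below adds a column of relative frequencies and then filters the rows. The column name prop and the cutoff of 10 cars are arbitrary choices for this illustration.

```r
library(plyr)  # assumes 'plyr' has already been installed

y <- count(mtcars, 'gear')

# Add the relative frequency of each category as a new column
y$prop <- y$freq / sum(y$freq)
y

# Keep only the categories with at least 10 cars
y[y$freq >= 10, ]
```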


