How to convert contingency tables to data frames with R

[This article was first published on Rronan » R, and kindly contributed to R-bloggers.]

I wanted to write contingency tables to HTML with hwrite(), but I realized that hwrite() has no method for table objects. I could use as.data.frame(), but the data frame it produces is non-intuitive. A quick search on R-bloggers turned up the solution to my problem: the as.data.frame.matrix() function.

The contingency table

A contingency table is a display format used to record and analyse the relationship between two categorical variables. As an example, we use two variables from the state datasets included in R (see ?state). The two variables are \(x\) (state.division) and \(y\) (state.region).

state.division
state.region
nlevels(state.division)
nlevels(state.region)

These two variables have \(r = 9\) and \(s = 4\) levels, respectively. Counting the row and column labels, the printed contingency table therefore contains \((r + 1) \times (s + 1) - 1 = 49\) informative cells (the \(-1\) accounts for the empty top-left corner).
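The cell count can be checked directly in R. This is only a small sketch; the names r, s and n_cells are introduced here for illustration.

```r
# Number of levels of each factor
r <- nlevels(state.division)  # row variable
s <- nlevels(state.region)    # column variable

# One extra row and one extra column hold the labels;
# the top-left corner of the printed table is empty, hence the "- 1".
n_cells <- (r + 1) * (s + 1) - 1
n_cells
```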

The contingency table will show the number of times each combination of state.division and state.region appears.

(MyTable <- table(state.division, state.region))
##                     state.region
## state.division       Northeast South North Central West
##   New England                6     0             0    0
##   Middle Atlantic            3     0             0    0
##   South Atlantic             0     8             0    0
##   East South Central         0     4             0    0
##   West South Central         0     4             0    0
##   East North Central         0     0             5    0
##   West North Central         0     0             7    0
##   Mountain                   0     0             0    8
##   Pacific                    0     0             0    5

as.data.frame()

Contingency tables in R are objects of class table. They are not handled the same way as objects of class data.frame, and some methods available for data.frame are not available for table (e.g. hwrite()). Moreover, converting a contingency table to a data frame with as.data.frame() gives a non-intuitive result.

as.data.frame(MyTable)

state.division      state.region   Freq
New England         Northeast         6
Middle Atlantic     Northeast         3
South Atlantic      Northeast         0
East South Central  Northeast         0
West South Central  Northeast         0
East North Central  Northeast         0
West North Central  Northeast         0
Mountain            Northeast         0
Pacific             Northeast         0
New England         South             0
Middle Atlantic     South             0
South Atlantic      South             8
East South Central  South             4
West South Central  South             4
East North Central  South             0
West North Central  South             0
Mountain            South             0
Pacific             South             0
New England         North Central     0
Middle Atlantic     North Central     0
South Atlantic      North Central     0
East South Central  North Central     0
West South Central  North Central     0
East North Central  North Central     5
West North Central  North Central     7
Mountain            North Central     0
Pacific             North Central     0
New England         West              0
Middle Atlantic     West              0
South Atlantic      West              0
East South Central  West              0
West South Central  West              0
East North Central  West              0
West North Central  West              0
Mountain            West              8
Pacific             West              5

Here, the same information is presented in a table of \(3 \times r \times s = 108\) cells: one row for each of the \(r \times s = 36\) combinations, and three columns. Each level of \(x\) is written \(s\) times, and each level of \(y\) is written \(r\) times.
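The shape of this long format can be verified with a short sketch. The object names long and back are illustrative, and the round trip through xtabs() is an addition that does not appear in the original workflow.

```r
MyTable <- table(state.division, state.region)
long <- as.data.frame(MyTable)

dim(long)                    # rows x columns of the long data frame
prod(dim(long))              # total number of cells
table(long$state.division)   # how many times each division is repeated
table(long$state.region)     # how many times each region is repeated

# The long format loses nothing: xtabs() rebuilds the two-way counts.
back <- xtabs(Freq ~ state.division + state.region, data = long)
all(as.vector(back) == as.vector(MyTable))
```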

as.data.frame.matrix()

To convert a table to a data.frame while keeping its original structure, you can use the as.data.frame.matrix() function. This is probably the only situation in which this obscure function would be called directly.

as.data.frame.matrix(MyTable)

                    Northeast  South  North Central  West
New England                 6      0              0     0
Middle Atlantic             3      0              0     0
South Atlantic              0      8              0     0
East South Central          0      4              0     0
West South Central          0      4              0     0
East North Central          0      0              5     0
West North Central          0      0              7     0
Mountain                    0      0              0     8
Pacific                     0      0              0     5
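
To see what the conversion returns, the result can be inspected as in the sketch below. The names wide and wide2 are illustrative. The second route, through unclass(), is an assumption about an equivalent way to reach the same method; it is not something the original post uses.

```r
MyTable <- table(state.division, state.region)

wide <- as.data.frame.matrix(MyTable)
class(wide)      # a plain data frame
dim(wide)        # one row per division, one column per region
rownames(wide)   # the divisions become row names, not a column
colnames(wide)   # the regions become column names

# Assumed alternative: drop the "table" class first, so that the
# generic as.data.frame() dispatches on the underlying matrix.
wide2 <- as.data.frame(unclass(MyTable))
all.equal(wide, wide2)
```

Since the divisions end up as row names, a function that ignores row names would lose them; cbind(state.division = rownames(wide), wide) is one way to turn them into an ordinary column.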

Finally…

If you are fussy, you might notice that the variable names do not appear in contingency tables written with hwrite(). This can cause problems if the levels do not have self-explanatory names (e.g., a variable coded \(1, 2, \ldots, r\)). In that case, remember to identify your variables by adding a caption to your table.
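
The variable names needed for such a caption can be recovered from the table itself. The sketch below only builds the caption text; the names vars and caption and the wording of the caption are illustrative, and how the string is attached to the HTML output is left open.

```r
MyTable <- table(state.division, state.region)

# The variable names are stored as the names of the table's dimnames.
vars <- names(dimnames(MyTable))
vars

# Build a caption that spells out which variable is in rows and columns.
caption <- sprintf("Rows: %s; columns: %s", vars[1], vars[2])
caption
```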
