How to convert contingency tables to data frames with R
I wanted to write contingency tables to HTML with hwrite() (from the hwriter package). I realized that hwrite() has no method for table objects. I could use as.data.frame(), but the table it produces is non-intuitive. A quick search on R-bloggers turned up the solution to my problem: the as.data.frame.matrix() function.
The contingency table
A contingency table is a display format used to record and analyse the relationship between two categorical variables. As an example, we use two variables from the state datasets included in R (see ?state). The two variables are \(x\) (state.division) and \(y\) (state.region).
state.division
state.region
nlevels(state.division)
nlevels(state.region)
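These two variables have \(r = 9\) and \(s = 4\) levels, respectively. Counting the row and column labels along with the counts, the contingency table therefore contains \((r + 1) \times (s + 1) - 1 = 49\) informative cells (the empty top-left corner is excluded).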
The contingency table will show the number of times each combination of state.division and state.region appears.
(MyTable <- table(state.division, state.region))
##                     state.region
## state.division       Northeast South North Central West
##   New England                6     0             0    0
##   Middle Atlantic            3     0             0    0
##   South Atlantic             0     8             0    0
##   East South Central         0     4             0    0
##   West South Central         0     4             0    0
##   East North Central         0     0             5    0
##   West North Central         0     0             7    0
##   Mountain                   0     0             0    8
##   Pacific                    0     0             0    5
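The cell count quoted above can be checked directly in R. The following is a minimal sketch that only uses base R and the built-in state datasets; the names r and s simply mirror the notation of the text.

```r
# Rebuild the contingency table from the built-in state datasets
MyTable <- table(state.division, state.region)

r <- nlevels(state.division)  # number of row levels
s <- nlevels(state.region)    # number of column levels

dim(MyTable)  # r rows by s columns of counts
sum(MyTable)  # total number of observations (one per state)

# Counts plus one row of column labels and one column of row labels,
# minus the empty top-left corner
(r + 1) * (s + 1) - 1
```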
as.data.frame()
Contingency tables in R are objects of class table, and they are not handled the same way as objects of class data.frame. Some functions that work on a data.frame are not available for a table (e.g. hwrite()). Moreover, converting a contingency table with as.data.frame() gives a non-intuitive result: a long format with one row per combination of levels.
as.data.frame(MyTable)
| state.division | state.region | Freq | 
| New England | Northeast | 6 | 
| Middle Atlantic | Northeast | 3 | 
| South Atlantic | Northeast | 0 | 
| East South Central | Northeast | 0 | 
| West South Central | Northeast | 0 | 
| East North Central | Northeast | 0 | 
| West North Central | Northeast | 0 | 
| Mountain | Northeast | 0 | 
| Pacific | Northeast | 0 | 
| New England | South | 0 | 
| Middle Atlantic | South | 0 | 
| South Atlantic | South | 8 | 
| East South Central | South | 4 | 
| West South Central | South | 4 | 
| East North Central | South | 0 | 
| West North Central | South | 0 | 
| Mountain | South | 0 | 
| Pacific | South | 0 | 
| New England | North Central | 0 | 
| Middle Atlantic | North Central | 0 | 
| South Atlantic | North Central | 0 | 
| East South Central | North Central | 0 | 
| West South Central | North Central | 0 | 
| East North Central | North Central | 5 | 
| West North Central | North Central | 7 | 
| Mountain | North Central | 0 | 
| Pacific | North Central | 0 | 
| New England | West | 0 | 
| Middle Atlantic | West | 0 | 
| South Atlantic | West | 0 | 
| East South Central | West | 0 | 
| West South Central | West | 0 | 
| East North Central | West | 0 | 
| West North Central | West | 0 | 
| Mountain | West | 8 | 
| Pacific | West | 5 | 
Here, the same information is presented in a table of \(3 \times r \times s = 108\) cells. Each level of \(x\) is written \(s\) times, and each level of \(y\) is written \(r\) times.
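As a quick check of that count, the sketch below measures the long data frame and then rebuilds the original counts with xtabs(), which suggests the long format is verbose rather than lossy. The object names long and back are only illustrative.

```r
MyTable <- table(state.division, state.region)

# Long format: one row per (division, region) pair, plus a Freq column
long <- as.data.frame(MyTable)

dim(long)                # r * s rows, 3 columns
nrow(long) * ncol(long)  # total number of cells

# Rebuild the two-way counts from the long format
back <- xtabs(Freq ~ state.division + state.region, data = long)
all(as.vector(back) == as.vector(MyTable))
```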
as.data.frame.matrix()
To convert a table to a data.frame while keeping its original structure, use the as.data.frame.matrix() function. This is probably the only situation in which this obscure function would be used.
as.data.frame.matrix(MyTable)
| | Northeast | South | North Central | West |
| New England | 6 | 0 | 0 | 0 | 
| Middle Atlantic | 3 | 0 | 0 | 0 | 
| South Atlantic | 0 | 8 | 0 | 0 | 
| East South Central | 0 | 4 | 0 | 0 | 
| West South Central | 0 | 4 | 0 | 0 | 
| East North Central | 0 | 0 | 5 | 0 | 
| West North Central | 0 | 0 | 7 | 0 | 
| Mountain | 0 | 0 | 0 | 8 | 
| Pacific | 0 | 0 | 0 | 5 | 
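The sketch below inspects the result to confirm that the wide layout is preserved. It also tries an alternative that is not from the original post: dropping the table class with unclass() before calling as.data.frame(), which should dispatch to the same matrix method. The names wide and wide2 are illustrative.

```r
MyTable <- table(state.division, state.region)

# Wide format: one row per division, one column per region
wide <- as.data.frame.matrix(MyTable)

class(wide)     # a plain data.frame
dim(wide)       # same shape as the contingency table
rownames(wide)  # levels of state.division
colnames(wide)  # levels of state.region
wide["Mountain", "West"]  # a single count, addressed by labels

# Alternative (assumption: equivalent here): strip the "table" class,
# so as.data.frame() treats the object as an ordinary matrix
wide2 <- as.data.frame(unclass(MyTable))
all.equal(wide, wide2)
```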
Finally…
If you are fussy, you might notice that the variable names do not appear in contingency tables written with hwrite(). This can cause problems if the levels do not have self-explanatory names (e.g., a variable encoded \(1, 2, \ldots, r\)). In that case, remember to identify your variables by adding a caption to your table.
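One way to avoid retyping the variable names is to read them from the table itself before converting it. This is only a sketch: the caption wording is arbitrary, and how the string is attached to the HTML output is left out because it depends on the tool used to write the table.

```r
MyTable <- table(state.division, state.region)

# table() records the variable names as the names of its dimnames
vars <- names(dimnames(MyTable))

# Hypothetical caption text built from those names
caption <- sprintf("Rows: %s; columns: %s", vars[1], vars[2])
caption
```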
