
Example 9.3: augmented display of contingency table

[This article was first published on SAS and R, and kindly contributed to R-bloggers.]

SAS and R often provide different levels of detail in their output. This is particularly true for the descriptive analysis of contingency tables, where SAS makes it easy to display tables with additional quantities (such as the expected cell count).

The mosaic package has added functionality to calculate these quantities in R. We demonstrate using an example from the HELP dataset.

R
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
library(mosaic)
ds$gender = ifelse(ds$female==1, "female", "male")
ds$homeless = ifelse(ds$homeless==1, "homeless", "housed")
tab = xtabs(~ gender + homeless, data=ds)
> tab
        homeless
gender   homeless housed
  female       40     67
  male        169    177
> xchisq.test(tab)

 Pearson's Chi-squared test with Yates' continuity correction

data:  tab 
X-squared = 3.8708, df = 1, p-value = 0.04913

  40.00    67.00 
( 49.37) ( 57.63)
 [1.78]   [1.52] 
< -1.33>  <  1.23> 
   
 169.00   177.00 
(159.63) (186.37)
 [0.55]   [0.47] 
<  0.74>  < -0.69> 
   
key:
 observed
 (expected)
 [contribution to X-squared]
 <residual>

We see a borderline statistically significant association (p = 0.049) between gender and homeless status in the HELP study. The residuals indicate that fewer women than expected are homeless, and more men than expected are homeless.
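The extra quantities in the augmented display can be reproduced by hand, which makes the key easier to read. The sketch below is a minimal illustration in base R: it re-enters the four cell counts from the printed table rather than re-reading help.csv, and the object names (expected, residual, contribution) are our own. Note that the printed X-squared uses Yates' continuity correction, so it is somewhat smaller than the sum of the uncorrected cell contributions.

```r
# Cell counts copied from the printed table (filled column by column)
tab <- matrix(c(40, 169, 67, 177), nrow = 2,
              dimnames = list(gender = c("female", "male"),
                              homeless = c("homeless", "housed")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals and each cell's contribution to the uncorrected statistic
residual <- (tab - expected) / sqrt(expected)
contribution <- residual^2

round(expected, 2)
round(contribution, 2)
round(residual, 2)

# Base R reports the same expected counts and residuals
res <- chisq.test(tab)   # Yates' correction is applied by default for 2x2 tables
res$expected
res$residuals
```

The sum of the contributions matches chisq.test(tab, correct=FALSE), while res$statistic is the corrected value shown in the output above.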

Another idea is to depict the association in this table graphically. One approach is a mosaic plot (note: no relation to Project MOSAIC and the mosaic package). A mosaic plot starts as a square with area equal to one. The square is first divided into columns whose widths are proportional to the prevalence of each value of the first variable in the table (in this case, gender). Each column is then split into stacked segments whose heights are proportional to the conditional probability of the other variable (here, homeless status) within that category.
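The proportions that determine the layout of the mosaic plot can be checked numerically. The following sketch assumes the cell counts from the printed table, entered by hand, and uses base R's prop.table() and margin.table(); the names widths and heights are ours and are not arguments of mosaicplot().

```r
# Cell counts copied from the printed table (filled column by column)
tab <- matrix(c(40, 169, 67, 177), nrow = 2,
              dimnames = list(gender = c("female", "male"),
                              homeless = c("homeless", "housed")))

# Column widths: marginal proportion of each gender
widths <- prop.table(margin.table(tab, 1))
round(widths, 3)

# Segment heights within each column: homeless status conditional on gender
heights <- prop.table(tab, 1)
round(heights, 3)

# Each tile's area is width * height, i.e. the joint proportion of that cell
areas <- heights * as.vector(widths)
round(areas, 3)
```

The tile areas sum to one, matching the unit square the plot starts from.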

Another graphical display of a table is the association plot. In an association plot, there is also a box for each cell of the table. The area of the box is proportional to the difference between the observed and expected (assuming no association) frequencies. In a typical presentation, excess observed counts are black and above the line, while deficient counts are red and below the line.
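To see why the box area tracks the difference between observed and expected counts, recall that the documentation for assocplot() describes each box as having a signed height proportional to the Pearson residual and a width proportional to the square root of the expected count. The sketch below checks that arithmetic with the cell counts from the printed table; the proportionality constants used by the actual plot are ignored, and the variable names are ours.

```r
# Cell counts copied from the printed table (filled column by column)
tab <- matrix(c(40, 169, 67, 177), nrow = 2,
              dimnames = list(gender = c("female", "male"),
                              homeless = c("homeless", "housed")))

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Box geometry, up to constant scale factors
height <- (tab - expected) / sqrt(expected)   # signed Pearson residual
width  <- sqrt(expected)

# Signed area reduces to observed minus expected
area <- height * width
round(area, 2)
```

A negative area, as for the female homeless cell, corresponds to a box drawn below the line.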

Above, we show the mosaic plot (on the left) and association plot (on the right). Both of these displays demonstrate that there is an association. The mosaic plot indicates that only about a quarter of the sample is female (indicated by the width of the columns), and that homelessness is present in about half the subjects (area shaded in light grey). The slight association demonstrated is that there are fewer homeless women than expected (since the horizontal line moves down between the first and second column). Similarly, for the association plot we note that the observed cell count is less than the expected (indicated in red with values below the line) for the female homeless group.
par(mfrow=c(1,2))
mosaicplot(tab, color=TRUE, main="mosaic plot")
assocplot(tab)
title("association plot")


SAS
As in Example 8.32, we find SAS macros for mosaic plots among the contributions of Michael Friendly. In this case, they are somewhat more difficult to set up than his other macros. The code for the plots themselves can be downloaded here, and it is also useful to run a wrapper macro. After downloading the files, the following code can be used to make the figure below.
title 'Install mosaic modules';
* location of the unzipped files;
filename mosaic  'c:\ken\sasmacros\mosaics';
* storage location of compiled macros;
libname  mosaic   'c:\ken\sasmacros\mosaics';

* Code to read in, compile and store the macros;
proc iml ;
   reset storage=mosaic.mosaic;
   %include mosaic(mosaics) ;
   store module=_all_;
   show storage;
quit;

* Prep: create the table, save the cell counts;
proc freq data = "c:\book\help.sas7bdat";
tables homeless * female / out=outhelp;
run;

* Read in the wrapper macro;
%include "c:\ken\sasmacros\mosaics\mosaic.sas";

* Make the plot;
%mosaic(data=outhelp,var = female homeless, 
        sort=homeless descending female, space = 1 1);

The sort and space options make the results more similar to those shown for mosaicplot(). In this version, the colors reflect the signs of the residuals.
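For comparison, base R's mosaicplot() can also mark tiles by the sign and size of their residuals through its shade argument. This is only a rough R counterpart to the SAS figure, not a reproduction of it. The sketch assumes the cell counts from the printed table, and the cut points c(0.5, 1) are our own choice: with the default cut points of 2 and 4, no tile would be filled, because every residual in this table is smaller than 2 in absolute value.

```r
# Cell counts copied from the printed table (filled column by column)
tab <- matrix(c(40, 169, 67, 177), nrow = 2,
              dimnames = list(gender = c("female", "male"),
                              homeless = c("homeless", "housed")))

# Signs of the Pearson residuals, for reference when reading the plot
sign(chisq.test(tab)$residuals)

# Extended mosaic plot: tiles are shaded according to their residuals,
# using hand-picked cut points so that these small residuals are visible
mosaicplot(tab, shade = c(0.5, 1), main = "mosaic plot, shaded by residual")
```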
