Mining for relations between nominal variables
The task today was to find which variables had significant relations with an important grouping variable in the big dataset I've been working with lately. The grouping variable has 3 levels and represents different behaviours of interest. At first I tried using the grouping variable as the dependent variable in a multinomial logistic regression, but I didn't really trust the output, and the goal was really just to construct a bunch of graphs showing significant bivariate nominal relations in the data.
That's when I turned to my good old friend, the chi-squared test. All I had to do was select all the variables that I wanted to test against the grouping variable, and build up a result with one row per test: the chi-squared statistic, the name of the variable being tested, and the crosstab of the two variables for later graphing. So that's exactly what I did.
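A minimal sketch of that loop might look like the following. It is not the original code: the data frame "dat", its column names, and the random data are hypothetical stand-ins, and only table(), chisq.test(), and rbind() are standard R.

```r
# Hypothetical example data: a 3-level grouping variable and two
# two-level nominal variables (all values are randomly generated).
set.seed(1)
dat <- data.frame(
  group     = factor(sample(c("a", "b", "c"), 300, replace = TRUE)),
  variable1 = factor(sample(c("no", "yes"), 300, replace = TRUE)),
  variable2 = factor(sample(c("high", "low"), 300, replace = TRUE))
)

# Names of the variables to test against the grouping variable (assumed).
testvars <- c("variable1", "variable2")

# Build one list per tested variable: the chi-squared statistic,
# the variable name, and the crosstab itself.
rows <- lapply(testvars, function(v) {
  xtab <- table(dat[[v]], dat$group)
  test <- chisq.test(xtab)
  list(xsq = unname(test$statistic), testvar = v, xtab = xtab)
})

# Row-binding lists gives a matrix of mode "list": one row per test,
# with columns xsq, testvar, and xtab.
resultlist <- do.call(rbind, rows)
```

Because every row is a list, rbind() returns a matrix whose cells can each hold a different kind of object, which is what lets the crosstab sit next to the statistic and the variable name.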
One really sweet thing about matrices in R is that, when you build them from lists, each cell can hold a different kind of object: just a number in one column, text in another, and a whole sub-matrix in a third! A typical row of the "resultlist" would look something like this:
     xsq   testvar     xtab
[1,] 200.7 "variable1" numeric,6
Then all I needed to do to see the variable name and crosstab for that variable was to call "resultlist[1, 2:3]", and that gave me the numbers to graph.
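As a rough illustration of that last step, here is a sketch that pulls one row's variable name and crosstab out of a list-matrix and draws a grouped bar chart. The one-row "resultlist" is rebuilt by hand with made-up counts so the example runs on its own, and the choice of barplot() is an assumption, since the post does not say how the graphs were drawn.

```r
# Hypothetical stand-in for a single row of resultlist (made-up counts).
xtab <- as.table(matrix(
  c(30, 20, 25, 25, 10, 40),
  nrow = 2,
  dimnames = list(variable1 = c("no", "yes"), group = c("a", "b", "c"))
))
resultlist <- rbind(
  list(xsq = unname(chisq.test(xtab)$statistic),
       testvar = "variable1",
       xtab = xtab)
)

# Columns 2 and 3 of row 1: the variable name and its crosstab,
# returned as a named list.
row <- resultlist[1, 2:3]

# Grouped bar chart of the crosstab, titled with the variable name.
barplot(row$xtab, beside = TRUE, legend.text = TRUE,
        main = row$testvar, xlab = "group", ylab = "count")
```

Indexing with single brackets returns a list, so the individual pieces are reached with "$" or double brackets, for example "resultlist[[1, "xtab"]]".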