X is for By
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]
Let’s pull up the Facebook dataset for this.
Facebook<-read.delim(file="full_facebook_set.txt", header=TRUE)
This is the full dataset, which includes all the variables I collected. I don't want to run analyses on all variables, so I'll pull out the ones most important for this blog post demonstration.
smallFB<-Facebook[,c(1:2,77:80,105:116,122,133:137,170,187)]
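Positional indices like these will quietly pick up the wrong columns if the file's column order ever changes, so selecting by name can be a safer alternative. Here is a minimal sketch using a small made-up data frame (`toyFB`, `Unused`, and all of the values are hypothetical; only the other column names come from this dataset):

```r
# Hypothetical stand-in for the full dataset; the values are made up
toyFB <- data.frame(
  RespondentId = c(162350, 162373, 164388),
  gender       = c(1, 0, 1),
  Unused       = c(9, 9, 9),      # a column we do not want to keep
  Rumination   = c(40, 37, 38),
  Depression   = c(9, 10, 12)
)

# Select columns by name instead of by position
keep <- c("RespondentId", "gender", "Rumination", "Depression")
smallToy <- toyFB[, keep]

names(smallToy)
```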
First, I'll run descriptives on this smaller data frame by gender.
library(psych)
## Warning: package 'psych' was built under R version 3.4.4
describeBy(smallFB,smallFB$gender)
##
## Descriptive statistics by group
## group: 0
##              vars  n      mean      sd   median   trimmed     mad      min
## RespondentId    1 73 164647.77 1711.78 164943.0 164587.37 2644.96 162373.0
## gender          2 73      0.00    0.00      0.0      0.00    0.00      0.0
## Rumination      3 73     37.66   14.27     37.0     37.41   13.34      8.0
## DepRelat        4 73     21.00    7.86     21.0     20.95    5.93      4.0
## Brood           5 73      8.49    3.76      9.0      8.42    2.97      1.0
## Reflect         6 73      8.16    4.44      8.0      8.24    4.45      0.0
## SavorPos        7 73     64.30   10.93     65.0     64.92    8.90     27.0
## SavorNeg        8 73     33.30   11.48     33.0     33.08   13.34     12.0
## SavorTot        9 73     31.00   20.15     34.0     31.15   19.27    -10.0
## AntPos         10 73     20.85    3.95     21.0     20.93    4.45     10.0
## AntNeg         11 73     11.30    4.23     11.0     11.22    4.45      4.0
## AntTot         12 73      9.55    6.90     10.0      9.31    7.41     -3.0
## MomPos         13 73     21.68    3.95     22.0     21.90    2.97      9.0
## MomNeg         14 73     11.45    4.63     11.0     11.41    5.93      4.0
## MomTot         15 73     10.23    7.63     11.0     10.36    8.90    -11.0
## RemPos         16 73     21.77    4.53     23.0     22.20    4.45      8.0
## RemNeg         17 73     10.55    4.39      9.0     10.27    4.45      4.0
## RemTot         18 73     11.22    8.05     14.0     11.68    7.41     -8.0
## LifeSat        19 73     24.63    6.80     25.0     24.93    7.41     10.0
## Extravert      20 73      4.32    1.58      4.5      4.33    1.48      1.5
## Agreeable      21 73      4.79    1.08      5.0      4.85    1.48      1.0
## Conscient      22 73      5.14    1.34      5.0      5.19    1.48      2.0
## EmotStab       23 73      5.10    1.22      5.0      5.15    1.48      1.0
## OpenExp        24 73      5.11    1.29      5.5      5.20    1.48      2.0
## Health         25 73     28.77   19.56     25.0     26.42   17.79      0.0
## Depression     26 73     10.26    7.27      9.0      9.56    5.93      0.0
##                 max  range  skew kurtosis     se
## RespondentId 168279 5906.0  0.21    -1.36 200.35
## gender            0    0.0   NaN      NaN   0.00
## Rumination       71   63.0  0.12    -0.53   1.67
## DepRelat         42   38.0  0.10    -0.04   0.92
## Brood            17   16.0  0.15    -0.38   0.44
## Reflect          19   19.0 -0.12    -0.69   0.52
## SavorPos         84   57.0 -0.69     0.76   1.28
## SavorNeg         57   45.0  0.14    -0.95   1.34
## SavorTot         72   82.0 -0.17    -0.75   2.36
## AntPos           28   18.0 -0.24    -0.46   0.46
## AntNeg           22   18.0  0.27    -0.55   0.49
## AntTot           24   27.0  0.11    -0.76   0.81
## MomPos           28   19.0 -0.69     0.55   0.46
## MomNeg           22   18.0  0.08    -0.98   0.54
## MomTot           24   35.0 -0.25    -0.55   0.89
## RemPos           28   20.0 -0.88     0.35   0.53
## RemNeg           22   18.0  0.56    -0.66   0.51
## RemTot           24   32.0 -0.53    -0.77   0.94
## LifeSat          35   25.0 -0.37    -0.84   0.80
## Extravert         7    5.5 -0.09    -0.93   0.19
## Agreeable         7    6.0 -0.60     1.04   0.13
## Conscient         7    5.0 -0.24    -0.98   0.16
## EmotStab          7    6.0 -0.60     0.28   0.14
## OpenExp           7    5.0 -0.49    -0.55   0.15
## Health           91   91.0  1.13     1.14   2.29
## Depression       36   36.0  1.02     0.95   0.85
## --------------------------------------------------------
## group: 1
##              vars   n      mean      sd    median   trimmed     mad
## RespondentId    1 184 164373.49 1515.34 164388.00 164253.72 1891.80
## gender          2 184      1.00    0.00      1.00      1.00    0.00
## Rumination      3 184     38.09   15.28     40.00     38.16   17.05
## DepRelat        4 184     21.67    8.78     21.00     21.66    8.90
## Brood           5 184      8.57    4.14      8.50      8.47    3.71
## Reflect         6 184      7.84    4.06      8.00      7.73    4.45
## SavorPos        7 184     67.22    9.63     68.00     67.71    8.90
## SavorNeg        8 184     29.75   11.62     27.50     28.72    9.64
## SavorTot        9 184     37.47   19.30     40.00     38.66   20.02
## AntPos         10 184     22.18    3.37     23.00     22.28    2.97
## AntNeg         11 184     10.10    4.44      9.00      9.78    4.45
## AntTot         12 184     12.08    6.85     14.00     12.36    5.93
## MomPos         13 184     22.28    3.88     23.00     22.59    2.97
## MomNeg         14 184     10.60    4.88      9.50     10.13    5.19
## MomTot         15 184     11.68    7.75     13.00     12.29    7.41
## RemPos         16 184     22.76    3.85     23.00     23.10    2.97
## RemNeg         17 184      9.05    3.79      8.00      8.68    2.97
## RemTot         18 184     13.71    6.97     15.00     14.34    5.93
## LifeSat        19 184     23.76    6.25     24.00     24.18    7.41
## Extravert      20 184      4.66    1.57      5.00      4.74    1.48
## Agreeable      21 184      5.22    1.06      5.50      5.26    1.48
## Conscient      22 184      5.32    1.24      5.50      5.42    1.48
## EmotStab       23 184      4.70    1.31      4.75      4.75    1.11
## OpenExp        24 184      5.47    1.08      5.50      5.56    0.74
## Health         25 184     32.54   16.17     30.00     31.43   16.31
## Depression     26 184     12.19    8.48      9.00     11.09    5.93
##                   min    max  range  skew kurtosis     se
## RespondentId 162350.0 167714 5364.0  0.46    -0.90 111.71
## gender            1.0      1    0.0   NaN      NaN   0.00
## Rumination        3.0     74   71.0 -0.05    -0.60   1.13
## DepRelat          0.0     42   42.0  0.00    -0.46   0.65
## Brood             0.0     19   19.0  0.19    -0.62   0.31
## Reflect           0.0     19   19.0  0.25    -0.48   0.30
## SavorPos         33.0     84   51.0 -0.59     0.36   0.71
## SavorNeg         12.0     64   52.0  0.79     0.25   0.86
## SavorTot        -18.0     72   90.0 -0.57    -0.10   1.42
## AntPos            9.0     28   19.0 -0.49     0.41   0.25
## AntNeg            4.0     22   18.0  0.63    -0.39   0.33
## AntTot           -8.0     24   32.0 -0.43    -0.48   0.50
## MomPos           10.0     28   18.0 -0.81     0.54   0.29
## MomNeg            4.0     24   20.0  0.81    -0.03   0.36
## MomTot          -13.0     24   37.0 -0.69    -0.03   0.57
## RemPos            9.0     28   19.0 -0.87     0.81   0.28
## RemNeg            4.0     21   17.0  0.83     0.33   0.28
## RemTot           -9.0     24   33.0 -0.82     0.50   0.51
## LifeSat           8.0     35   27.0 -0.53    -0.32   0.46
## Extravert         1.0      7    6.0 -0.36    -0.72   0.12
## Agreeable         2.5      7    4.5 -0.27    -0.63   0.08
## Conscient         1.0      7    6.0 -0.70     0.13   0.09
## EmotStab          1.5      7    5.5 -0.35    -0.73   0.10
## OpenExp           1.5      7    5.5 -0.91     0.62   0.08
## Health            2.0     85   83.0  0.60    -0.05   1.19
## Depression        0.0     39   39.0  1.14     0.66   0.62
In this dataset, I coded men as 0 and women as 1. The descriptive statistics table includes all scale and subscale scores and gives me the sample size, mean, standard deviation, median, trimmed mean (which drops the lowest and highest values before averaging; 10% from each end by default), median absolute deviation, minimum and maximum values, range, skewness, kurtosis, and standard error. I'd need to run t-tests to find out whether any differences are significant, but this still gives me some idea of how men and women might differ on these measures.
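The t-test itself is a one-liner in base R. Here is a minimal sketch with simulated scores (the data frame `simFB` and its values are made up for illustration; only the group sizes, 73 men and 184 women, mirror the output above):

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for smallFB: 73 men (0) and 184 women (1)
simFB <- data.frame(
  gender     = rep(c(0, 1), times = c(73, 184)),
  Rumination = c(rnorm(73, mean = 38, sd = 14),
                 rnorm(184, mean = 38, sd = 15))
)

# Welch two-sample t-test (R's default; does not assume equal variances)
res <- t.test(Rumination ~ gender, data = simFB)

res$estimate  # the two group means
res$p.value   # two-sided p-value
```

With the real data, the equivalent call would be `t.test(Rumination ~ gender, data = smallFB)`.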
There are certain measures I included that we might hypothesize would show gender differences. For instance, some research suggests gender differences for rumination and depression. In addition to running descriptives by group, I might also want to display these differences in a violin plot. The psych package can quickly generate such a plot by group.
violinBy(smallFB,"Rumination","gender",grp.name=c("M","F"))
violinBy(smallFB,"Depression","gender",grp.name=c("M","F"))
ggplot2 will also generate violin plots by group, so this feature might not be as useful for final displays, but it could help in quickly visualizing the data during analysis. And you may find that you prefer the appearance of these plots. To each his own.
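For comparison, here is a minimal sketch of a ggplot2 version, again on simulated data (`simFB` and its values are made up, and I'm assuming ggplot2 is installed):

```r
library(ggplot2)

set.seed(42)  # reproducible simulated data

# Simulated stand-in for smallFB, using the same coding: 0 = men, 1 = women
simFB <- data.frame(
  gender     = rep(c(0, 1), times = c(73, 184)),
  Depression = pmax(0, rnorm(257, mean = 11, sd = 8))  # scores floored at 0
)

# Turn the 0/1 code into a labelled factor so the x-axis reads M and F
simFB$gender <- factor(simFB$gender, levels = c(0, 1), labels = c("M", "F"))

p <- ggplot(simFB, aes(x = gender, y = Depression)) +
  geom_violin() +
  labs(x = "Gender", y = "Depression")

print(p)
```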
Another function is error.bars.by, which plots means and confidence intervals by group for multiple variables. Again, this is a way to get some quick visuals, though differences in scale among measures should be taken into consideration when generating this plot. One set of variables for which this display might be useful is the 5 subscales of the Five-Factor Personality Inventory. This 10-item measure assesses where participants fall on the so-called Big Five personality traits: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (Emotional Stability). These subscales are all on the same metric.
error.bars.by(smallFB[,c(20:24)],group=smallFB$gender,xlab="Big Five Personality Traits",ylab="Score on Subscale")
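To see what a plot like this summarizes, the group means and confidence intervals can also be computed by hand. Below is a minimal base R sketch on simulated data (`simFB` and the helper `mean_ci` are hypothetical; the t-based 95% interval is my assumption about a reasonable interval, not a description of how `error.bars.by` computes its intervals internally):

```r
set.seed(42)  # reproducible simulated data

# Simulated stand-in for one Big Five subscale
simFB <- data.frame(
  gender    = rep(c(0, 1), times = c(73, 184)),
  Extravert = c(rnorm(73, mean = 4.3, sd = 1.6),
                rnorm(184, mean = 4.7, sd = 1.6))
)

# Hypothetical helper: mean and t-based confidence interval for a numeric vector
mean_ci <- function(x, level = 0.95) {
  n    <- length(x)
  se   <- sd(x) / sqrt(n)                            # standard error of the mean
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se   # half-width of the interval
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One row per gender group, with columns mean, lower, upper
ci_by_group <- t(sapply(split(simFB$Extravert, simFB$gender), mean_ci))
ci_by_group
```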