Descriptive summary: Proportions of values in a vector #rstats
When describing a sample, researchers in my field often report the proportions of specific characteristics, for instance the proportion of female persons or of persons with higher or lower income. Since I often want to know these characteristics when exploring data, I decided to write a function, prop(), which is part of my sjstats-package, a package dedicated to summary functions, mostly for fit or association measures of regression models or descriptive statistics.
prop() is designed in a similar fashion to most functions of my sjmisc-package: first, the data; then a user-defined number of logical comparisons that define the proportions. A single comparison returns a vector; multiple comparisons return a tibble (where the first column contains the comparison, and the second the related proportion).
An example from the mtcars dataset:
    library(sjstats)
    data(mtcars)

    # proportions of observations in mpg that are greater than 25
    prop(mtcars, mpg > 25)
    #> [1] 0.1875

    prop(mtcars, mpg > 25, disp > 200, gear == 4)
    #> # A tibble: 3 × 2
    #>   condition   prop
    #>       <chr>  <dbl>
    #> 1    mpg>25 0.1875
    #> 2  disp>200 0.5000
    #> 3   gear==4 0.3750
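As a cross-check, the proportions above can be reproduced in base R, because a proportion is simply the mean of a logical vector. The following is a minimal sketch of that idea and not how prop() is implemented internally; the helper prop_base() is a hypothetical name used only for illustration.

```r
# Cross-check in base R: the mean of a logical vector is the share of TRUEs.
data(mtcars)

# single comparison
p_mpg <- mean(mtcars$mpg > 25)
print(p_mpg)

# Several comparisons at once. prop_base() is a hypothetical helper for
# illustration only; it is not part of sjstats.
prop_base <- function(data, ...) {
  # capture the comparisons unevaluated, e.g. quote(mpg > 25)
  conditions <- eval(substitute(alist(...)))
  data.frame(
    condition = vapply(
      conditions,
      function(x) paste(deparse(x), collapse = ""),
      character(1)
    ),
    # evaluate each comparison inside the data frame and average the result
    prop = vapply(conditions, function(x) mean(eval(x, data)), numeric(1)),
    stringsAsFactors = FALSE
  )
}

result <- prop_base(mtcars, mpg > 25, disp > 200, gear == 4)
print(result)
```

Note that mean() returns NA as soon as a comparison yields a missing value. mtcars has no missing values, so the sketch does not deal with that case.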
The function also works on grouped data frames and with labelled data. In the following example, we group a dataset on family carers by their gender and education, and then get the proportions of observations where care-receivers are at least moderately dependent and where care-receivers are male. To get an impression of what the raw variables look like, we first compute simple frequency tables with frq().
    library(sjmisc) # for frq()-function
    library(dplyr)  # for %>%, select() and group_by()
    data(efc)

    frq(efc, e42dep)
    #> # elder's dependency
    #>
    #>  val                label frq raw.prc valid.prc cum.prc
    #>    1          independent  66    7.27      7.33    7.33
    #>    2   slightly dependent 225   24.78     24.97   32.30
    #>    3 moderately dependent 306   33.70     33.96   66.26
    #>    4   severely dependent 304   33.48     33.74  100.00
    #>    5                   NA   7    0.77        NA      NA

    frq(efc, e16sex)
    #> # elder's gender
    #>
    #>  val  label frq raw.prc valid.prc cum.prc
    #>    1   male 296   32.60     32.85   32.85
    #>    2 female 605   66.63     67.15  100.00
    #>    3     NA   7    0.77        NA      NA

    efc %>%
      select(e42dep, c161sex, c172code, e16sex) %>%
      group_by(c161sex, c172code) %>%
      prop(e42dep > 2, e16sex == 1)
    #> # A tibble: 6 × 4
    #>   `carer's gender`    `carer's level of education` `e42dep>2` `e16sex==1`
    #>              <chr>                           <chr>      <dbl>       <dbl>
    #> 1             Male          low level of education     0.6829      0.3659
    #> 2             Male intermediate level of education     0.6590      0.3155
    #> 3             Male         high level of education     0.7872      0.2766
    #> 4           Female          low level of education     0.7101      0.4638
    #> 5           Female intermediate level of education     0.5929      0.2832
    #> 6           Female         high level of education     0.6881      0.2752
So, within the group of male family carers with a low level of education, 68.29% of care-receivers are moderately or severely dependent, and 36.59% of care-receivers are male. Among female family carers with a high level of education, 68.81% of care-receivers are at least moderately dependent and 27.52% are male.
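The group-wise logic can be imitated in base R by splitting the data and taking the mean of each comparison within every group. Because efc ships with sjmisc, this sketch uses mtcars instead so that it runs without additional packages. The grouping variable (cyl) and the two comparisons are chosen purely for illustration, and the sketch makes no claim about how prop() handles value labels or missing values.

```r
# Group-wise proportions in base R: split by group, then average each comparison.
data(mtcars)

# one data frame per number of cylinders (4, 6, 8)
by_cyl <- split(mtcars, mtcars$cyl)

grouped <- data.frame(
  cyl = names(by_cyl),
  # share of cars with more than 20 miles per gallon in each group
  mpg_gt_20 = vapply(by_cyl, function(d) mean(d$mpg > 20), numeric(1)),
  # share of cars with manual transmission (am == 1) in each group
  am_eq_1 = vapply(by_cyl, function(d) mean(d$am == 1), numeric(1)),
  row.names = NULL
)

print(grouped)
```

Each row of the result corresponds to one group and each further column to one comparison, which mirrors the layout of the grouped prop() output shown earlier.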