Descriptive summary: Proportions of values in a vector #rstats

[This article was first published on R – Strenge Jacke!, and kindly contributed to R-bloggers.]

When describing a sample, researchers in my field often report the proportions of specific characteristics, for instance the proportion of women, or the proportion of persons with higher or lower income. Since I often want to know these characteristics when exploring data, I wrote a function, prop(), which is part of my sjstats package, a package dedicated to summary functions, mostly for fit or association measures of regression models or for descriptive statistics.

prop() follows the same design as most functions of my sjmisc package: first the data, then a user-defined number of logical comparisons that define the proportions. A single comparison returns a vector; multiple comparisons return a tibble, where the first column contains the comparison and the second the related proportion.

An example from the mtcars dataset:

library(sjstats)
data(mtcars)
# proportions of observations in mpg that are greater than 25
prop(mtcars, mpg > 25)
#> [1] 0.1875

prop(mtcars, mpg > 25, disp > 200, gear == 4)
#> # A tibble: 3 × 2
#>   condition   prop
#>       <chr>  <dbl>
#> 1    mpg>25 0.1875
#> 2  disp>200 0.5000
#> 3   gear==4 0.3750
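
For readers who want to see what such a proportion amounts to, here is a minimal base R sketch: the share of observations that meet a condition is the mean of the corresponding logical vector. The helper share() is a hypothetical name used for illustration only and is not part of sjstats; dropping missing values before averaging is an assumption of this sketch, not a statement about how prop() is implemented.

```r
# Base R sketch: a proportion is the mean of a logical vector.
data(mtcars)

# Hypothetical helper, not part of sjstats.
# Assumption: missing values are dropped before computing the share.
share <- function(condition) {
  mean(condition, na.rm = TRUE)
}

# One condition gives a single number
p_single <- share(mtcars$mpg > 25)
p_single
#> [1] 0.1875

# Several conditions, collected in a named numeric vector
p_multi <- c(
  "mpg>25"   = share(mtcars$mpg > 25),
  "disp>200" = share(mtcars$disp > 200),
  "gear==4"  = share(mtcars$gear == 4)
)
p_multi
```

The values in p_multi match the prop column of the tibble shown above.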

The function also works on grouped data frames and with labelled data. In the following example, we group a dataset on family carers by the carer's gender and education, and then get the proportion of observations where the care-receiver is at least moderately dependent, and the proportion where the care-receiver is male. To get an impression of what the raw variables look like, we first compute simple frequency tables with frq().

library(sjmisc) # for frq()-function
library(dplyr)  # for %>%, select() and group_by()
data(efc)
frq(efc, e42dep)
#> # elder's dependency
#> 
#>  val                label frq raw.prc valid.prc cum.prc
#>    1          independent  66    7.27      7.33    7.33
#>    2   slightly dependent 225   24.78     24.97   32.30
#>    3 moderately dependent 306   33.70     33.96   66.26
#>    4   severely dependent 304   33.48     33.74  100.00
#>    5                   NA   7    0.77        NA      NA

frq(efc, e16sex)
#> # elder's gender
#> 
#>  val  label frq raw.prc valid.prc cum.prc
#>    1   male 296   32.60     32.85   32.85
#>    2 female 605   66.63     67.15  100.00
#>    3     NA   7    0.77        NA      NA

efc %>%
  select(e42dep, c161sex, c172code, e16sex) %>%
  group_by(c161sex, c172code) %>%
  prop(e42dep > 2, e16sex == 1)

#> # A tibble: 6 × 4
#>   `carer's gender`    `carer's level of education` `e42dep>2` `e16sex==1`
#>              <chr>                           <chr>      <dbl>       <dbl>
#> 1             Male          low level of education     0.6829      0.3659
#> 2             Male intermediate level of education     0.6590      0.3155
#> 3             Male         high level of education     0.7872      0.2766
#> 4           Female          low level of education     0.7101      0.4638
#> 5           Female intermediate level of education     0.5929      0.2832
#> 6           Female         high level of education     0.6881      0.2752

So, within the group of male family carers with a low level of education, 68.29% of care-receivers are moderately or severely dependent, and 36.59% of care-receivers are male. Among female family carers with a high level of education, 68.81% of care-receivers are at least moderately dependent and 27.52% are male.
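
The same reading of grouped proportions can be reproduced with base R alone. The following is a rough sketch on mtcars rather than on the efc data, so that it runs without additional packages; the column names high_mpg and four_gears are made up for illustration, and this is not how prop() works internally.

```r
# Base R sketch: proportions of two conditions within each group.
data(mtcars)

# One logical column per condition (names are illustrative only)
flags <- data.frame(
  high_mpg   = mtcars$mpg > 25,
  four_gears = mtcars$gear == 4
)

# Mean of each logical column within each cylinder group
grouped <- aggregate(flags, by = list(cyl = mtcars$cyl), FUN = mean)
grouped
```

Each row of grouped is read like a row of the tibble above: within the cars that have a given number of cylinders, high_mpg is the share with mpg above 25 and four_gears is the share with four gears.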
