Get basic summary statistics for all the variables in a data frame
I have added a new function to my {brotools} package, called describe(), which takes a data frame as an argument and returns another data frame with descriptive statistics. It is very much inspired by the {skimr} package and by assist::describe(), but I wanted to write my own for two reasons: first, as an exercise, and second, because I really only needed the function skim_to_wide() from {skimr}. So instead of installing a whole package for a single function, I decided to write my own (since I use {brotools} daily).
Below you can see it in action:
library(dplyr)
data(starwars)
brotools::describe(starwars)
## # A tibble: 13 x 12
##    variable   type     mean    sd mode        min   max   q25 median   q75
##    <chr>      <chr>   <dbl> <dbl> <chr>     <dbl> <dbl> <dbl>  <dbl> <dbl>
##  1 birth_year Numeric  87.6  155. <NA>         8.  896.  35.0    52.  72.0
##  2 height     Numeric  174.  34.8 <NA>        66.  264.  167.   180.  191.
##  3 mass       Numeric  97.3  169. <NA>        15. 1358.  55.6    79.  84.5
##  4 eye_color  Charac…    NA    NA blue         NA    NA    NA     NA    NA
##  5 gender     Charac…    NA    NA male         NA    NA    NA     NA    NA
##  6 hair_color Charac…    NA    NA blond        NA    NA    NA     NA    NA
##  7 homeworld  Charac…    NA    NA Tatooine     NA    NA    NA     NA    NA
##  8 name       Charac…    NA    NA Luke Sky…    NA    NA    NA     NA    NA
##  9 skin_color Charac…    NA    NA fair         NA    NA    NA     NA    NA
## 10 species    Charac…    NA    NA Human        NA    NA    NA     NA    NA
## 11 films      List       NA    NA <NA>         NA    NA    NA     NA    NA
## 12 starships  List       NA    NA <NA>         NA    NA    NA     NA    NA
## 13 vehicles   List       NA    NA <NA>         NA    NA    NA     NA    NA
## # ... with 2 more variables: n_missing <int>, n_unique <int>
As you can see, the object that is returned by describe() is a tibble.
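To give an idea of what goes into a table like this, here is a minimal base R sketch that computes the same kind of columns for numeric and character variables. It is not the {brotools} implementation: the helpers first_mode(), describe_column() and describe_sketch() are hypothetical names made up for this illustration, the result is a plain data frame rather than a tibble, and details such as tie-breaking for the mode or whether NA counts towards n_unique are assumptions on my part.

```r
# Hypothetical helpers for illustration only; NOT the {brotools} code.

# Most frequent non-missing value; ties go to the value seen first (assumption).
first_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  ux <- unique(x)
  as.character(ux[which.max(tabulate(match(x, ux)))])
}

# One row of descriptive statistics for a single numeric or character column.
describe_column <- function(x, name) {
  is_num <- is.numeric(x)
  q <- if (is_num) {
    quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  } else {
    rep(NA_real_, 3)
  }
  data.frame(
    variable  = name,
    type      = if (is_num) "Numeric" else "Character",
    mean      = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
    sd        = if (is_num) sd(x, na.rm = TRUE) else NA_real_,
    mode      = if (is_num) NA_character_ else first_mode(x),
    min       = if (is_num) min(x, na.rm = TRUE) else NA_real_,
    max       = if (is_num) max(x, na.rm = TRUE) else NA_real_,
    q25       = q[1],
    median    = q[2],
    q75       = q[3],
    n_missing = sum(is.na(x)),
    n_unique  = length(unique(x)),  # NA counts as a value here (assumption)
    stringsAsFactors = FALSE
  )
}

# Describe every numeric or character column; other types are skipped.
describe_sketch <- function(df) {
  keep <- vapply(df, function(col) is.numeric(col) || is.character(col), logical(1))
  cols <- df[keep]
  rows <- lapply(names(cols), function(nm) describe_column(cols[[nm]], nm))
  do.call(rbind, rows)
}

# Small made-up data frame to try it on
toy <- data.frame(
  height    = c(172, 167, NA, 202),
  eye_color = c("blue", "yellow", "blue", NA),
  stringsAsFactors = FALSE
)
describe_sketch(toy)
```

Building one single-row data frame per column and binding them together keeps the sketch short; for wide data frames a vectorised approach would likely be faster.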
For now, this function does not handle dates, but it’s in the pipeline.
You can also describe only certain columns:
brotools::describe(starwars, height, mass, name)
## # A tibble: 3 x 12
##   variable type      mean    sd mode          min   max   q25 median   q75
##   <chr>    <chr>    <dbl> <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 height   Numeric   174.  34.8 <NA>          66.  264.  167.   180.  191.
## 2 mass     Numeric   97.3  169. <NA>          15. 1358.  55.6    79.  84.5
## 3 name     Charact…    NA    NA Luke Skywa…    NA    NA    NA     NA    NA
## # ... with 2 more variables: n_missing <int>, n_unique <int>
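The call above takes bare (unquoted) column names. As a rough illustration of how a function can accept names in that form, here is a base R sketch; pick_columns() is a hypothetical helper invented for this example, and {brotools} may well do this differently (for instance with dplyr::select()).

```r
# Hypothetical helper for illustration only; NOT the {brotools} code.
# Returns the columns named in `...` (given unquoted), or the whole
# data frame when no columns are supplied.
pick_columns <- function(df, ...) {
  # Capture the unevaluated arguments and turn each one into a string
  cols <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  if (length(cols) == 0) return(df)

  unknown <- setdiff(cols, names(df))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  df[cols]
}

names(pick_columns(iris, Sepal.Length, Species))
# "Sepal.Length" "Species"
```

The subset returned by such a helper could then be passed on to whatever function computes the statistics.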
If you want to try it out, you can install {brotools} from GitHub:
devtools::install_github("b-rodrigues/brotools")
If you found this blog post useful, you might want to follow me on twitter for blog post updates.