
Easy R: Summary statistics grouping by a categorical variable

[This article was first published on R code – data technik, and kindly contributed to R-bloggers.]

Finding this R package was a game changer: it makes base R's summary() function far more useful for grouped data.

When paired with base R's split(), purrr's map() gives the full set of summary statistics for each variable, grouped by a categorical variable. The result is a named list, so you can also save it with an assignment.

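library(purrr)     # provides map()
library(magrittr)  # provides the %>% pipe
# one data frame per level of Date, then summary() on each piece
credit %>% split(credit$Date) %>% map(summary)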
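The credit data frame isn't included in the post, so here is a minimal self-contained sketch of the same pipeline. The data frame credit_toy and its values are hypothetical. They are made up only to mirror a few of the columns in the output.

```r
library(purrr)     # provides map()
library(magrittr)  # provides the %>% pipe

# Hypothetical toy data standing in for the article's credit table
credit_toy <- data.frame(
  Date         = factor(c("Aug", "Aug", "July", "July", "July"),
                        levels = c("Aug", "July")),
  Credit.Score = c(485, 591, 620, 701, 802),
  Approval     = c(0, 1, 1, 1, 0)
)

# Split into one data frame per Date level, then run summary() on each piece.
# Assigning the result keeps it as a named list for later use.
by_month <- credit_toy %>% split(credit_toy$Date) %>% map(summary)

by_month$July  # prints the summary table for the July rows only
```

Because the list is named after the factor levels, you can print each group's summary on its own, for example with by_month$Aug.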

Simply pass the categorical column (here credit$Date) to split(), which breaks the data frame into one piece per category. Then use map() to run summary() on each piece. And that's it! The results look like this:

$Aug
   Homeowner       Credit.Score   Years.of.Credit.History
 Min.   :0.0000   Min.   :485.0   Min.   : 2.00          
 1st Qu.:0.0000   1st Qu.:545.5   1st Qu.: 5.50          
 Median :0.0000   Median :591.0   Median : 9.00          
 Mean   :0.3704   Mean   :601.6   Mean   :10.33          
 3rd Qu.:1.0000   3rd Qu.:630.0   3rd Qu.:14.50          
 Max.   :1.0000   Max.   :811.0   Max.   :22.00          
                                                         
 Revolving.Balance Revolving.Utilization    Approval        Loan.Amount
 $2,000  : 2       100%   : 3            Min.   :0.0000   $11,855 : 1  
 $27,000 : 2       65%    : 2            1st Qu.:0.0000   $12,150 : 1  
 $29,100 : 2       70%    : 2            Median :0.0000   $13,054 : 1  
 $1,000  : 1       78%    : 2            Mean   :0.1481   $15,451 : 1  
 $10,500 : 1       79%    : 2            3rd Qu.:0.0000   $16,218 : 1  
 $12,050 : 1       85%    : 2            Max.   :1.0000   $17,189 : 1  
 (Other) :18       (Other):14                             (Other) :21  
   Date    Default
 Aug :27   0:14   
 July: 0   1:13   
                  
                                         
$July
   Homeowner       Credit.Score   Years.of.Credit.History
 Min.   :0.0000   Min.   :620.0   Min.   : 2.0           
 1st Qu.:0.5000   1st Qu.:682.5   1st Qu.: 8.0           
 Median :1.0000   Median :701.0   Median :12.0           
 Mean   :0.7391   Mean   :711.8   Mean   :12.3           
 3rd Qu.:1.0000   3rd Qu.:746.5   3rd Qu.:16.5           
 Max.   :1.0000   Max.   :802.0   Max.   :24.0           
                                                         
 Revolving.Balance Revolving.Utilization    Approval        Loan.Amount
 $11,200 : 2       11%    : 2            Min.   :0.0000   $3,614  : 2  
 $11,700 : 2       15%    : 2            1st Qu.:1.0000   $12,303 : 1  
 $6,100  : 2       20%    : 2            Median :1.0000   $12,338 : 1  
 $10,000 : 1       5%     : 2            Mean   :0.8261   $12,712 : 1  
 $10,500 : 1       7%     : 2            3rd Qu.:1.0000   $13,020 : 1  
 $11,320 : 1       70%    : 2            Max.   :1.0000   $17,697 : 1  
 (Other) :14       (Other):11                             (Other) :16  
   Date    Default
 Aug : 0   0:10   
 July:23   1:13   
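The grouped result is an ordinary named list, so you can pull out any group by name. Here, purrr's map() behaves like base R's lapply(), which does the same job without extra packages. The sketch below uses a hypothetical toy data frame (credit_toy, with made-up values) to compare the two.

```r
library(purrr)  # provides map()

# Hypothetical toy data standing in for the article's credit table
credit_toy <- data.frame(
  Date      = factor(c("Aug", "Aug", "July", "July"), levels = c("Aug", "July")),
  Homeowner = c(0, 1, 1, 1)
)

# split() returns a named list: one data frame per Date level
groups <- split(credit_toy, credit_toy$Date)

with_purrr <- map(groups, summary)     # purrr version
with_base  <- lapply(groups, summary)  # base R version, no extra packages

with_base[["July"]]  # summary for the July rows only
```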

You'll have to do some formatting, or export the results to Excel. Still, it's fast and easy.
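In the output above, Revolving.Balance, Revolving.Utilization and Loan.Amount show counts of each value rather than Min./Mean/Max. This happens because they are stored as text containing $ signs, commas and % signs. One way to fix this is to strip those characters and convert the columns to numbers before splitting. The sketch below assumes a hypothetical toy data frame (credit_toy) and a hypothetical helper (to_number()).

```r
library(purrr)  # provides map()

# Hypothetical toy data: money and percentages stored as text,
# like the Revolving.Balance and Revolving.Utilization columns above
credit_toy <- data.frame(
  Date                  = factor(c("Aug", "Aug", "July"), levels = c("Aug", "July")),
  Revolving.Balance     = c("$2,000", "$27,000", "$11,200"),
  Revolving.Utilization = c("100%", "65%", "11%")
)

# Hypothetical helper: drop "$", "," and "%" and convert to numeric.
# gsub() also accepts factors, coercing them to character first.
to_number <- function(x) as.numeric(gsub("[$,%]", "", x))

credit_toy$Revolving.Balance     <- to_number(credit_toy$Revolving.Balance)
credit_toy$Revolving.Utilization <- to_number(credit_toy$Revolving.Utilization)

# summary() now reports Min./Median/Mean/Max. for these columns per month
by_month <- map(split(credit_toy, credit_toy$Date), summary)
```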
