Easy R: Summary statistics grouping by a categorical variable
[This article was first published on R code – data technik, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Finding this great R package, purrr, was a game changer: it makes it easy to run base R's summary() function on every group in a data set.
With it you get summary statistics for each variable, grouped by a categorical variable. Because the result is a named list with one element per group, you can also save it with an assignment.
library(purrr)     # map()
library(magrittr)  # %>% pipe
credit %>% split(credit$Date) %>% map(summary)
Simply pass the categorical column (here credit$Date) to split(), which breaks the data frame into a list with one data frame per group, then use map() to run summary() on each element of that list. And that's it! All set to produce results like these:
$Aug
Homeowner Credit.Score Years.of.Credit.History
Min. :0.0000 Min. :485.0 Min. : 2.00
1st Qu.:0.0000 1st Qu.:545.5 1st Qu.: 5.50
Median :0.0000 Median :591.0 Median : 9.00
Mean :0.3704 Mean :601.6 Mean :10.33
3rd Qu.:1.0000 3rd Qu.:630.0 3rd Qu.:14.50
Max. :1.0000 Max. :811.0 Max. :22.00
Revolving.Balance Revolving.Utilization Approval Loan.Amount
$2,000 : 2 100% : 3 Min. :0.0000 $11,855 : 1
$27,000 : 2 65% : 2 1st Qu.:0.0000 $12,150 : 1
$29,100 : 2 70% : 2 Median :0.0000 $13,054 : 1
$1,000 : 1 78% : 2 Mean :0.1481 $15,451 : 1
$10,500 : 1 79% : 2 3rd Qu.:0.0000 $16,218 : 1
$12,050 : 1 85% : 2 Max. :1.0000 $17,189 : 1
(Other) :18 (Other):14 (Other) :21
Date Default
Aug :27 0:14
July: 0 1:13
$July
Homeowner Credit.Score Years.of.Credit.History
Min. :0.0000 Min. :620.0 Min. : 2.0
1st Qu.:0.5000 1st Qu.:682.5 1st Qu.: 8.0
Median :1.0000 Median :701.0 Median :12.0
Mean :0.7391 Mean :711.8 Mean :12.3
3rd Qu.:1.0000 3rd Qu.:746.5 3rd Qu.:16.5
Max. :1.0000 Max. :802.0 Max. :24.0
Revolving.Balance Revolving.Utilization Approval Loan.Amount
$11,200 : 2 11% : 2 Min. :0.0000 $3,614 : 2
$11,700 : 2 15% : 2 1st Qu.:1.0000 $12,303 : 1
$6,100 : 2 20% : 2 Median :1.0000 $12,338 : 1
$10,000 : 1 5% : 2 Mean :0.8261 $12,712 : 1
$10,500 : 1 7% : 2 3rd Qu.:1.0000 $13,020 : 1
$11,320 : 1 70% : 2 Max. :1.0000 $17,697 : 1
(Other) :14 (Other):11 (Other) :16
Date Default
Aug : 0 0:10
July:23 1:13
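Two things stand out in the output above. Revolving.Balance, Revolving.Utilization and Loan.Amount are stored as text, so summary() only counts repeated values instead of reporting quartiles and means, and the Date column is repeated inside each group with a zero count for the other month. Below is a minimal sketch of cleaning this up before summarising. It assumes the money and percentage columns are plain strings such as "$2,000" and "65%". The credit_demo data frame, its four rows, and the to_number() helper are hypothetical and only illustrate the shape of the data, not the article's actual values.

```r
library(purrr)     # map()
library(magrittr)  # %>% pipe

# Hypothetical toy data shaped like the article's credit table:
# money and percentages are stored as text
credit_demo <- data.frame(
  Date = factor(c("Aug", "Aug", "July", "July")),
  Revolving.Balance = c("$2,000", "$27,000", "$1,000", "$10,500"),
  Revolving.Utilization = c("100%", "65%", "11%", "70%"),
  Credit.Score = c(500, 600, 700, 800)
)

# Hypothetical helper: strip "$", "," and "%" then convert to numeric
to_number <- function(x) as.numeric(gsub("[$,%]", "", x))

credit_clean <- credit_demo
credit_clean$Revolving.Balance <- to_number(credit_clean$Revolving.Balance)
# store utilization as a proportion, e.g. "65%" becomes 0.65
credit_clean$Revolving.Utilization <- to_number(credit_clean$Revolving.Utilization) / 100

by_month <- credit_clean %>%
  split(credit_clean$Date) %>%
  # leave the grouping column out of each per-group summary
  map(~ summary(.x[setdiff(names(.x), "Date")]))

by_month$Aug
```

With the columns numeric, summary() reports Min., quartiles, Mean and Max. for them instead of value counts.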
You'll have to do some formatting, or export to Excel! So fast and easy with this one.
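Because the result is a named list, you can keep it with an assignment, pull out one group by name, and write each group's table to a CSV file that Excel can open. Here is a minimal sketch using R's built-in iris data (Species as the categorical variable) in place of credit, which isn't available here. The names by_species and out_dir are illustrative, and the files go to tempdir(), so point out_dir at a real folder to keep them.

```r
library(purrr)     # map(), iwalk()
library(magrittr)  # %>% pipe

# Save the per-group summaries as a named list
by_species <- iris %>%
  split(iris$Species) %>%
  map(summary)

by_species$setosa  # one group's summary, looked up by name

out_dir <- tempdir()  # swap in a real folder to keep the files
# iwalk() calls the function with each list element and its name
iwalk(by_species, function(tab, group) {
  m <- unclass(tab)    # drop the "table" class, keep the character matrix
  rownames(m) <- NULL  # summary() row names are all blank
  write.csv(m, file.path(out_dir, paste0("summary_", group, ".csv")),
            row.names = FALSE, na = "")
})
```

CSV keeps the sketch to base R plus purrr; writing .xlsx files directly would need an extra package such as writexl.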
To leave a comment for the author, please follow the link and comment on their blog: R code – data technik.