Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Tired of trying to get your data to print right, or of formatting it in a program like Excel? Try out fashion() from the corrr package:
d <- data.frame(
  gender = factor(c("Male", "Female", NA)),
  age    = c(NA, 28.1111111, 74.3),
  height = c(188, NA, 168.78906),
  fte    = c(NA, .78273, .9)
)

d
#>   gender      age   height     fte
#> 1   Male       NA 188.0000      NA
#> 2 Female 28.11111       NA 0.78273
#> 3   <NA> 74.30000 168.7891 0.90000

library(corrr)
fashion(d)
#>   gender   age height fte
#> 1   Male       188.00
#> 2 Female 28.11        .78
#> 3        74.30 168.79 .90
But how does it work and what does it do?
The inspiration: correlations and decimals
The inspiration for fashion() came from my unending frustration at getting a correlation matrix to print out exactly how I wanted. For example, printing correlations typically looks something like:
mtcars %>% correlate()
#> # A tibble: 11 x 12
#>    rowname        mpg        cyl       disp         hp        drat
#>      <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#>  1     mpg         NA -0.8521620 -0.8475514 -0.7761684  0.68117191
#>  2     cyl -0.8521620         NA  0.9020329  0.8324475 -0.69993811
#>  3    disp -0.8475514  0.9020329         NA  0.7909486 -0.71021393
#>  4      hp -0.7761684  0.8324475  0.7909486         NA -0.44875912
#>  5    drat  0.6811719 -0.6999381 -0.7102139 -0.4487591          NA
#>  6      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#>  7    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#>  8      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#>  9      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 10    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 11    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
#> # ... with 6 more variables: wt <dbl>, qsec <dbl>, vs <dbl>, am <dbl>,
#> #   gear <dbl>, carb <dbl>
But this is just plain ugly. Personally, I wanted:
- Decimal places rounded to the same length (usually 2).
- All the leading zeros removed, but keeping the decimal aligned with/without - for negative numbers.
- Missing values (NA) to appear empty ("").
This is exactly what fashion() does:
mtcars %>% correlate() %>% fashion()
#>    rowname  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#>  1     mpg      -.85 -.85 -.78  .68 -.87  .42  .66  .60  .48 -.55
#>  2     cyl -.85       .90  .83 -.70  .78 -.59 -.81 -.52 -.49  .53
#>  3    disp -.85  .90       .79 -.71  .89 -.43 -.71 -.59 -.56  .39
#>  4      hp -.78  .83  .79      -.45  .66 -.71 -.72 -.24 -.13  .75
#>  5    drat  .68 -.70 -.71 -.45      -.71  .09  .44  .71  .70 -.09
#>  6      wt -.87  .78  .89  .66 -.71      -.17 -.55 -.69 -.58  .43
#>  7    qsec  .42 -.59 -.43 -.71  .09 -.17       .74 -.23 -.21 -.66
#>  8      vs  .66 -.81 -.71 -.72  .44 -.55  .74       .17  .21 -.57
#>  9      am  .60 -.52 -.59 -.24  .71 -.69 -.23  .17       .79  .06
#> 10    gear  .48 -.49 -.56 -.13  .70 -.58 -.21  .21  .79       .27
#> 11    carb -.55  .53  .39  .75 -.09  .43 -.66 -.57  .06  .27
And if I want to change the number of decimal places (decimals) and use a different placeholder for NA values (na_print):
mtcars %>% correlate() %>% fashion(decimals = 1, na_print = "x")
#>    rowname  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#>  1     mpg    x  -.9  -.8  -.8   .7  -.9   .4   .7   .6   .5  -.6
#>  2     cyl  -.9    x   .9   .8  -.7   .8  -.6  -.8  -.5  -.5   .5
#>  3    disp  -.8   .9    x   .8  -.7   .9  -.4  -.7  -.6  -.6   .4
#>  4      hp  -.8   .8   .8    x  -.4   .7  -.7  -.7  -.2  -.1   .7
#>  5    drat   .7  -.7  -.7  -.4    x  -.7   .1   .4   .7   .7  -.1
#>  6      wt  -.9   .8   .9   .7  -.7    x  -.2  -.6  -.7  -.6   .4
#>  7    qsec   .4  -.6  -.4  -.7   .1  -.2    x   .7  -.2  -.2  -.7
#>  8      vs   .7  -.8  -.7  -.7   .4  -.6   .7    x   .2   .2  -.6
#>  9      am   .6  -.5  -.6  -.2   .7  -.7  -.2   .2    x   .8   .1
#> 10    gear   .5  -.5  -.6  -.1   .7  -.6  -.2   .2   .8    x   .3
#> 11    carb  -.6   .5   .4   .7  -.1   .4  -.7  -.6   .1   .3    x
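To get a feel for what these formatting rules involve (fixed decimals, no leading zeros, a placeholder for missing values), here is a minimal base R sketch for a single numeric vector. The helper name format_sketch() is hypothetical, and this is not how corrr implements fashion(); it only shows one way such output could be produced. The argument names decimals and na_print are borrowed from fashion() for readability.

```r
# Hypothetical helper for illustration only; not part of corrr.
format_sketch <- function(x, decimals = 2, na_print = "") {
  # Fixed number of decimal places, e.g. 0.9 -> "0.90"
  out <- formatC(x, format = "f", digits = decimals)
  # Drop the leading zero but keep any minus sign: "-0.85" -> "-.85"
  out <- sub("^(-?)0\\.", "\\1.", out)
  # Swap missing values for the placeholder
  out[is.na(x)] <- na_print
  # Pad on the left to a common width so the decimals line up
  out <- formatC(out, width = max(nchar(out)))
  # noquote() makes the text print without quotation marks
  noquote(out)
}

format_sketch(c(-0.852162, NA, 0.9, 12.3456))
format_sketch(c(-0.852162, NA, 0.9), decimals = 1, na_print = "x")
```

The result is text, so the same caveats about working with fashioned output apply to this sketch as well.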
Look but don’t touch
There’s a little bit of magic going on here, but the point to know is that fashion() is returning a noquote version of the original structure:
mtcars %>% correlate() %>% fashion() %>% class()
#> [1] "data.frame" "noquote"
That means that numbers are no longer numbers.
mtcars %>% correlate() %>% sapply(is.numeric)
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs
#>   FALSE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE
#>      am    gear    carb
#>    TRUE    TRUE    TRUE

mtcars %>% correlate() %>% fashion() %>% sapply(is.numeric)
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs
#>   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
#>      am    gear    carb
#>   FALSE   FALSE   FALSE
Similarly, missing values are no longer missing values.
mtcars %>% correlate() %>% sapply(function(i) sum(is.na(i)))
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs
#>       0       1       1       1       1       1       1       1       1
#>      am    gear    carb
#>       1       1       1

mtcars %>% correlate() %>% fashion() %>% sapply(function(i) sum(is.na(i)))
#> rowname     mpg     cyl    disp      hp    drat      wt    qsec      vs
#>       0       0       0       0       0       0       0       0       0
#>      am    gear    carb
#>       0       0       0
So fashion() is for looking at output, not for continuing to work with it.
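A small base R sketch of why it pays to keep the original numbers around. The vector shown is built by hand as a stand-in for fashioned output (it is not produced by calling fashion()), and the values are taken from the correlation table earlier in the post.

```r
# The original numeric values, still usable for computation
original <- c(-0.8521620, 0.9020329, NA)

# Hand-built stand-in for fashioned output: text printed without quotes
shown <- noquote(c("-.85", ".90", ""))

is.numeric(shown)   # the values are text now
anyNA(shown)        # the missing value has become an empty string

# Converting back is lossy: "" turns into NA again, but the digits
# beyond two decimal places cannot be recovered
recovered <- as.numeric(as.character(shown))
max(abs(recovered - original), na.rm = TRUE)
```

In practice this suggests keeping the numeric object for analysis and calling fashion() only as the last step before printing or exporting.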
What to use it on
fashion() can be used on most standard R structures such as scalars, vectors, matrices, data frames, etc:
fashion(10.277)
#> [1] 10.28

fashion(c(10.3785, NA, 87))
#> [1] 10.38       87.00

fashion(matrix(1:4, nrow = 2))
#>     V1   V2
#> 1 1.00 3.00
#> 2 2.00 4.00
You can also use it on non-numeric data. In this case, all fashion() will do is convert the data to characters, and then alter missing values:
fashion("Hello")
#> [1] Hello

fashion(c("Hello", NA), na_print = "World")
#> [1] Hello World
Now is a good time to take a look back at the opening example to see that it works on a data frame and with a factor column.
Exporting
Don’t forget that it’s easy to export your fashioned output with something like:
my_data %>% fashion() %>% write.csv("fashioned_file.csv")
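One detail worth knowing when exporting: write.csv() adds a column of row names by default, which you may not want in a file meant for a report. The sketch below uses a small hand-built data frame of text as a stand-in for fashioned output, and a temporary file as a hypothetical destination.

```r
# Hand-built stand-in for fashioned output: every column is already text
fashioned <- data.frame(
  rowname = c("mpg", "cyl"),
  mpg     = c("", "-.85"),
  cyl     = c("-.85", ""),
  stringsAsFactors = FALSE
)

# Hypothetical destination; replace with your own file path
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the extra column of row numbers
write.csv(fashioned, out_file, row.names = FALSE)

# Inspect the raw lines that were written
readLines(out_file)
```

Because the values are written as quoted text, the formatting (such as the missing leading zeros) is preserved in the file itself.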
So what are you waiting for? Go forth and fashion()!
Sign off
Thanks for reading and I hope this was useful for you.
For updates of recent blog posts, follow @drsimonj on Twitter, or email me at drsimonjackson@gmail.com to get in touch.
If you’d like the code that produced this blog, check out the blogR GitHub repository.