O is for order_by
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
This will be a quick post on another tidyverse function, order_by, which comes from dplyr. I’ll admit, I don’t use this one as often as arrange. It can be useful, though, if you don’t want to change the row order of your dataset but want to use functions whose results depend on that order. One example is the cumulative sum function (cumsum).
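Here is a minimal sketch of the difference, using a small made-up tibble (the names toy, day, and pages are hypothetical and not part of my reading data). The rows are deliberately out of order so that a plain cumsum and an order_by-wrapped cumsum give different answers.

```r
library(dplyr)

# Hypothetical toy data, deliberately not sorted by day
toy <- tibble(
  day   = c(3, 1, 2),
  pages = c(30, 10, 20)
)

result <- toy %>%
  mutate(
    # Accumulates in the order the rows happen to appear
    running_naive = cumsum(pages),
    # Accumulates in order of day, then returns values in the original row order
    running_ordered = order_by(day, cumsum(pages))
  )

result
```

The rows of result stay in their original positions. Only the order in which cumsum walks through pages changes.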
    library(tidyverse)
    ## -- Attaching packages ------------------------------- tidyverse 1.3.0 --
    ## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
    ## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
    ## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
    ## ✓ readr   1.3.1     ✓ forcats 0.4.0
    ## -- Conflicts ---------------------------------- tidyverse_conflicts() --
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()

    reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allrated.csv",
                          col_names = TRUE)
    ## Parsed with column specification:
    ## cols(
    ##   Title = col_character(),
    ##   Pages = col_double(),
    ##   date_started = col_character(),
    ##   date_read = col_character(),
    ##   Book.ID = col_double(),
    ##   Author = col_character(),
    ##   AdditionalAuthors = col_character(),
    ##   AverageRating = col_double(),
    ##   OriginalPublicationYear = col_double(),
    ##   read_time = col_double(),
    ##   MyRating = col_double(),
    ##   Gender = col_double(),
    ##   Fiction = col_double(),
    ##   Childrens = col_double(),
    ##   Fantasy = col_double(),
    ##   SciFi = col_double(),
    ##   Mystery = col_double(),
    ##   SelfHelp = col_double()
    ## )

In my reading dataset, I could compute a cumulative sum of my days reading, then plot that against day of the year (days numbered from 1 to 365), to see where I might have had gaps in my reading habits.
    library(lubridate)
    ##
    ## Attaching package: 'lubridate'
    ## The following object is masked from 'package:base':
    ##
    ##     date

    reads2019 <- reads2019 %>%
      mutate(date_started = as.Date(date_started, format = '%m/%d/%Y'),
             date_read = as.Date(date_read, format = '%m/%d/%Y'),
             days_reading = order_by(date_read, cumsum(read_time)),
             DOY = yday(date_read))

A quick look at my data shows one issue, though: my first book read in 2019 was actually started in 2018, so its reading time is greater than the day of the year on which I finished it. I should probably correct that, then recompute my days_reading variable.
    reads2019 <- reads2019 %>%
      mutate(read_time = ifelse(read_time > DOY, DOY, read_time),
             days_reading = order_by(date_read, cumsum(read_time)))

Now I can plot days reading against day of the year.
    reads2019 %>%
      ggplot(aes(DOY, days_reading)) +
      geom_point() +
      scale_x_continuous(breaks = seq(0, 380, 20)) +
      scale_y_continuous(breaks = seq(0, 360, 20)) +
      xlab("Day of Year") +
      ylab("Days Reading")
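Another situation where row order matters is the fill function from tidyr, which fills missing values in a column from the previous or next non-missing entry. If you were doing a study where people receive repeated measures and miss one of the measurements, or if your dataset only notes a new value when the previous one has changed, you could use fill to copy previous values into the empty cells. One caveat: fill works on a whole data frame rather than a single vector, so it likely needs the data sorted with arrange first rather than being wrapped in order_by directly.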
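A minimal sketch of the fill scenario, again with made-up data (visits, visit, and dose are hypothetical names). Because fill takes a data frame, this sketch sorts with arrange before filling. That is an assumption about how you would combine ordering and filling in practice, and unlike order_by it does change the row order of the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical repeated-measures data: dose is recorded only when it changes,
# and the rows are not in visit order
visits <- tibble(
  visit = c(3, 1, 2, 4),
  dose  = c(NA, 5, NA, 10)
)

filled <- visits %>%
  arrange(visit) %>%                 # put visits in chronological order
  fill(dose, .direction = "down")    # carry the last recorded dose forward

filled
```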
Tomorrow, we'll talk more about the scales package, to enhance your ggplots!