F is for filter
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]
For the letter F: filters! Filters are incredibly useful, especially when combined with the standard pipe, `%>%`. I frequently use filters along with ggplot functions to chart a specific subgroup or to remove missing cases or outliers. As one example, I could use a filter to chart only the fiction books from my reading dataset.
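The other uses mentioned above, dropping missing cases and outliers before plotting, follow the same pattern. Here is a minimal sketch on a small hypothetical tibble (`books`, with made-up titles and page counts, not the reading dataset), using Tukey's 1.5 × IQR fences as one possible outlier rule; both the data and the rule are assumptions for illustration. Keep in mind that `filter()` only keeps rows where every condition is `TRUE`, so a row whose condition evaluates to `NA` is dropped anyway; the explicit `!is.na()` just makes that intent visible.

```r
library(dplyr)

# Hypothetical toy data: one missing page count (B) and one extreme value (D)
books <- tibble(
  Title   = c("A", "B", "C", "D", "E", "F"),
  Pages   = c(320, NA, 410, 2500, 288, 350),
  Fiction = c(1, 1, 0, 1, 1, 0)
)

# Tukey's fences: values more than 1.5 * IQR beyond the quartiles count as outliers.
# quantile() and IQR() both default to type 7, so the fences are consistent.
q1    <- quantile(books$Pages, 0.25, na.rm = TRUE)
q3    <- quantile(books$Pages, 0.75, na.rm = TRUE)
fence <- 1.5 * IQR(books$Pages, na.rm = TRUE)

# Comma-separated conditions are combined with AND
clean_books <- books %>%
  filter(
    !is.na(Pages),                          # drop missing cases
    between(Pages, q1 - fence, q3 + fence)  # drop outliers
  )

# Filters stack: the fiction subgroup of the cleaned data
clean_fiction <- clean_books %>%
  filter(Fiction == 1)
```

With this toy data the fences work out to 185 and 545 pages, so `clean_books` keeps A, C, E and F, and `clean_fiction` keeps A and E. Back to the reading dataset, here is the fiction-only histogram: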
```r
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

reads2019 <- read_csv("~/Downloads/Blogging A to Z/SarasReads2019_allrated.csv",
                      col_names = TRUE)
## Parsed with column specification:
## cols(
##   Title = col_character(),
##   Pages = col_double(),
##   date_started = col_character(),
##   date_read = col_character(),
##   Book.ID = col_double(),
##   Author = col_character(),
##   AdditionalAuthors = col_character(),
##   AverageRating = col_double(),
##   OriginalPublicationYear = col_double(),
##   read_time = col_double(),
##   MyRating = col_double(),
##   Gender = col_double(),
##   Fiction = col_double(),
##   Childrens = col_double(),
##   Fantasy = col_double(),
##   SciFi = col_double(),
##   Mystery = col_double(),
##   SelfHelp = col_double()
## )

# Keep only fiction, then plot the distribution of page counts
reads2019 %>%
  filter(Fiction == 1) %>%
  ggplot(aes(Pages)) +
  geom_histogram() +
  scale_y_continuous(breaks = seq(0, 16, 1)) +
  scale_x_continuous(breaks = seq(0, 1200, 100)) +
  ylab("Frequency") +
  theme_classic()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

I could also use filters to create a new dataset, perhaps one containing only the top-rated books I read during 2019.
```r
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract

# Keep only the books I rated 5 stars
top_books <- reads2019 %>%
  filter(MyRating == 5)

# The exposition pipe %$% exposes the columns, so Title can be used directly
top_books %$% list(Title)
## [[1]]
##  [1] "1Q84"
##  [2] "Alas, Babylon"
##  [3] "Elevation"
##  [4] "Guards! Guards! (Discworld, #8; City Watch #1)"
##  [5] "How Music Works"
##  [6] "Lords and Ladies (Discworld, #14; Witches #4)"
##  [7] "Moving Pictures (Discworld, #10; Industrial Revolution, #1)"
##  [8] "Redshirts"
##  [9] "Swarm Theory"
## [10] "The Android's Dream (The Android's Dream #1)"
## [11] "The Dutch House"
## [12] "The Emerald City of Oz (Oz #6)"
## [13] "The End of Mr. Y"
## [14] "The Human Division (Old Man's War, #5)"
## [15] "The Last Colony (Old Man's War, #3)"
## [16] "The Long Utopia (The Long Earth #4)"
## [17] "The Marvelous Land of Oz (Oz, #2)"
## [18] "The Miraculous Journey of Edward Tulane"
## [19] "The Night Circus"
## [20] "The Patchwork Girl of Oz (Oz, #7)"
## [21] "The Patron Saint of Liars"
## [22] "The Wonderful Wizard of Oz (Oz, #1)"
## [23] "The Year of the Flood (MaddAddam, #2)"
## [24] "Witches Abroad (Discworld, #12; Witches #3)"
## [25] "Wyrd Sisters (Discworld, #6; Witches #2)"
```

Or I could create a dataset of the 10 longest books I read:
```r
# Sort from longest to shortest, then keep rows 1 through 10
long_books <- reads2019 %>%
  arrange(desc(Pages)) %>%
  filter(between(row_number(), 1, 10)) %>%
  select(Title, Pages)

library(expss)
## 
## Use 'expss_output_viewer()' to display tables in the RStudio Viewer.
## To return to the console output, use 'expss_output_default()'.
## 
## Attaching package: 'expss'
## The following objects are masked from 'package:magrittr':
## 
##     and, equals, or
## The following objects are masked from 'package:stringr':
## 
##     fixed, regex
## The following objects are masked from 'package:dplyr':
## 
##     between, compute, contains, first, last, na_if, recode, vars
## The following objects are masked from 'package:purrr':
## 
##     keep, modify, modify_if, transpose
## The following objects are masked from 'package:tidyr':
## 
##     contains, nest
## The following object is masked from 'package:ggplot2':
## 
##     vars

as.etable(long_books, rownames_as_row_labels = FALSE)
```
Title | Pages |
---|---|
It | 1156 |
1Q84 | 925 |
Insomnia | 890 |
The Institute | 576 |
The Robber Bride | 528 |
Life of Pi | 460 |
Cell | 449 |
Cujo | 432 |
The Human Division (Old Man’s War, #5) | 431 |
The Year of the Flood (MaddAddam, #2) | 431 |
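The `row_number()` approach takes exactly the first 10 rows after sorting, so if several books shared the 10th-longest page count, only the ones that happened to sort first would make the list. A rank-based filter keeps every tied row instead. Here is a minimal sketch on a small hypothetical tibble (`shelf`, with made-up titles and page counts), cutting at the top 3 for brevity. It writes `dplyr::between()` with the package prefix because, as the startup messages above show, attaching expss masks dplyr's `between()`.

```r
library(dplyr)

# Hypothetical toy data: B and C tie for third place at 300 pages
shelf <- tibble(
  Title = c("A", "B", "C", "D", "E"),
  Pages = c(500, 300, 300, 800, 100)
)

# Position-based cut, as in the post: exactly 3 rows, so one of the
# tied books is dropped depending on row order
top3_by_position <- shelf %>%
  arrange(desc(Pages)) %>%
  filter(dplyr::between(row_number(), 1, 3))

# Rank-based cut: min_rank() gives tied values the same rank, so every
# book tied at the cutoff is kept (possibly more than 3 rows)
top3_with_ties <- shelf %>%
  filter(min_rank(desc(Pages)) <= 3) %>%
  arrange(desc(Pages))
```

With this toy data the position-based cut returns three rows and leaves out one of the two 300-page books, while the rank-based cut returns four. In dplyr 1.0.0 and later, `slice_max(Pages, n = 10)` gives the rank-based behavior directly, since its `with_ties` argument defaults to `TRUE`; the post used dplyr 0.8.3, which predates it.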