Statistics Sunday: My 2019 Reading
This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.
I’ve spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I’d bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year.
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv",
                      col_names = TRUE)
## Parsed with column specification:
## cols(
##   Title = col_character(),
##   Pages = col_double(),
##   date_started = col_character(),
##   date_read = col_character(),
##   Book.ID = col_double(),
##   Author = col_character(),
##   AdditionalAuthors = col_character(),
##   AverageRating = col_double(),
##   OriginalPublicationYear = col_double(),
##   read_time = col_double(),
##   MyRating = col_double(),
##   Gender = col_double(),
##   Fiction = col_double(),
##   Childrens = col_double(),
##   Fantasy = col_double(),
##   SciFi = col_double(),
##   Mystery = col_double(),
##   SelfHelp = col_double()
## )
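The message above is readr reporting the column types it guessed. If you'd rather not rely on guessing, one option is to hand a specification back to `read_csv()` through `col_types`. Here's a minimal sketch on a tiny made-up CSV string; the three columns and their values are placeholders, not rows from the real file.

```r
library(readr)

# Made-up CSV text standing in for the real file; the values are placeholders
csv_text <- "Title,Pages,Fiction\nBook A,300,1\nBook B,250,0\n"

# Declaring the types up front means readr has nothing to guess
toy_reads <- read_csv(csv_text,
                      col_types = cols(Title = col_character(),
                                       Pages = col_double(),
                                       Fiction = col_double()))

print(toy_reads)
```

With the types declared, the "Parsed with column specification" message should no longer appear for this call.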
As you recall, I read 87 books last year, by 42 different authors.
reads2019 %>%
  summarise(Books = n(),
            Authors = n_distinct(Author))
## # A tibble: 1 x 2
##   Books Authors
##   <int>   <int>
## 1    87      42
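To see what `n()` and `n_distinct()` are each counting, here's the same call on a small toy table. The titles and author names are invented for illustration and have nothing to do with the real dataset.

```r
library(dplyr)

# Invented toy data: four books, two of them by the same author
toy_reads <- tibble(
  Title  = c("Book A", "Book B", "Book C", "Book D"),
  Author = c("Author X", "Author X", "Author Y", "Author Z")
)

totals <- toy_reads %>%
  summarise(Books = n(),                   # one row per book, so n() counts books
            Authors = n_distinct(Author))  # counts unique author names

print(totals)
```

Because the data has one row per book, `n()` gives the number of books, while `n_distinct(Author)` collapses repeated authors into a single count.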
Using summarise, we can get some basic information about each author.
authors <- reads2019 %>%
  group_by(Author) %>%
  summarise(Books = n(),
            Pages = sum(Pages),
            AvgRating = mean(MyRating),
            Oldest = min(OriginalPublicationYear),
            Newest = max(OriginalPublicationYear),
            AvgRT = mean(read_time),
            Gender = first(Gender),
            Fiction = sum(Fiction),
            Childrens = sum(Childrens),
            Fantasy = sum(Fantasy),
            Sci = sum(SciFi),
            Mystery = sum(Mystery))
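A scaled-down version of that grouped summary may make the pattern easier to follow. The toy data below is invented, and it assumes (as the summary above seems to) that `Gender` is a code that stays constant within an author and that `Fiction` is a 0/1 flag, so `first()` picks up the author's code and `sum()` counts flagged books.

```r
library(dplyr)

# Invented toy data; the Gender and Fiction codings are assumptions
toy_reads <- tibble(
  Author   = c("Author X", "Author X", "Author Y"),
  Pages    = c(300, 250, 400),
  MyRating = c(4, 5, 3),
  Gender   = c(1, 1, 0),  # assumed constant within an author
  Fiction  = c(1, 0, 1)   # assumed 1 = fiction, 0 = not
)

toy_authors <- toy_reads %>%
  group_by(Author) %>%
  summarise(Books = n(),                 # rows per author
            Pages = sum(Pages),          # total pages per author
            AvgRating = mean(MyRating),  # mean of my ratings
            Gender = first(Gender),      # same value on every row, so take the first
            Fiction = sum(Fiction))      # number of books flagged as fiction

print(toy_authors)
```

Each author collapses to one row, which is what makes the result convenient for the plots that follow.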
Let's plot number of books by each author, with the bars arranged by number of books.
authors %>%
  ggplot(aes(reorder(Author, desc(Books)), Books)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Author")
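The bar order comes from the factor levels that `reorder()` produces, since a discrete axis is drawn in level order. Here's a small sketch of that step on its own, using invented author names and counts, so you can inspect the levels without drawing anything.

```r
library(dplyr)

# Invented toy summary table: three authors with different book counts
toy_authors <- tibble(Author = c("Author X", "Author Y", "Author Z"),
                      Books  = c(1, 3, 2))

# reorder() sorts levels by increasing score; desc() negates the counts,
# so the author with the most books ends up as the first level
ordered_author <- with(toy_authors, reorder(Author, desc(Books)))

print(levels(ordered_author))
```

Dropping `desc()` would reverse this, putting the author with the fewest books first.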
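# Same plot, limited to authors with more than one book,
# with the axes flipped so the author names read horizontally
authors %>%
  mutate(Author = fct_reorder(Author, desc(Author))) %>%
  filter(Books > 1) %>%
  ggplot(aes(reorder(Author, Books), Books)) +
  geom_col() +
  coord_flip() +
  xlab("Author")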
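One way to read the `fct_reorder(Author, desc(Author))` step is as a tie-breaker: it puts the levels in reverse alphabetical order first, and `reorder(Author, Books)` then appears to keep that order among authors with the same count. This is my interpretation of the code rather than something stated in the post, and the toy names and counts below are invented.

```r
library(dplyr)
library(forcats)

# Invented toy data: two authors tied on two books each
toy_authors <- tibble(Author = c("Author A", "Author B", "Author C"),
                      Books  = c(3, 2, 2))

# Step 1: put the factor levels in reverse alphabetical order
step1 <- toy_authors %>%
  mutate(Author = fct_reorder(Author, desc(Author)))
print(levels(step1$Author))

# Step 2: reorder by book count; tied authors keep their step 1 order
step2 <- with(step1, reorder(Author, Books))
print(levels(step2))
```

With `coord_flip()`, the first level is drawn at the bottom, so the author with the most books lands at the top and tied authors run alphabetically from top to bottom.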