Statistics Sunday: My 2019 Reading

Posted on May 3, 2020 by Unknown in R bloggers

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].

I’ve spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I’d bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year.

library(tidyverse)

## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv",
                      col_names = TRUE)

## Parsed with column specification:
## cols(
##   Title = col_character(),
##   Pages = col_double(),
##   date_started = col_character(),
##   date_read = col_character(),
##   Book.ID = col_double(),
##   Author = col_character(),
##   AdditionalAuthors = col_character(),
##   AverageRating = col_double(),
##   OriginalPublicationYear = col_double(),
##   read_time = col_double(),
##   MyRating = col_double(),
##   Gender = col_double(),
##   Fiction = col_double(),
##   Childrens = col_double(),
##   Fantasy = col_double(),
##   SciFi = col_double(),
##   Mystery = col_double(),
##   SelfHelp = col_double()
## )

As you recall, I read 87 books last year, by 42 different authors.

reads2019 %>%
  summarise(Books = n(),
            Authors = n_distinct(Author))

## # A tibble: 1 x 2
##   Books Authors
##   <int>   <int>
## 1    87      42

Using summarise, we can get some basic information about each author.

authors <- reads2019 %>%
  group_by(Author) %>%
  summarise(Books = n(),
            Pages = sum(Pages),
            AvgRating = mean(MyRating),
            Oldest = min(OriginalPublicationYear),
            Newest = max(OriginalPublicationYear),
            AvgRT = mean(read_time),
            Gender = first(Gender),
            Fiction = sum(Fiction),
            Childrens = sum(Childrens),
            Fantasy = sum(Fantasy),
            Sci = sum(SciFi),
            Mystery = sum(Mystery))
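
One caveat with a grouped summary like this: mean(), min() and max() return NA for any author with a missing value in that column. The sketch below uses a made-up three-row tibble (not the real reading data) to show the effect and the na.rm = TRUE workaround. Whether the 2019 file actually contains missing ratings is an open assumption here.

```r
library(dplyr)

# Hypothetical toy data, not the real reading dataset
toy <- tibble(
  Author   = c("Author X", "Author X", "Author Y"),
  Pages    = c(100, 200, 300),
  MyRating = c(4, NA, 5)
)

toy_authors <- toy %>%
  group_by(Author) %>%
  summarise(Books = n(),
            Pages = sum(Pages),
            AvgRating = mean(MyRating),                    # NA if any rating is missing
            AvgRatingNoNA = mean(MyRating, na.rm = TRUE))  # skips missing ratings

toy_authors
```

In the toy result, Author X gets NA for AvgRating but 4 for AvgRatingNoNA.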

Let's plot the number of books by each author, with the bars arranged by number of books.

authors %>%
  ggplot(aes(reorder(Author, desc(Books)), Books)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Author")
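
The bar order comes from the factor levels that reorder() builds, not from the row order of the data. As a small check on made-up counts (hypothetical authors, chosen without ties), base reorder() with desc() and forcats' fct_reorder() with .desc = TRUE should give the same level order:

```r
library(dplyr)
library(forcats)

# Hypothetical per-author counts, not the real data
toy_authors <- tibble(
  Author = c("Author X", "Author Y", "Author Z"),
  Books  = c(2, 7, 4)
)

# Levels sorted by descending Books, two equivalent ways
by_reorder <- levels(reorder(toy_authors$Author, desc(toy_authors$Books)))
by_forcats <- levels(fct_reorder(toy_authors$Author, toy_authors$Books, .desc = TRUE))

by_reorder  # "Author Y" "Author Z" "Author X"
by_forcats  # same order
```

Either form can go inside aes(). Which one reads better is a matter of taste.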

I could simplify this chart quite a bit by showing only authors with 2 or more books in the set, and by flipping the axes so the author names can be read along the side.

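authors %>%
  # Reverse-alphabetical levels first, so authors tied on Books should
  # read alphabetically from the top once the axes are flipped
  mutate(Author = fct_reorder(Author, desc(Author))) %>%
  filter(Books > 1) %>%
  # Ascending order here: coord_flip() draws the first level at the bottom
  ggplot(aes(reorder(Author, Books), Books)) +
  geom_col() +
  coord_flip() +
  xlab("Author")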
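
Note that the flipped chart sorts in ascending order, where the first chart used desc(). A sketch on made-up counts (hypothetical authors) shows why: coord_flip() places the first factor level at the bottom, so ascending levels put the most-read author at the top.

```r
library(dplyr)
library(ggplot2)

# Hypothetical per-author counts, not the real data
toy_authors <- tibble(
  Author = c("Author X", "Author Y", "Author Z", "Author W"),
  Books  = c(1, 7, 4, 2)
)

# Keep only authors with more than one book
multi <- toy_authors %>%
  filter(Books > 1)

# Ascending levels: the last level ends up at the top after coord_flip()
flip_levels <- levels(reorder(multi$Author, multi$Books))
flip_levels  # "Author W" "Author Z" "Author Y"

p <- multi %>%
  ggplot(aes(reorder(Author, Books), Books)) +
  geom_col() +
  coord_flip() +
  xlab("Author")
```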

genre <- reads2019 %>%
  group_by(Fiction, Childrens, Fantasy, SciFi, Mystery) %>%
  summarise(Books = n(),
            AvgRating = mean(MyRating)) %>%
  bind_cols(Genre = c("Non-Fiction",
                      "General Fiction",
                      "Mystery",
                      "Science Fiction",
                      "Fantasy",
                      "Fantasy Sci-Fi",
                      "Children's Fiction",
                      "Children's Fantasy"))

genre %>%
  ggplot(aes(reorder(Genre, desc(AvgRating)), AvgRating)) +
  geom_col() +
  scale_x_discrete(labels=function(x){sub("\\s", "\n", x)}) +
  xlab("Genre") +
  ylab("Average Rating")
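
The bind_cols() step attaches the genre names by position, so it relies on the eight grouped rows coming out in exactly the order the labels were typed. A sketch of an alternative derives the label from the indicator columns themselves with case_when(). The helper label_genre() and the toy rows are hypothetical, and the mapping from flags to labels is inferred from the sort order of the grouped summary rather than stated in the post.

```r
library(dplyr)

# Hypothetical toy rows: one per flag combination implied by the labels above
toy <- tibble(
  Fiction   = c(0, 1, 1, 1, 1, 1, 1, 1),
  Childrens = c(0, 0, 0, 0, 0, 0, 1, 1),
  Fantasy   = c(0, 0, 0, 0, 1, 1, 0, 1),
  SciFi     = c(0, 0, 0, 1, 0, 1, 0, 0),
  Mystery   = c(0, 0, 1, 0, 0, 0, 0, 0)
)

# Hypothetical helper: the first matching condition wins,
# so the more specific combinations are listed first
label_genre <- function(df) {
  df %>%
    mutate(Genre = case_when(
      Fiction == 0                  ~ "Non-Fiction",
      Childrens == 1 & Fantasy == 1 ~ "Children's Fantasy",
      Childrens == 1                ~ "Children's Fiction",
      Fantasy == 1 & SciFi == 1     ~ "Fantasy Sci-Fi",
      Fantasy == 1                  ~ "Fantasy",
      SciFi == 1                    ~ "Science Fiction",
      Mystery == 1                  ~ "Mystery",
      TRUE                          ~ "General Fiction"
    ))
}

labelled <- label_genre(toy)
labelled$Genre
```

Applied to reads2019 before the group_by(), a column like this could be grouped on directly, which would remove the dependence on row order.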

lowratings <- reads2019 %>%
  filter(MyRating <= 3) %>%
  mutate(Rating = case_when(MyRating == 2 ~ "Hated",
                            MyRating == 3 ~ "Disliked")) %>%
  arrange(desc(MyRating), Author) %>%
  select(Title, Author, Rating)

library(expss)

##
## Attaching package: 'expss'

## The following objects are masked from 'package:stringr':
##
##     fixed, regex

## The following objects are masked from 'package:dplyr':
##
##     between, compute, contains, first, last, na_if, recode, vars

## The following objects are masked from 'package:purrr':
##
##     keep, modify, modify_if, transpose

## The following objects are masked from 'package:tidyr':
##
##     contains, nest

## The following object is masked from 'package:ggplot2':
##
##     vars

as.etable(lowratings, rownames_as_row_labels = FALSE)

Title	Author	Rating
The Scarecrow of Oz (Oz, #9)	Baum, L. Frank	Disliked
The Tin Woodman of Oz (Oz, #12)	Baum, L. Frank	Disliked
Herself Surprised	Cary, Joyce	Disliked
The 5 Love Languages: The Secret to Love That Lasts	Chapman, Gary	Disliked
Boundaries: When to Say Yes, How to Say No to Take Control of Your Life	Cloud, Henry	Disliked
Summerdale	Collins, David Jay	Disliked
When We Were Orphans	Ishiguro, Kazuo	Disliked
Bird Box (Bird Box, #1)	Malerman, Josh	Disliked
Oz in Perspective: Magic and Myth in the L. Frank Baum Books	Tuerk, Richard	Disliked
Cujo	King, Stephen	Hated
Just Evil (Evil Secrets Trilogy, #1)	McKeehan, Vickie	Hated

reads2019 <- reads2019 %>%
  mutate(MyRating = replace(MyRating, MyRating == 2, 1),
         MyRating = replace(MyRating, Title == "Herself Surprised", 2))

lowratings <- reads2019 %>%
  filter(MyRating <= 2) %>%
  mutate(Rating = case_when(MyRating == 1 ~ "Hated",
                            MyRating == 2 ~ "Disliked")) %>%
  arrange(desc(MyRating), Author) %>%
  select(Title, Author, Rating)

library(expss)

as.etable(lowratings, rownames_as_row_labels = FALSE)

Title	Author	Rating
Herself Surprised	Cary, Joyce	Disliked
Cujo	King, Stephen	Hated
Just Evil (Evil Secrets Trilogy, #1)	McKeehan, Vickie	Hated
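
The order of the two replace() calls in the recode matters: expressions inside one mutate() are evaluated in sequence, each seeing the result of the one before. A minimal sketch on made-up titles (hypothetical data) shows both orders:

```r
library(dplyr)

# Hypothetical toy ratings, not the real data
toy <- tibble(
  Title    = c("Book A", "Book B", "Book C"),
  MyRating = c(2, 3, 5)
)

# Same order as the recode above: move every 2 down to 1,
# then set one specific title to 2
recoded <- toy %>%
  mutate(MyRating = replace(MyRating, MyRating == 2, 1),
         MyRating = replace(MyRating, Title == "Book B", 2))

# Reversed order: the title set to 2 is then swept down to 1 as well
reversed <- toy %>%
  mutate(MyRating = replace(MyRating, Title == "Book B", 2),
         MyRating = replace(MyRating, MyRating == 2, 1))

recoded$MyRating   # 1 2 5
reversed$MyRating  # 1 1 5
```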

topbygenre <- reads2019 %>%
  left_join(genre, by = c("Fiction","Childrens","Fantasy","SciFi","Mystery")) %>%
  select(-Books, -AvgRating) %>%
  filter(MyRating == 5)

topbygenre %>%
  ggplot(aes(fct_infreq(Genre))) +
  geom_bar() +
  scale_x_discrete(labels=function(x){sub("\\s", "\n", x)}) +
  xlab("Genre") +
  ylab("Books")
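
The join above uses the genre summary as a lookup table, matching on every indicator column at once so that each book picks up its genre label. A reduced sketch with two flags and made-up books (all names hypothetical) shows the same pattern:

```r
library(dplyr)

# Hypothetical books with two genre flags, not the real data
books <- tibble(
  Title    = c("Book A", "Book B", "Book C"),
  Fiction  = c(1, 0, 1),
  Fantasy  = c(1, 0, 0),
  MyRating = c(5, 5, 3)
)

# Hypothetical lookup: one row per flag combination
lookup <- tibble(
  Fiction = c(0, 1, 1),
  Fantasy = c(0, 0, 1),
  Genre   = c("Non-Fiction", "General Fiction", "Fantasy")
)

# Match on both flags, then keep only the five-star books
top <- books %>%
  left_join(lookup, by = c("Fiction", "Fantasy")) %>%
  filter(MyRating == 5)

top$Genre  # "Fantasy" "Non-Fiction"
```

Because left_join() keeps every row of the left table, a book whose flag combination is missing from the lookup would get NA for Genre rather than being dropped.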

genre %>%
  ggplot(aes(reorder(Genre, desc(AvgRating)), AvgRating, label = Books)) +
  geom_col() +
  scale_x_discrete(labels=function(x){sub("\\s", "\n", x)}) +
  xlab("Genre") +
  ylab("Average Rating") +
  geom_text(aes(x = Genre, y = AvgRating-0.25), size = 5, color = "white")

genre %>%
  filter(Books > 2) %>%
  ggplot(aes(reorder(Genre, desc(AvgRating)), AvgRating, label = Books)) +
  geom_col() +
  scale_x_discrete(labels=function(x){sub("\\s", "\n", x)}) +
  xlab("Genre") +
  ylab("Average Rating") +
  geom_text(aes(x = Genre, y = AvgRating-0.25), size = 5, color = "white")
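
The labels function passed to scale_x_discrete() in these plots wraps each genre name onto two lines by replacing its first whitespace character with a newline. A base R sketch of that behaviour follows. The function name wrap_first_space() is hypothetical, and the three-word label is made up to contrast sub() with gsub().

```r
# Same expression as in the scale_x_discrete() calls, given a name
wrap_first_space <- function(x) sub("\\s", "\n", x)

labels <- c("Non-Fiction", "Fantasy Sci-Fi", "Children's Fantasy")
wrapped <- wrap_first_space(labels)
wrapped  # "Non-Fiction" "Fantasy\nSci-Fi" "Children's\nFantasy"

# sub() changes only the first match; gsub() would break at every space
first_only <- sub("\\s", "\n", "One Two Three")   # "One\nTwo Three"
every_one  <- gsub("\\s", "\n", "One Two Three")  # "One\nTwo\nThree"
```

Labels without a space, such as "Non-Fiction", pass through unchanged.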