Statistics Sunday: What Should I Read Next?

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]
When You Need a New Book to Read

I log all of my books on Goodreads. On top of that, whenever I hear about a new book I have to read, I add it on Goodreads, so I remember it. Of course, this means my Goodreads bookshelves are a little out of control. Fortunately, I can use R to dig through my Goodreads to-read shelf and figure out the next book to read and/or buy.

If you’re on Goodreads, you can easily download your entire bookshelf, including your to-read books, by going to “My Books” then clicking “Import and Export”. On the right side of the screen will be a link for “Export Library”. Click that and give it a minute (or several). Soon, a link will appear to download your entire library in a CSV file. You can then bring that into R.

If I’m ever stuck for the next book to read, I can use this file to randomly select a book from my to-read list to check out next. Because I own a lot of the books on my to-read list, I’d like to filter that dataset to only include books I own. (Note: You can add books to your “owned” list by clicking on “My Books” then “Owned Books” to select which books you already have in your library. If you don’t keep an owned list, drop the “Owned Copies” condition from the “reading_list” filter below and keep running the sample function until you get a book you already own or have ready access to.)

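# Point R at the folder holding the Goodreads export
setwd("~/Dropbox")
library(tidyverse)

# Read the exported library; the first row holds the column names
books <- read_csv("goodreads_library_export.csv", col_names = TRUE)

# Keep only books I own (one copy) that are still on my to-read shelf
reading_list <- books %>%
  filter(`Owned Copies` == 1, `Exclusive Shelf` == "to-read")
head(reading_list)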

## # A tibble: 6 x 31
##   `Book Id` Title    Author  `Author l-f` `Additional Auth… ISBN    ISBN13
##       <int> <chr>    <chr>   <chr>        <chr>             <chr>    <dbl>
## 1  27877138 It       Stephe… King, Steph… <NA>              1501…  9.78e12
## 2     10611 The Eye… Stephe… King, Steph… <NA>              0751…  9.78e12
## 3     11570 Dreamca… Stephe… King, Steph… William Olivier … 2226…  9.78e12
## 4  36452674 The Squ… Kevin … Hearne, Kev… <NA>              <NA>  NA      
## 5  38193271 Bickeri… Mildre… Abbott, Mil… <NA>              <NA>  NA      
## 6  20873740 Sapiens… Yuval … Harari, Yuv… <NA>              <NA>  NA      
## # ... with 24 more variables: `My Rating` <int>, `Average Rating` <dbl>,
## #   Publisher <chr>, Binding <chr>, `Number of Pages` <int>, `Year
## #   Published` <int>, `Original Publication Year` <int>, `Date
## #   Read` <date>, `Date Added` <date>, Bookshelves <chr>, `Bookshelves
## #   with positions` <chr>, `Exclusive Shelf` <chr>, `My Review` <chr>,
## #   Spoiler <chr>, `Private Notes` <chr>, `Read Count` <int>, `Recommended
## #   For` <chr>, `Recommended By` <chr>, `Owned Copies` <int>, `Original
## #   Purchase Date` <chr>, `Original Purchase Location` <chr>,
## #   Condition <chr>, `Condition Description` <chr>, BCID <chr>
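
One thing to watch in the filter above: `Owned Copies` == 1 keeps only books with exactly one owned copy, so a book with two or more copies logged would be dropped. Here is a minimal sketch of a more forgiving filter. It assumes a small made-up tibble (books_toy) standing in for the real export; the column names match the export, but the rows and copy counts are illustrative only.

```r
library(dplyr)

# Toy stand-in for the Goodreads export; rows and counts are made up
books_toy <- tibble::tibble(
  Title = c("It", "Jonathan Strange & Mr Norrell", "Just After Sunset"),
  `Owned Copies` = c(2L, 1L, 0L),
  `Exclusive Shelf` = c("to-read", "to-read", "to-read")
)

# Keep anything with at least one owned copy, not exactly one
owned_unread <- books_toy %>%
  filter(`Owned Copies` >= 1, `Exclusive Shelf` == "to-read")

nrow(owned_unread)
```

If every owned book in your library has a single copy logged, the two filters give the same result.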

Now I have a data frame of books that I own and have not read. This data frame contains 55 books. Drawing a random sample of 1 book is quite easy.

reading_list[sample(1:nrow(reading_list), 1),]

## # A tibble: 1 x 31
##   `Book Id` Title    Author  `Author l-f`  `Additional Auth… ISBN   ISBN13
##       <int> <chr>    <chr>   <chr>         <chr>             <chr>   <dbl>
## 1     14201 Jonatha… Susann… Clarke, Susa… <NA>              0765… 9.78e12
## # ... with 24 more variables: `My Rating` <int>, `Average Rating` <dbl>,
## #   Publisher <chr>, Binding <chr>, `Number of Pages` <int>, `Year
## #   Published` <int>, `Original Publication Year` <int>, `Date
## #   Read` <date>, `Date Added` <date>, Bookshelves <chr>, `Bookshelves
## #   with positions` <chr>, `Exclusive Shelf` <chr>, `My Review` <chr>,
## #   Spoiler <chr>, `Private Notes` <chr>, `Read Count` <int>, `Recommended
## #   For` <chr>, `Recommended By` <chr>, `Owned Copies` <int>, `Original
## #   Purchase Date` <chr>, `Original Purchase Location` <chr>,
## #   Condition <chr>, `Condition Description` <chr>, BCID <chr>
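
The same draw can also be written with dplyr's slice_sample(), which avoids the row-index bookkeeping. This is a sketch rather than the code I ran above: it assumes dplyr 1.0.0 or later, and reading_list_toy is a made-up stand-in for the real reading list.

```r
library(dplyr)

# Toy stand-in for reading_list; the rows are illustrative only
reading_list_toy <- tibble::tibble(
  Title = c("It", "Jonathan Strange & Mr Norrell"),
  Author = c("Stephen King", "Susanna Clarke")
)

# Optional: fix the seed so the same draw can be reproduced later
set.seed(42)

# Draw one row at random
next_book <- reading_list_toy %>%
  slice_sample(n = 1)

next_book$Title
```

One practical difference: if the list is empty, slice_sample() returns zero rows, whereas 1:nrow() on an empty data frame produces the indices 1 and 0 and can hand back a row of NAs.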

According to this random sample, the next book I should read is Jonathan Strange & Mr Norrell. Now if I’m ever stuck for a book to read, I can use this code to find one. And if I’m in a bookstore, picking up something new – as is often the case, since bookstores are one of my happy places – I can update the code to tell me which book I should buy next.

to_buy <- books %>%
  filter(`Owned Copies` == 0, `Exclusive Shelf` == "to-read")
to_buy[sample(1:nrow(to_buy), 1),]

## # A tibble: 1 x 31
##   `Book Id` Title    Author   `Author l-f` `Additional Aut… ISBN    ISBN13
##       <int> <chr>    <chr>    <chr>        <chr>            <chr>    <dbl>
## 1   2906039 Just Af… Stephen… King, Steph… <NA>             1416…  9.78e12
## # ... with 24 more variables: `My Rating` <int>, `Average Rating` <dbl>,
## #   Publisher <chr>, Binding <chr>, `Number of Pages` <int>, `Year
## #   Published` <int>, `Original Publication Year` <int>, `Date
## #   Read` <date>, `Date Added` <date>, Bookshelves <chr>, `Bookshelves
## #   with positions` <chr>, `Exclusive Shelf` <chr>, `My Review` <chr>,
## #   Spoiler <chr>, `Private Notes` <chr>, `Read Count` <int>, `Recommended
## #   For` <chr>, `Recommended By` <chr>, `Owned Copies` <int>, `Original
## #   Purchase Date` <chr>, `Original Purchase Location` <chr>,
## #   Condition <chr>, `Condition Description` <chr>, BCID <chr>
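
Since the read-next and buy-next versions differ only in the ownership condition, the two could be folded into one small function. A rough sketch, where pick_book() is a hypothetical helper name of my own, slice_sample() assumes dplyr 1.0.0 or later, and books_toy is made-up data standing in for the export:

```r
library(dplyr)

# Hypothetical helper: draw one random to-read book,
# either from books already owned or from books not yet owned
pick_book <- function(books, owned = TRUE) {
  books %>%
    filter(`Exclusive Shelf` == "to-read",
           (`Owned Copies` > 0) == owned) %>%
    slice_sample(n = 1)
}

# Toy stand-in for the Goodreads export; rows and counts are made up
books_toy <- tibble::tibble(
  Title = c("It", "Jonathan Strange & Mr Norrell", "Just After Sunset"),
  `Owned Copies` = c(1L, 1L, 0L),
  `Exclusive Shelf` = c("to-read", "to-read", "to-read")
)

pick_book(books_toy, owned = TRUE)$Title   # something to read next
pick_book(books_toy, owned = FALSE)$Title  # something to buy next
```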

So next time I’m at a bookstore, which will be tomorrow (since I’ll be hanging out in Evanston for a class at my dance studio and plan to hit up the local Barnes & Noble), I should pick up a copy of Just After Sunset.

If you’re on Goodreads, feel free to add me!
