Site icon R-bloggers

XKCD-Gutenberg Passwords

[This article was first published on R on jmarriott.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
  • I have been inspired by this informative XKCD comic on password security, and I often follow its advice by using random-word-generating websites. But I have to wonder what dictionaries these sites use and how random the words are that they spit out. So I thought that it would be fun to make my own generator using books from https://www.gutenberg.org/ as my dictionaries, that way I would at least know where they are coming from. As long as I chose books somewhat randomly, I think that should be pretty secure.

    First I will load the required packages.

    library(tidyverse)
    library(gutenbergr)
    library(tidytext)

    Since Pride and Prejudice is currently the most downloaded book on gutenberg over the past 30 days and I happen to like it myself, I’ll use that as my example book. It is easy to download the full text with the gutenbergr package.

    book_text <- gutenberg_download(1342)
    
    data("stop_words") # load stop words
    
    book_text %>%
      unnest_tokens(word, text) %>% # turn the text into a single column of words
      mutate(word = str_extract(string = word, pattern = "[[:alpha:]]+")) %>% # remove any non-alphanumeric characters 
      select(word) %>% # get rid of the extra columns
      unique() %>% # get rid of duplicate words
      anti_join(stop_words, by = "word") %>% # get rid of boring "stop" words
      drop_na() %>% # drop anything that didn't make it through cleanly
      unlist() %>% # turn the column into a vector that sample() knows what to do with
      sample(4) # chose four words at random
    ##        word273       word4600        word632       word4122 
    ##       "choose"      "content" "circumstance"      "mingled"

    Well that was super easy, wasn’t it? The only thing that isn’t easy with this setup is looking up a book to use. But that can readily be solved with a shiny app. You can see the code below (at the time this post was knit—the current code will always be on github here), and the live app is at https://jameson-marriott.shinyapps.io/Password_App/.

    library(shiny)
    library(shinythemes)
    library(gutenbergr)
    library(dplyr)
    library(tidyr)
    library(tidytext)
    library(stringr)
    library(rclipboard)
    
    # get all the titles for the drop-down menu
    titles <- gutenberg_works(only_text = TRUE, distinct = TRUE) %>%
        select(title) %>%
        drop_na()
    
    # load the stop words so that we don't have to reload it later
    data("stop_words")
    
    ui <- fluidPage(theme = shinytheme("cerulean"),
                    
                    rclipboardSetup(), # what it sounds like
                    
        verticalLayout(
            fluidRow(
                column(width = 8, offset = 1,
                       titlePanel(title = "XKCD-Inspired, Gutenberg-Sourced Passwords"),
                       p("This web-app lets you generate passwords inspired by ",
                          a(href = "https://xkcd.com/936/", "this xkcd comic."),
                          br(),
                          "First select a book from ",
                          a(href = "https://www.gutenberg.org/", "Project Gutenberg"),
                          " and then chose the number of words you want to use from that book for your password.")
                       ),
            ),
            fluidRow(
                column(width = 6, offset = 1,
                       selectizeInput(inputId = "book_title", 
                                      label = "Book Title",
                                      choices = c("Chose one" = "", titles), # removes the default selection, but needs error handling for the down-stream items
                                      selected = NULL),
                                      #choices = titles,
                                      #selected = "Pride and Prejudice"),
                       p(textOutput("book_length")),
                       sliderInput("number_of_words",
                                   "Number of words to chose",
                                   min = 1,
                                   max = 10,
                                   value = 4))
            ),
    
            # Show the password
            fluidRow(
                column(width = 6, offset = 1,
                       tags$hr(),
                       textOutput("password", container = tags$strong)
                ),
            ),
            # Show the password without spaces
            fluidRow(
                column(width = 6, offset = 1,
                       uiOutput("password_no_spaces"))
            )
        )
    )
    
    server <- function(input, output) {
        
        # get the book
        gutenberg_book <- reactive({
            validate(
                need(input$book_title != "", "Please chose a book.")
            )
            gutenberg_works(title == input$book_title) %>% # get the gutenberg id
                gutenberg_download() %>% 
                unnest_tokens(word, text) %>% # turn the text into a single column of words
                mutate(word = str_extract(string = word, pattern = "[[:alpha:]]+")) %>% # remove any non-alphanumeric characters. 
                select(word) %>% # get rid of the extra columns
                unique() %>% # get rid of duplicate words
                anti_join(stop_words, by = "word") %>% # get rid of boring, "stop" words
                drop_na() %>% # drop anything that didn't make it through cleanly
                unlist()
        })
        
        # Report the number of unique words in the book
        output$book_length <- renderText({
            length <- gutenberg_book() %>%
                length() %>%
                format(big.mark = ",") # Add some nice formating
            
            paste0("There are ", length, " unique words in this book (including diffent forms of the same word).")
        })
         
        # Generate the actual password from the book
        password <- reactive({
            validate(
                need(input$book_title != "", "")
            )
            gutenberg_book() %>%
                sample(input$number_of_words) %>% # chose words at random
                paste0() # drop the names
        })
        
        # Output the password for the UI
        output$password <- renderText({
            password()
        })
        
        # Make the button to copy the password to the clipboard
        output$password_no_spaces <- renderUI({
            rclipButton("clip_button", paste0("Copy \"", str_flatten(password()), "\""), str_flatten(password()))
        })
    }
    
    # Run the application 
    shinyApp(ui = ui, server = server)

    To leave a comment for the author, please follow the link and comment on their blog: R on jmarriott.com.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.