Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Update 2018-02-17: The title of this article has changed reflecting new information I have received since publishing. For mor information, I refer to the last paragraph.
A treasure trove of leaked passwords
The API of pwnedpasswords.com is quite remarkable. It not only allows you to fetch the results generally obtained by typing in your e-mail into the browser interface and finding out whether or not you’ve been pwned from the comfort of your shell. It further allows you to very simply check whether a certain password has ever been used in any of the dumps they have, and if so, how often. Since haveibeenpwned.com has collected over 550 millions of these in a multitude of data breaches, odds are your password might be amongst these.
Using the API is straightforward, but the way it is secured so that even pwnedpasswords.com itself doesn’t know precisely which password or even password hash you are checking is ingenious, I highly recommend reading their tutorial. The following R snippet allows for obtaining the number of hits for a vector of passwords:
library(httr) library(digest) library(plyr) library(dplyr) library(stringr) popularity <- function(password){ passw_hash = digest(password, "sha1", serialize = FALSE) passw_front = toupper(substr(passw_hash, 1, 5)) passw_back = toupper(substr(passw_hash, 6, 200)) hashes = read.table(text = content(GET(paste0("https://api.pwnedpasswords.com/range/", passw_front)), encoding = "UTF-8"), sep = ":") %>% rename(hashes = V1, count = V2) %>% filter(hashes == passw_back) %>% mutate(password = password) %>% select(password, count) if(nrow(hashes) == 0){ hashes <- tibble("password" = password, "count" = 0) } return(hashes) }
City names as passwords
Using this function allows us to search through various ranges of passwords. For instance, let’s see how many people have chosen the names of cites in Baden-Württemberg as their passwords1:
City Name | Number of Usages as Password |
---|---|
Freiburg | 3077 |
Stuttgart | 9496 |
Karlsruhe | 1426 |
Heidelberg | 4081 |
Mannheim | 5040 |
Konstanz | 924 |
It seems like the city name “stuttgart” appears most often in the password list, yet that is not incredibly surprising, as it is also the largest city. A plot of the number of password hits in relation to the number of inhabitants2 looks like this:
Quite obviously, the ratio of “number of uses of city name as password” and “number of inhabitants”, i.e. “Use of City Name as Password per Inhabitant” differs from city to city. It seems like this ratio is higher for the cities Heidelberg and Freiburg, each of which is known for a high quality of living and a very picturesque old town. So, let’s look at some international (and especially British) competition:
City Name | Inhabitants | Number of Usages as Password | City Names as Password per 1000 Inhabitants |
---|---|---|---|
Liverpool | 473073 | 280723 | 593.4 |
Manchester | 520215 | 98831 | 190.0 |
Oxford | 161291 | 23069 | 143.0 |
Cambridge | 151832 | 12648 | 83.3 |
Heidelberg | 160601 | 4081 | 25.4 |
London | 8787892 | 196220 | 22.3 |
Mannheim | 307997 | 5040 | 16.4 |
Paris | 2190327 | 28699 | 13.1 |
Berlin | 3613495 | 40952 | 11.3 |
In this longer list, the effect of having a famous football team (Liverpool, Manchester) as well as having a famous university in a small city (Oxford, Cambridge and Heidelberg) becomes obvious. In other words, it’s not so much about how many people live in a certain city, but how many people feel a positive connection to that particular city, by loyality of a sports team or by time spent at the university.
Let me conclude this article by pointing out the obvious question: “Can any city beat Liverpool” in this contest? All my manual samples have so far yielded good results, but nothing close to Liverpools numbers, for instance:
City Name | Inhabitants | Number of Usages as Password | City Names as Password per 1000 Inhabitants |
---|---|---|---|
Liverpool | 473073 | 280723 | 593.4 |
Green Bay | 105139 | 23069 | 143.0 |
Barcelona | 1620805 | 152196 | 129.9 |
Thus, until futher notice, I proclaim hereby that Liverpool is the most popular city in the world (relative to password use per inhabitant).
Update 2018-02-17: Chester has overtaken Liverpool
Actually, the use of Chester as a password beats Liverpool on the same metric. 117,128 occurrences for a population of 118.200, i.e. 990.9 per 1000 population. cc @CheshireLive
— John Murray (@MurrayData) 11. Februar 2019
@MurrayData is correct of course, “chester” has 117128 occurrences in the data base. “Chester” is not only a city name, however, but also a Christian name, and in particular that of the late lead singer of Linkin Park, Chester Bennington. It is plausible that many of these passwords are disconnected from the city itself. Yet the same argument hoilds for soccer teams as I had already pointed out in my original post, thus I opted to ignore these distortions, and I hereby present the new, updated table:
City Name | Inhabitants | Number of Usages as Password | City Names as Password per 1000 Inhabitants |
---|---|---|---|
Chester | 118200 | 117128 | 990.9 |
Liverpool | 473073 | 280723 | 593.4 |
Manchester | 520215 | 98831 | 190.0 |
Oxford | 161291 | 23069 | 143.0 |
Cambridge | 151832 | 12648 | 83.3 |
Heidelberg | 160601 | 4081 | 25.4 |
London | 8787892 | 196220 | 22.3 |
Mannheim | 307997 | 5040 | 16.4 |
Paris | 2190327 | 28699 | 13.1 |
Berlin | 3613495 | 40952 | 11.3 |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.