Examining Email Addresses in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I don’t normally work with personally identifiable information such as email addresses. However, the recent Ashley Madison data dump got me thinking about how I’d examine a data set composed of email addresses. Which characteristics of an address would I want to extract, and how would I do that in R? Here’s some quick R code that pulls out the local part (the text before the @), the host, and the address type (the top-level domain, such as com or gov) from a set of email strings, and flags addresses that contain digits or underscores. From there, we can summarize the data by any of these characteristics.
library(stringr)  # provides str_c() and str_detect()

df <- data.frame(email = c("one@gkn.com", "two132@wern.com", "three@fu.com",
                           "four@huo.com", "five@hoi.net", "ten@hoinse.com",
                           "four99@huo.com", "two@wern.gov", "f_ive@hoi.com",
                           "six@ihoio.gov"),
                 stringsAsFactors = FALSE)

df$one   <- sub("@.*$", "", df$email)    # local part: everything before the @
df$two   <- sub(".*@", "", df$email)     # host: everything after the @
df$three <- sub(".*\\.", "", df$email)   # top-level domain: text after the last dot

# Flag addresses that contain a digit
num <- c(0:9); num
num_match <- str_c(num, collapse = "|"); num_match
df$num_yn <- as.numeric(str_detect(df$email, num_match))

# Flag addresses that contain an underscore
und <- c("_"); und
und_match <- str_c(und, collapse = "|"); und_match
df$und_yn <- as.numeric(str_detect(df$email, und_match))

> df
             email    one        two three num_yn und_yn
1      one@gkn.com    one    gkn.com   com      0      0
2  two132@wern.com two132   wern.com   com      1      0
3     three@fu.com  three     fu.com   com      0      0
4     four@huo.com   four    huo.com   com      0      0
5     five@hoi.net   five    hoi.net   net      0      0
6   ten@hoinse.com    ten hoinse.com   com      0      0
7   four99@huo.com four99    huo.com   com      1      0
8     two@wern.gov    two   wern.gov   gov      0      0
9    f_ive@hoi.com  f_ive    hoi.com   com      0      1
10   six@ihoio.gov    six  ihoio.gov   gov      0      0
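The summarizing step mentioned earlier might look something like the following. This is only a minimal sketch in base R: the `emails` data frame is a hypothetical stand-in that reuses the column names from `df` so the snippet runs on its own, and it swaps in `grepl()` for `str_detect()` to avoid the stringr dependency.

```r
# Hypothetical stand-in for the df built above (same column names).
emails <- data.frame(
  email = c("one@gkn.com", "two132@wern.com", "five@hoi.net",
            "two@wern.gov", "f_ive@hoi.com"),
  stringsAsFactors = FALSE
)
emails$two    <- sub(".*@", "", emails$email)     # host
emails$three  <- sub(".*\\.", "", emails$email)   # top-level domain
emails$num_yn <- as.numeric(grepl("[0-9]", emails$email))  # contains a digit?

# Count addresses per top-level domain, most common first.
tld_counts <- sort(table(emails$three), decreasing = TRUE)
print(tld_counts)

# Share of addresses containing a digit, by top-level domain.
digit_share <- tapply(emails$num_yn, emails$three, mean)
print(digit_share)
```

The same pattern works for any of the derived columns: replace `emails$three` with `emails$two` to count addresses per host, for example.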
What about you? If you regularly work with email addresses and have some useful insights for the rest of us, please leave a comment below. How do you usually approach a data set that consists of nothing but a large number of email addresses?