{emayili}: Rudimentary Email Address Validation
A recent issue on the {emayili} GitHub repository prompted me to think a bit more about email address validation. When I started looking into this I was somewhat surprised to learn that it’s such a complicated problem. Who would have thought that something as apparently simple as an email address could be linked with such complexity?
There are two separate problems:
- Is the email address valid or authentic? Does it work?
- Is the email address syntactically correct? Does it obey the rules?
We’ll be focusing on the second problem. But before we delve into that, let’s load {emayili}.
library(emayili)
Does It Work?
“If you want to validate an email address, send it an email. Problem solved.” (There’s only ONE way to validate an email address)
The most effective means to answer this question is to send a test message to the address and potentially request a response. There are a few possible outcomes, some of which are:
- The test message bounces. It’s rejected by the server, so the mailbox probably doesn’t exist.
- The test message appears to be delivered but there’s no response. Perhaps the mailbox exists, but it’s unused or a throwaway.
- The test message is delivered and there’s a response. It exists!
Is It Syntactically Correct?
Not surprisingly, there are rules which dictate what constitutes a syntactically correct email address. Before we think about the rules, let’s talk about the parts of an email address.
An email address consists of two parts: the local part and the domain. For example, [email protected]. An optional display name may precede the email address, in which case the email address is enclosed in angle brackets. For example, Alice Jones <[email protected]>.
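To make the anatomy concrete, here is a minimal base R sketch that pulls an address apart into display name, local part and domain. The function split_address() is hypothetical and is not part of {emayili}. It assumes well-behaved input (at most one pair of angle brackets, no quoted local parts, no comments).

```r
# Hypothetical helper (not part of {emayili}): split an address string into
# display name, local part and domain using base R regular expressions.
split_address <- function(x) {
  x <- trimws(x)
  # Does the string look like "Display Name <address>"?
  bracketed <- "<[^<>]*>[[:space:]]*$"
  has_display <- grepl(bracketed, x)
  display <- ifelse(has_display, trimws(sub(bracketed, "", x)), NA_character_)
  email <- ifelse(has_display, sub("^.*<([^<>]*)>[[:space:]]*$", "\\1", x), x)
  # Split at the last "@": what comes before is the local part,
  # what comes after is the domain.
  local <- sub("@[^@]*$", "", email)
  domain <- sub("^.*@", "", email)
  data.frame(display, local, domain, stringsAsFactors = FALSE)
}

split_address(c("alice@example.com", "Alice Jones <alice@example.com>"))
```

Both inputs yield the local part alice and the domain example.com. Only the second has a display name.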
There are various rules which apply to the local part and the domain, the most important (IMHO) of which are summarised below. Note: This is a high level summary and neglects some of the nuances.
Rules: Local Part
Rules for the local part of an email address:
- May be up to 64 characters long.
- May be quoted (like "alice") or unquoted (just alice).
- If unquoted may consist of the following ASCII characters:
  - lower and uppercase Latin letters (a to z and A to Z)
  - digits (0 to 9)
  - various printable punctuation marks and
  - dots (.) but not at the beginning or end or more than one in succession.
- If quoted then the rules are a lot more liberal.
- May contain comments (which are enclosed in parentheses).
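The rules for an unquoted local part translate fairly directly into a regular expression. The sketch below is a rough illustration in base R, not the {emayili} implementation. The function valid_local() is hypothetical, the set of punctuation marks is my reading of the usual "atom" characters, and quoted local parts and comments are ignored entirely.

```r
# Hypothetical helper, not the {emayili} implementation: check an unquoted
# local part against the simplified rules listed above.
valid_local <- function(local) {
  # Letters, digits and the printable punctuation assumed to be allowed.
  atom <- "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
  # One or more atoms separated by single dots, which rules out a leading
  # dot, a trailing dot and consecutive dots.
  pattern <- paste0("^", atom, "(\\.", atom, ")*$")
  !is.na(local) & nchar(local) <= 64 & grepl(pattern, local)
}

valid_local(c("alice", "alice.jones", ".alice", "alice..jones", "alice jones"))
```

Only the first two candidates pass. The others fail because of a leading dot, a doubled dot and a space respectively.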
Rules: Domain
Rules for the domain of an email address:
- May be up to 255 characters long.
- Must satisfy the requirements of a hostname or IP address (in which case it must be enclosed in square brackets).
- May contain comments (which are enclosed in parentheses).
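A similar sketch works for the domain. Again this is only an illustration under stated assumptions and not what {emayili} does internally. The function valid_domain() is hypothetical, hostname labels are taken to be letters, digits and hyphens (at most 63 characters, with no hyphen at either end), only IPv4 literals are recognised, and comments are ignored.

```r
# Hypothetical helper, not the {emayili} implementation: check a domain
# against a simplified version of the rules listed above.
valid_domain <- function(domain) {
  # A hostname label: letters, digits and hyphens, no hyphen at either end,
  # at most 63 characters.
  label <- "[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
  hostname <- paste0("^", label, "(\\.", label, ")*$")
  # An IPv4 address enclosed in square brackets (IPv6 is out of scope here).
  octet <- "(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
  ip_literal <- paste0("^\\[", octet, "(\\.", octet, "){3}\\]$")
  !is.na(domain) & nchar(domain) <= 255 &
    (grepl(hostname, domain) | grepl(ip_literal, domain))
}

valid_domain(c("example.com", "yahoo.co.uk", "hot_mail.com", "-example.com", "[192.168.0.1]"))
```

The underscore in hot_mail.com and the leading hyphen in -example.com are both rejected, while the bracketed IP address is accepted.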
The address Class
An address class has been added to {emayili}.
alice <- address("[email protected]")
(bob <- address(email = "[email protected]", display = "Robert Brown"))
[1] "Robert Brown <[email protected]>"
You can also construct address objects from a local part and a domain.
address(local = "alice", domain = "example.com")
[1] "[email protected]"
It’s vectorised and does recycling, so you can also do this sort of thing:
address(local = c("bob", "erin"), domain = "yahoo.co.uk")
[1] "[email protected]" "[email protected]"
This also works well in a pipeline. First let’s set up a tibble with the details of a few email accounts.
library(tibble)

recipients <- tibble(
  email = c(NA, NA, NA, "[email protected]"),
  local = c("alice", "erin", "bob", NA),
  domain = c("example.com", "yahoo.co.uk", "yahoo.co.uk", NA),
  display = c(NA, NA, "Robert Brown", NA)
)
recipients
# A tibble: 4 × 4
  email             local domain      display
  <chr>             <chr> <chr>       <chr>
1 <NA>              alice example.com <NA>
2 <NA>              erin  yahoo.co.uk <NA>
3 <NA>              bob   yahoo.co.uk Robert Brown
4 [email protected] <NA>  <NA>        <NA>
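Now use invoke() to pass the columns of the tibble to address() as named arguments. Because address() is vectorised, a single call builds an address for each record. (In purrr 1.0.0 and later invoke() is deprecated in favour of exec().)
library(purrr)

recipients <- recipients %>% invoke(address, .)
recipients
[1] "[email protected]"                "[email protected]"
[3] "Robert Brown <[email protected]>" "[email protected]"
This could equally be done with do.call().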
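For anyone unfamiliar with do.call(), here is a small base R sketch of the idea. To keep it runnable without {emayili} it uses make_address(), a hypothetical stand-in that just pastes the pieces together. It only mimics the argument names of address() used above and is not the real thing.

```r
# Hypothetical stand-in for address(): it simply pastes the pieces together
# so that the example runs without {emayili} installed.
make_address <- function(email = NA, display = NA, local = NA, domain = NA) {
  # Recycle every argument to a common length.
  n <- max(length(email), length(display), length(local), length(domain))
  email <- rep_len(email, n)
  display <- rep_len(display, n)
  built <- paste0(rep_len(local, n), "@", rep_len(domain, n))
  # Use the full email where supplied, otherwise build it from the parts.
  email <- ifelse(is.na(email), built, email)
  ifelse(is.na(display), email, paste0(display, " <", email, ">"))
}

people <- data.frame(
  local = c("alice", "bob"),
  domain = c("example.com", "yahoo.co.uk"),
  display = c(NA, "Robert Brown"),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so each column becomes a named argument.
do.call(make_address, people)
```

With {emayili} loaded, the analogous call would presumably be do.call(address, recipients) on the original tibble, since a tibble is also a list of columns.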
Email addresses can also be coerced into address objects using as.address().
as.address("Robert <[email protected]>")
[1] "Robert <[email protected]>"
Methods
There are methods for extracting the email address and display name.
raw(bob)
[1] "[email protected]"
display(bob)
[1] "Robert Brown"
You can also get the local part and the domain.
local(bob)
[1] "bob"
domain(bob)
[1] "yahoo.co.uk"
Parties
There’s also a new function, parties(), for extracting the addresses associated with an email.
email <- envelope() %>%
  from("[email protected]") %>%
  to("[email protected]", "Robert <[email protected]>") %>%
  cc("[email protected]") %>%
  bcc("[email protected]")

parties(email)
# A tibble: 5 × 6
  type  address                    display raw               local  domain
  <chr> <vctrs_dd>                 <chr>   <chr>             <chr>  <chr>
1 From  [email protected]          <NA>    [email protected] alice  example.com
2 To    [email protected]          <NA>    [email protected] erin   yahoo.co.uk
3 To    Robert <[email protected]> Robert  [email protected] bob    yahoo.co.uk
4 Cc    [email protected]          <NA>    [email protected] oscar  windmill.nl
5 Bcc   [email protected]          <NA>    [email protected] olivia hotmail.com
This gives the details of all of the addresses on the email, broken down in a nice tidy format.
Normalisation
Sometimes email address data can be dirty, so the address class tries to sanitise its contents.
as.address(" Robert < bob @ yahoo.co.uk >")
[1] "Robert <[email protected]>"
This is very simple at the moment, but I’m planning on adding more functionality.
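To give a feel for what this kind of clean-up involves, here is a rough base R sketch. The function normalise_address() is hypothetical and only handles stray whitespace. It is not how {emayili} does it.

```r
# Hypothetical sketch, not the {emayili} implementation: tidy up stray
# whitespace in an address string.
normalise_address <- function(x) {
  x <- trimws(x)
  # Remove whitespace just inside the angle brackets and around the "@".
  x <- gsub("<[[:space:]]+", "<", x)
  x <- gsub("[[:space:]]+>", ">", x)
  x <- gsub("[[:space:]]*@[[:space:]]*", "@", x)
  # Collapse any remaining runs of whitespace to a single space.
  gsub("[[:space:]]+", " ", x)
}

normalise_address("   Robert   <  bob @ yahoo.co.uk  >")
```

The order matters: whitespace inside the brackets and around the @ is removed first, so that the final step only touches the gap between the display name and the address.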
Compliance
Finally, a test of whether an email address complies with the syntax rules.
First some good addresses.
compliant(c(
  "[email protected]",
  "Robert <[email protected]>",
  "[email protected]"
))
[1] TRUE TRUE TRUE
Now some evil addresses.
compliant(c(
  "alice?example.com",
  "Robert [email protected]",
  "olivia@hot_mail.com"
))
[1] FALSE FALSE FALSE
The implementation of compliant() uses regular expressions. Take a look at this StackOverflow thread for a lengthy discussion on the use of regular expressions for checking email addresses. The implementation is not perfect, but it’s functional. If you discover cases where it fails, please let me know and I’ll improve the logic.
Conclusion
For the most part you won’t need to worry about these checks. They will all happen in the background when you put together an email. If, however, one of your email addresses is problematic, then you should at least know about it before you send the email.
These updates are available in {emayili}-0.4.16.