Maybe monad in R
A monad is a mysterious entity from the ivory towers of category theory, an idea that turned out to be quite useful in programming. Part of the myth surrounding monads is that as soon as you understand them, you lose the ability to explain the concept. Since I’m not a mathematician, not even a trained programmer, I won’t try to explain anything. Instead, I will just implement a simple monad.
R, being a functional programming language, should be able to benefit from this concept. The goal is not to create a library, just to demonstrate monads and gain a practical understanding of their usefulness.
So, I decided to write a single-use maybe monad for a theoretical use case. I will use the magrittr pipe operator %>% as a close analogy for function composition, and will similarly create a new bind operator for my monad.
The problem
You are developing a game, which can be played as different characters. Depending on which character you choose, you can have a different number of lives. All the necessary information is stored in a database; however, your team of developers is not very organised. The database can be updated at any time, column names may change, and character names may go missing.
For simplicity, your “database” is a data frame saved in a CSV file.
    # Data --------------------------------------------------------------------
    library(tidyverse)

    livesleft <- tibble(
      names  = c("John", "Ed", "Ned", "Sam", "Benjen", "Beric"),
      nLives = c(     2,    1,     0,     1,      0.5,       4)
    )
    livesleft
    ## # A tibble: 6 x 2
    ##   names  nLives
    ##   <chr>   <dbl>
    ## 1 John      2
    ## 2 Ed        1
    ## 3 Ned       0
    ## 4 Sam       1
    ## 5 Benjen    0.5
    ## 6 Beric     4

    write_csv(livesleft, "livesleft.csv")
You decide to write a robust pipeline, so you don’t have to deal with your colleagues’ mess. If there is a problem at any stage, the pipeline should return 1 (the default number of lives). You also want some information on what caused the error. Here’s the procedure:
- Read database (may be missing)
- Filter by name (name can be missing)
- Get number of lives from the lives column (the column can be missing)
If nothing goes wrong, you can just use a pipe operator.
    read_csv("livesleft.csv") %>%
      filter(names=="Ed") %>%
      pull(nLives)
    ## [1] 1
But let’s see what happens if some of those functions fail!
    read_csv("wrongFile.csv") %>% # missing data
      filter(names=="Beric") %>%
      pull(nLives)
    ## Error: 'wrongFile.csv' does not exist in current working directory ('/builds/Kupac/biofunctor/content/post').

    read_csv("livesleft.csv") %>%
      filter(names=="Bran") %>% # missing name
      pull(nLives)
    ## numeric(0)

    read_csv("livesleft.csv") %>%
      filter(names=="Beric") %>%
      pull(nKilled) # wrong column name
    ## Error in eval_tidy(enquo(var), var_env): object 'nKilled' not found
Disaster! No number returned, the game crashes, you have to go back, and fix the database. There must be a way…
Maybe
So you can encounter errors or missing data at any stage of the pipeline. In R, you could deal with these using tryCatch or something similar. But that means you’d have to rewrite each and every function making sure that:
- The inputs are correct
- The errors are caught and reported
You can’t escape the second part, but maybe you can avoid checking inputs every time, and produce simpler code.
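For comparison, here is a rough sketch of what the tryCatch route could look like for a single step. The helper name read_or_default and its default argument are made up for this illustration, and it uses base R's read.csv instead of read_csv so that it runs without any packages.

```r
# Hypothetical helper (not part of the pipeline in this post): read a CSV
# file, and on failure report the problem and return a default value.
read_or_default <- function(file, default = NULL) {
  tryCatch(
    # read.csv() also warns about the missing file; silence the warning
    # so that only the error handler below reports the problem
    suppressWarnings(read.csv(file, stringsAsFactors = FALSE)),
    error = function(e) {
      message("read_or_default: ", conditionMessage(e))
      default
    }
  )
}

# A file that does not exist yields the default instead of an error
res <- read_or_default("surely_missing_file.csv", default = NA)
```

The error is caught, but every later step now has to check whether it received real data or the default. That is the repeated input checking the rest of this post tries to avoid.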
Actually, these functions return either nothing or something. This can be represented by a maybe value. To implement this, you’ll need a helper function that wraps any value in a maybe container.
    just <- function(x) {
      res <- list(
        type = "Just",
        content = x
      )
      class(res) <- append(class(res), "maybe")
      return(res)
    }
In this simple implementation, a Maybe is a list of length 2. The first slot is the string "Just", and the second is the object to be wrapped.
Errors will be represented similarly, by Nothing, accompanied by an error string. It’s also a list of length 2, but the first slot contains the word "Nothing", and the second the error message.
    nothing <- function(errorString) {
      res <- list(
        type = "Nothing",
        content = errorString
      )
      class(res) <- "maybe"
      return(res)
    }
For this article, I also create a print method for the maybe class, so it’s not printed as a list.
    print.maybe <- function(x, ...) {
      if(x[["type"]] == "Just") {
        cat("Just:\n")
        print(x[["content"]], ...)
      } else {
        cat("Nothing:", x[["content"]], sep="\n")
      }
    }
Here are some examples:
    just("a")
    ## Just:
    ## [1] "a"

    just(matrix(1:16,ncol=4))
    ## Just:
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    1    5    9   13
    ## [2,]    2    6   10   14
    ## [3,]    3    7   11   15
    ## [4,]    4    8   12   16

    nothing("This is empty.")
    ## Nothing:
    ## This is empty.
Safe functions
With these helpers, you can rewrite the three functions in a safe form. The safe_read_csv function will take a string (file name), and return a Maybe tibble.
\[safe\_read\_csv :: character \rightarrow Maybe_{tibble}\]
    safe_read_csv <- function(file, ...) {
      if (file.access(file, 4) == -1) {
        return(nothing(paste0("safe_read_csv: Couldn't open '", file, "'.")))
      } else {
        return(just(read_csv(file, ...)))
      }
    }

    safe_read_csv("livesleft.csv")
    ## Just:
    ## # A tibble: 6 x 2
    ##   names  nLives
    ##   <chr>   <dbl>
    ## 1 John      2
    ## 2 Ed        1
    ## 3 Ned       0
    ## 4 Sam       1
    ## 5 Benjen    0.5
    ## 6 Beric     4

    safe_read_csv("livesleft_notExist.csv")
    ## Nothing:
    ## safe_read_csv: Couldn't open 'livesleft_notExist.csv'.
The safe_filter function takes a tibble and a name, and also returns a Maybe tibble with a single row.
\[safe\_filter :: tibble, name \rightarrow Maybe_{tibble[1,]}\]
    safe_filter <- function(.data, name) {
      # Count exact matches, so that the check agrees with the filter below
      # (a pattern match would also count partial hits such as "Ben")
      n <- sum(.data$names == name, na.rm = TRUE)
      # If there's only one line with that name,
      # return a maybe tibble
      if(n==1) {
        return(just(filter(.data, names==name)))
      } else {
        # Otherwise return nothing, and explain
        err <- paste0("safe_filter: The name '", name, "' identifies ", n, " persons.")
        return(nothing(err))
      }
    }
The safe_pull function takes a tibble and a column name, and returns the Maybe value(s) from the column. If it’s applied after safe_filter, it will be a single value.
\[safe\_pull :: tibble, colname \rightarrow Maybe_{A}\]
    safe_pull <- function(.data, varName) {
      varName <- varName[1]
      # Check that the requested column is present in the data
      if(varName %in% names(.data)) {
        # Return the variable
        return(just(pull(.data, var=varName)))
      } else {
        return(nothing("safe_pull: Requested column is missing from data"))
      }
    }
Great! But when you try to combine these functions using the magrittr pipe operator, it fails.
    safe_read_csv("livesleft.csv") %>%
      safe_filter("Ed") %>%
      safe_pull("nLives")
    ## Nothing:
    ## safe_pull: Requested column is missing from data
Of course; the outputs and inputs don’t match! The outputs are Maybe values, while the functions can’t work on those. We could re-write each function to unwrap Maybe-s, and process the content, or we can create a new pipe operator that does it for them!
Bind
What should this infix operator look like? Well, first let’s look at how the pipe operator (%>%) works.
It takes a value of classA and “passes it on” to a function that converts classA to classB.
At least that’s what it looks like on the surface.
The pipe operator is actually a higher-order function. It takes two arguments: a value on the left hand side (LHS) and a function on the right (RHS). Then, it simply applies the function on the value, and returns the result. It can be written as:
\[ \textrm{\%>\%}::LHS=class_A,~RHS=(class_A \rightarrow class_B) \rightarrow class_B\]
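To make the "higher-order function" view concrete, here is a minimal sketch of a stripped-down pipe. The operator name %then% is invented for this example. The real magrittr pipe does considerably more (for instance, it accepts calls such as filter(names=="Ed") on the right), so this only illustrates the signature described here.

```r
# Hypothetical, simplified pipe: takes a value (lhs) and a function (rhs),
# applies the function to the value, and returns the result.
`%then%` <- function(lhs, rhs) rhs(lhs)

# Equivalent to sqrt(16)
y <- 16 %then% sqrt

# Chaining works left to right: abs(-16) first, then sqrt()
x <- (-16) %then% abs %then% sqrt
```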
So our new operator should be very similar, except that it should take a Maybe value, unwrap it, and then apply the function. Also, the RHS function should be a safe function to keep the computation in the realm of Maybe-s. This is particularly important, so that we can chain multiple functions together with the new bind operator.
\[ \textrm{\%>=\%}::LHS=Maybe_A,~RHS=(class_A \rightarrow Maybe_B) \rightarrow Maybe_B\]
It should take a Maybe classA (the input), and a safe function that converts a classA to a Maybe classB, and the output should be a Maybe classB. The implementation is quite simple, if we cheat and use the already existing magrittr pipe operator.
    `%>=%` <- function(ma, f) {
      if(!is(ma, "maybe")) stop("Provide a maybe value left of '%>=%' !")
      if(ma[[1]]=="Nothing") {
        return(ma) # If Nothing, just pass on the Nothing
      } else {
        # If something, then apply the function on the something (ma[[2]])
        # String from function; collapse in case deparse() splits a long call
        func <- paste(deparse(substitute(f)), collapse=" ")
        cmd <- paste0("ma[[2]] %>% ", func) # Create command string
        res <- eval(parse(text=cmd)) # evaluate command string
        # Check if the function returns a maybe value
        if(is(res, "maybe")) {
          return(res)
        } else {
          stop("RHS function must return a Maybe.")
        }
      }
    }
- If the input is a Nothing value, then it’s simply returned, and the safe function doesn’t run.
- If the input is a Just, then the safe function is applied on the value in the 2nd slot.
Now the pipe works:
    safe_read_csv("livesleft.csv") %>=%
      safe_filter("Beric") %>=%
      safe_pull("nLives")
    ## Just:
    ## [1] 4
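The same logic can also be sketched without building and evaluating a command string, if the safe function is passed as a plain function. The name bind, the toy function safe_sqrt, and the compact stand-ins for just() and nothing() below are assumptions made so that the example runs on its own; this is not the operator used in the rest of the post.

```r
# Compact stand-ins for the helpers defined earlier in the post
just <- function(x) structure(list(type = "Just", content = x), class = "maybe")
nothing <- function(msg) structure(list(type = "Nothing", content = msg), class = "maybe")

# Hypothetical alternative to %>=%: f is a function, extra arguments go in ...
bind <- function(ma, f, ...) {
  stopifnot(inherits(ma, "maybe"))
  if (ma$type == "Nothing") return(ma)   # short-circuit: f never runs
  res <- f(ma$content, ...)              # unwrap, then apply the safe function
  if (!inherits(res, "maybe")) stop("f must return a maybe value.")
  res
}

# Toy safe function for the demonstration
safe_sqrt <- function(x) {
  if (x < 0) nothing("safe_sqrt: negative input") else just(sqrt(x))
}

ok  <- bind(bind(just(16), safe_sqrt), safe_sqrt)  # Just, content 2
bad <- bind(bind(just(-1), safe_sqrt), safe_sqrt)  # Nothing from the first step
```

The nested calls are harder to read than an infix operator, which is one reason to prefer %>=% for longer chains. The trade-off is that the infix version above depends on parsing the right-hand side as text.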
We need one more thing to complete the pipeline: a function to unwrap the maybe-s. Since you always want a value returned in the end, you need a helper function to extract a value from the maybe, and use a default one if it’s a nothing.
\[ fromMaybe:: Maybe_A, class_A\rightarrow class_A\]
    fromMaybe <- function(ma, defaultValue) {
      if(ma[[1]]=="Just") return(ma[[2]])
      else {
        message("Returning default value, because:\n", ma[[2]])
        return(defaultValue)
      }
    }
Now your safe functions can’t fail, and the pipe will always return a result.
    safe_read_csv("livesleft.csv", col_types=cols(
      names=col_character(),
      nLives=col_double()
    )) %>=%
      safe_filter("Benjen") %>=%
      safe_pull("nLives") %>% # Regular pipe, fromMaybe
      fromMaybe(defaultValue=1) # expects a maybe value!
    ## [1] 0.5

    safe_read_csv("livesleft.csv", col_types=cols(
      names=col_character(),
      nLives=col_double()
    )) %>=%
      safe_filter("Bronn") %>=% # Wrong name
      safe_pull("nLives") %>%
      fromMaybe(defaultValue=1)
    ## Returning default value, because:
    ## safe_filter: The name 'Bronn' identifies 0 persons.
    ## [1] 1
Summary
Of course, it’s overkill to do all this just for three functions, but the concept is very powerful. Not only can you chain together an unlimited number of functions this way, but it can be extended to different kinds of logic. Instead of catching errors, you can pass down a state, write log messages, create collections, etc.
All you need to do is to implement a few basic functions:
- Wrapper function(s): wrap any basic data type into a monadic value (here: just(), nothing())
- Functions that return the monadic values (here: “safe functions”)
- Bind (%>=%): To facilitate the composition of such functions
- Optionally, a function to unwrap the monadic value