Secret Santa is a graph traversal problem

Posted on November 28, 2017 by Higher Order Functions in R bloggers | 0 Comments

[This article was first published on Higher Order Functions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week at Thanksgiving, my family drew names from a hat for our annual game of Secret Santa. Actually, it wasn’t a hat but you know what I mean. (Now that I think about it, I don’t think I’ve ever seen names drawn from a literal hat before!) In our family, the rules of Secret Santa are pretty simple:

The players’ names are put in “a hat”.
Players randomly draw a name from a hat, become that person’s Secret Santa, and get them a gift.
If a player draws their own name, they draw again.

Once again this year, somebody asked if we could just use an app or a website to handle the drawing for Secret Santa. Or I could write a script to do it I thought to myself. The problem nagged at the back of my mind for the past few days. You could just shuffle the names… no, no, no. It’s trickier than that.

In this post, I describe a couple of algorithms for Secret Santa sampling using R and directed graphs. I use the DiagrammeR package which creates graphs from dataframes of nodes and edges, and I liberally use dplyr verbs to manipulate tables of edges.

If you would like a more practical way to use R for Secret Santa, including automating the process of drawing names and emailing players, see this blog post.

Making a graph, connecting nodes twice

Let’s start with a subset of my family’s game of just five players. I assign each name a unique ID number.

library(DiagrammeR)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

players <- tibble::tribble(
  ~ Name, ~ Number,
  "Jeremy", 1,
  "TJ", 2, 
  "Jonathan", 3, 
  "Alex", 4, 
  "Marissa", 5)

We can think of the players as nodes in a directed graph. An edge connecting two players indicates a “gives-to” (Secret Santa) relationship. Suppose I drew Marissa’s name. Then the graph will have an edge connecting me to her. In the code below, I use DiagrammeR to create a graph by combining a node dataframe (create_node_df()) and an edge dataframe (create_edge_df()).

nodes <- create_node_df(
  n = nrow(players),
  type = players$Name,
  label = players$Name)

tj_drew_marissa <- create_edge_df(
  from = 2, 
  to = 5, 
  rel = "gives-to",
  color = "#FF4136",
  penwidth = 1)

create_graph(nodes, tj_drew_marissa) %>% 
  render_graph()

Before the game starts, anyone could draw anyone else’s name, so let’s visualize all possible gives-to relations. We can do this by using combn(n, 2) to generate all n-choose-2 pairs and creating two edges for each pair.

combn(players$Name, 2)
#>      [,1]     [,2]       [,3]     [,4]      [,5]       [,6]   [,7]     
#> [1,] "Jeremy" "Jeremy"   "Jeremy" "Jeremy"  "TJ"       "TJ"   "TJ"     
#> [2,] "TJ"     "Jonathan" "Alex"   "Marissa" "Jonathan" "Alex" "Marissa"
#>      [,8]       [,9]       [,10]    
#> [1,] "Jonathan" "Jonathan" "Alex"   
#> [2,] "Alex"     "Marissa"  "Marissa"

# All the edge-manipulating functions in this post take an optional take `...`
# argument for setting the style of edges.
create_all_giving_edges <- function(xs, ...) {
  aes_options <- quos(...)
  pairs <- combn(seq_along(xs), 2)
  # Each column from `combn()` is a pair. We make an edge moving down the column
  # and another edge up the column by having each row as a `from` index and a
  # `to` index.
  from <- c(pairs[1, ], pairs[2, ])
  to <- c(pairs[2, ], pairs[1, ])

  create_edge_df(from = from, to = to) %>% 
    mutate(!!! aes_options) %>% 
    as_tibble()
}

all_possible_edges <- create_all_giving_edges(
  players$Name, 
  rel = "potential-gift", 
  penwidth = .5,
  color = "#CCCCCC90")

create_graph(nodes, all_possible_edges) %>% 
  render_graph()

A fast, simple solution is a Hamiltonian path

Do you need to organize a gift-giving drawing for a group of people? The easiest solution is to shuffle the names and have the first name give to the second name, the second to the third, and so on with last name giving looping back around to the first name. This solution is equivalent to walking through the graph and visiting every node just once. Such a path is called a Hamiltonian path.

Here we find a Hamiltonian path and create a helper function overwrite_edges() to update the edges that fall on the path.

overwrite_edges <- function(old_df, new_df) {
  old_df %>%
    anti_join(new_df, by = c("from", "to")) %>%
    bind_rows(new_df)
}

create_hamiltonian_gift_edges <- function(xs, ...) {
  loop_from <- sample(seq_along(xs))
  # last name gives to first
  loop_to <- c(loop_from[-1], loop_from[1])
  create_edge_df(from = loop_from, to = loop_to, ...)
}

# For reproducible blogging
set.seed(11282017)

hamiltonian_edges <- create_hamiltonian_gift_edges(
  players$Name,
  rel = "gives-to",
  color = "#FF4136",
  penwidth = 1
)

all_possible_edges %>% 
  overwrite_edges(hamiltonian_edges) %>% 
  create_graph(nodes, .) %>% 
  render_graph()

As promised, the red paths loop through all nodes exactly once. No one is their own gift giver :white_check_mark:, and everyone has an incoming red path :white_check_mark: and an outgoing red path :white_check_mark:. Very nice. Actually… let’s put that checklist into a validation function.

# Check for valid gift-giving edges
has_valid_gift_edges <- function(edge_df, indices) {
  indices <- sort(unique(indices))
  pairs <- edge_df %>% filter(rel == "gives-to")
  no_self_loop <- !any(pairs$from == pairs$to)
  exhaustive_from <- isTRUE(all.equal(sort(pairs$from), indices))
  exhaustive_to <- isTRUE(all.equal(sort(pairs$to), indices))
  all(no_self_loop, exhaustive_from, exhaustive_to)
}

has_valid_gift_edges(hamiltonian_edges, all_possible_edges$from)
#> [1] TRUE

Despite its elegance, this solution does not simulate drawing names from a hat! Because each node is visited only once, there is no backtracking, so there there is no reciprocal gift-giving or other sub-circuits in the graph.

Whether you think this is a bad thing is a matter of preference. Personally, I would like all remaining pairs to be equally probable at each step of the drawing. This is not the case when backtracking is not allowed. (For example, if I draw Marissa, then all of the remaining edges are not equally likely because P(Marissa draws TJ | TJ draws Marissa) = 0.)

Okay, do the hat-drawing thing already

Let’s think about what happens when I draw Marissa’s name from a nice big red Santa hat.

The edge from TJ to Marissa is fixed. (I drew her name.)
All other edges from TJ become illegal. (I can’t draw any more names.)
All other edges onto Marissa become illegal. (No one else can draw her name either.)

To simulate a single hat-drawing, we randomly select a legal edge, fix it, and delete all illegal edges. Let’s work through a couple of examples.

First, we need some helper functions.

draw_secret_santa_edge <- function(edge_df, ...) {
  aes_options <- quos(...)
  
  edge_df %>%
    filter(rel != "gives-to") %>%
    sample_n(1) %>%
    mutate(!!! aes_options)
}

find_illegal_edges <- function(edge_df, edge, ...) {
  aes_options <- quos(...)
  
  outgoing <- edge_df %>%
    filter(from %in% edge$from)

  incoming <- edge_df %>%
    filter(to %in% edge$to)

  # The one edge that is not illegal is in both 
  # the incoming and outgoing sets
  to_keep <- dplyr::intersect(outgoing, incoming)

  outgoing %>% 
    bind_rows(incoming) %>% 
    anti_join(to_keep, by = c("from", "to")) %>% 
    mutate(!!! aes_options)
}

Here we draw a single edge (red with fat arrow). All of the other edges that point to the same node are illegal (navy) as are all of the edges that have the same origin as the drawn edge.

current_pick <- draw_secret_santa_edge(
  all_possible_edges,
  rel = "gives-to", 
  color = "#FF4136", 
  penwidth = 1, 
  arrowsize = 1)

current_illegal_edges <- all_possible_edges %>%
  find_illegal_edges(current_pick, color = "#001f3f", penwidth = .5)

all_possible_edges %>%
  overwrite_edges(current_pick) %>% 
  overwrite_edges(current_illegal_edges) %>% 
  create_graph(nodes, .) %>% 
  render_graph(title = "Selected vs. illegal")

We delete those illegal edges and leaving us with the following graph.

edges_after_pick1 <- all_possible_edges %>%
  overwrite_edges(current_pick %>% mutate(arrowsize = NULL)) %>% 
  anti_join(current_illegal_edges, by = "id") 

create_graph(nodes, edges_after_pick1) %>% 
  render_graph(title = "After one draw")

The name has been removed from the hat, and the graph is simpler now!

Let’s do it again. Draw a random legal edge (fat arrow) and identify all the illegal paths (navy).

current_pick <- edges_after_pick1 %>% 
  draw_secret_santa_edge(
    rel = "gives-to", 
    color = "#FF4136", 
    penwidth = 1,
    arrowsize = 1)

current_illegal_edges <- edges_after_pick1 %>%
  find_illegal_edges(edge = current_pick, color = "#001f3f", penwidth = .5)

edges_after_pick1 %>%
  overwrite_edges(current_pick) %>% 
  overwrite_edges(current_illegal_edges) %>% 
  create_graph(nodes, .) %>% 
  render_graph(title = "Selected vs. illegal")

After deleting illegal edges, the problem simplifies further.

edges_after_pick2 <- edges_after_pick1 %>%
  overwrite_edges(current_pick %>% mutate(arrowsize = NULL)) %>% 
  anti_join(current_illegal_edges, by = "id") 

create_graph(nodes, edges_after_pick2) %>% 
  render_graph(title = "After two draws")

You can tell where this is going… Loop Town!

To finish up, we are going to repeat this process until there are only gift-giving edges left. We will control the loop with this helper function which tells us if there are any free edges remaining.

has_free_edge <- function(edge_df) {
  edges_left <- edge_df %>% filter(rel != "gives-to") %>% nrow()
  edges_left != 0
}

In the function below, the while-loop does the same steps as above: Randomly selecting a free edge and removing illegal edges.

draw_edges_from_hat <- function(edge_df, ...) {
  aes_options <- quos(...)
  raw_edge_df <- edge_df
  indices <- unique(c(raw_edge_df$from, raw_edge_df$to))
  
  while (has_free_edge(edge_df)) {
    pick <- edge_df %>% 
      draw_secret_santa_edge(!!! aes_options) %>% 
      mutate(rel = "gives-to")
    
    illegal_edges <- edge_df %>%
      find_illegal_edges(pick)

    edge_df <- edge_df %>%
      overwrite_edges(pick) %>% 
      anti_join(illegal_edges, by = "id") 
  }

  if (!has_valid_gift_edges(edge_df, indices)) {
    warning(call. = FALSE, "Invalid drawing. Trying again.")
    edge_df <- Recall(raw_edge_df, !!! aes_options)
  }
  
  edge_df
}

After the while-loop, the function checks if it has a valid set of gift edges, and if it doesn’t, the function calls itself again. This bit is intended to handle more constrained situations. Such as…

The nibling gift exchange

I lied above. Secret Santa is not so simple in family. For my generation (with me, my siblings and our partners), there’s a rule that a player can’t draw their partner’s name. Similarly, my nieces and nephews (and now also my child :sparkling_heart:) have their own gift exchange with an added constraint: A player can’t give their sibling a gift. The elegant and simple Hamiltonian solution fails under these constraints unless you write a special shuffling algorithm. Our hat-drawing approach, however, handles this situation with minimal effort. Let’s work through an example with my nieces and nephews (and :baby:!). To protect the very young, I have replaced their names with Pokemon names.

Below we define the children and their families and do some data-wrangling so that we have columns with the family at the start and end of each node.

niblings <- tibble::tribble(
  ~ Family, ~ Name, ~ Number,
  "Water", "Squirtle", 1,
  "Water", "Wartortle", 2,
  "Electric", "Pikachu", 3, 
  "Plant", "Bulbasaur", 4, 
  "Plant", "Ivysaur", 5, 
  "Plant", "Venusaur", 6, 
  "Fighting", "Machamp", 7, 
  "Fighting", "Machoke", 8, 
  "Normal", "Ratata", 9,
  "Normal", "Raticate", 10,
  "Psychic", "Mew", 11,
  "Psychic", "Mewtwo", 12)

nibling_edges <- create_all_giving_edges(
  niblings$Name,
  rel = "potential-gift", 
  penwidth = .5,
  color = "#CCCCCC90") %>% 
  left_join(niblings, by = c("from" = "Number")) %>% 
  rename(from_fam = Family) %>% 
  select(-Name) %>% 
  left_join(niblings, by = c("to" = "Number")) %>% 
  rename(to_fam = Family) %>% 
  select(-Name) %>% 
  select(id, from, to, rel, from_fam, to_fam, everything())
nibling_edges
#> # A tibble: 132 x 8
#>       id  from    to            rel from_fam   to_fam penwidth     color
#>    <int> <dbl> <dbl>          <chr>    <chr>    <chr>    <dbl>     <chr>
#>  1     1     1     2 potential-gift    Water    Water      0.5 #CCCCCC90
#>  2     2     1     3 potential-gift    Water Electric      0.5 #CCCCCC90
#>  3     3     1     4 potential-gift    Water    Plant      0.5 #CCCCCC90
#>  4     4     1     5 potential-gift    Water    Plant      0.5 #CCCCCC90
#>  5     5     1     6 potential-gift    Water    Plant      0.5 #CCCCCC90
#>  6     6     1     7 potential-gift    Water Fighting      0.5 #CCCCCC90
#>  7     7     1     8 potential-gift    Water Fighting      0.5 #CCCCCC90
#>  8     8     1     9 potential-gift    Water   Normal      0.5 #CCCCCC90
#>  9     9     1    10 potential-gift    Water   Normal      0.5 #CCCCCC90
#> 10    10     1    11 potential-gift    Water  Psychic      0.5 #CCCCCC90
#> # ... with 122 more rows

In the graph below, we draw an olive edge to connect edge pair of siblings. These edges are illegal before we even start drawing names.

sibling_edges <- nibling_edges %>% 
  filter(from_fam == to_fam) %>% 
  mutate(
    rel = "sibling",
    color = "#3D9970", 
    penwidth = 1)

# Update edges that represent siblings
nibling_edges <- nibling_edges %>% 
  overwrite_edges(sibling_edges)
  
nibling_nodes <- create_node_df(
  n = nrow(niblings),
  type = niblings$Name,
  label = niblings$Name)

nibling_edges %>% 
  overwrite_edges(sibling_edges) %>% 
  create_graph(nibling_nodes, .) %>% 
  render_graph(height = 400)

The solution for this trickier, more constrained version of the problem is to delete the illegal edges beforehand so that they can never be drawn from the hat. After that, the same algorithm works as before.

nibling_edges %>% 
  filter(rel != "sibling") %>% 
  draw_edges_from_hat(color = "#FF4136") %>% 
  create_graph(nibling_nodes, .) %>% 
  render_graph(height = 500)
#> Warning: Invalid drawing. Trying again.

#> Warning: Invalid drawing. Trying again.

Like usual, I’m not sure how to close one of these blog posts. I guess: When a problem involves relations among individuals, always consider whether there is a graph hiding in the background. Even the simple process of drawing names from a hat for Secret Santa describes a graph: a graph of gift-giving relations among individuals. This graph wasn’t obvious to me until I thought about Hamilitonian path solution. Hey, wait a minute, that’s a big gift-giving circle! It’s like some kind of network… ooooooh.

To leave a comment for the author, please follow the link and comment on their blog: Higher Order Functions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Secret Santa is a graph traversal problem

Making a graph, connecting nodes twice

A fast, simple solution is a Hamiltonian path

Okay, do the hat-drawing thing already

The nibling gift exchange

Related

Making a graph, connecting nodes twice

A fast, simple solution is a Hamiltonian path

Okay, do the hat-drawing thing already

The nibling gift exchange

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)