Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I had the opportunity recently to be asked to join a committee which had to select candidates for an upcoming election. In the candidate selection phase, the members of the committee use preferential voting to rank the candidates. Prior to the meeting, I did some research on preferential voting systems. I found two main classes of methods used to determine winners-those based on some form of Borda count and those based on some form of Condorcet method.
In previous years, this committee selected the order of the candidates purely on an average rank basis, which is functionally equivalent to a Borda count (although not a modified Borda count). Being informed that I was going to be the tabulator and calculator, I decided that this year I also wanted to use a Condorcet method and compare it with the average rank. I decided that my Condorcet method was going to take the following path:
- Calculate if there is a pure Condorcet method
- If there is a cycle (no Condorcet winner) use the Schulze method
- If there is no unique member of the Schwartz set (the Schulze winner) then select one of a number of tiebreakers
I will gloss over the embarrassing details of how I missed an edge case while checking for stopping conditions and ended up in an infinite while loop, and focus on how I used Rcpp to greatly speed up the process. Condorcet methods require making a pairwise comparison between every candidate. This is slow—\(\mathcal{O}(n^2)\). The Schulze method has a rather simple implementation , but it is even slower—\(\mathcal{O}(n^3)\).
Here is a sample ballot:
## Candidates Vote A Vote B Vote C Vote D Vote E Vote F Vote G Vote H ## 1 Albert 1 2 1 3 2 1 3 4 ## 2 Bruce 2 4 5 4 5 4 1 2 ## 3 Charles 3 1 3 1 1 2 4 5 ## 4 David 4 5 2 5 4 5 2 1 ## 5 Edward 5 3 4 2 3 3 5 3
Here is some simple code to calculate the average rank of the candidates:
AvgRank <- function(BallotMatrix) { Ballots <- as.matrix(BallotMatrix[, -1], mode = "numeric") Num_Candidates <- dim(Ballots)[1] Names <- BallotMatrix[, 1] Ballots[is.na(Ballots)] <- Num_Candidates + 1 #Treat blanks as one worse than min MeanRanks <- rowMeans(Ballots) Rankings <- data.frame(Names, MeanRanks) Rankings <- Rankings[order(rank(Rankings[, 2], ties.method = "random")), ] #Ties handled through random draw Rankings <- data.frame(Rankings, seq_along(Rankings[, 1])) names(Rankings) <- c("Names", "Average Rank", "Position") return(Rankings) }
The above ballot would result in the following ranking:
## Names Average Rank Position ## 1 Albert 2.125 1 ## 3 Charles 2.500 2 ## 2 Bruce 3.375 3 ## 5 Edward 3.500 4 ## 4 David 3.500 5
Here is some (simplified) code to calculate Condorcet and Schulze winners:
VoteExtract <- function(BallotMatrix) { Votes <- as.matrix(BallotMatrix[, -1], mode = "numeric") Num_Candidates <- dim(Votes)[1] Votes[is.na(Votes)] <- Num_Candidates + 1 #Treat blanks as one worse than min return(Votes) } PairCount <- function(Votes) { Num_Candidates <- dim(Votes)[1] Pairwise <- matrix(nrow = Num_Candidates, ncol = Num_Candidates) for (CurCand in 1:Num_Candidates) { CandRank <- as.vector(as.matrix(Votes[CurCand, ])) Pref_Cur_Cand <- t(Votes) - CandRank for (Pairs in 1:Num_Candidates) { Pairwise[CurCand, Pairs] <- sum(Pref_Cur_Cand[, Pairs] > 0) } } return(Pairwise) } Schulze <- function(PairsMatrix) { size <- dim(PairsMatrix)[1] p <- matrix(nrow = size, ncol = size) for (i in 1:size) { for (j in 1:size) { if (i != j) { if (PairsMatrix[i, j] > PairsMatrix[j, i]) { p[i, j] <- PairsMatrix[i, j] } else { p[i, j] <- 0 } } } } for (i in 1:size) { for (j in 1:size) { if (i != j) { for (k in 1:size) { if (i != k && j != k) { p[j, k] <- max(p[j, k], min(p[j, i], p[i, k])) } } } } } diag(p) <- 0 return(p) } CondorcetRank <- function(BallotMatrix) { Num_Candidates <- dim(BallotMatrix)[1] Rankings <- matrix(nrow = Num_Candidates, ncol = 3) CurrentBallot <- BallotMatrix CurrentRank <- 1 while (CurrentRank <= Num_Candidates) { CurrentNames <- as.vector(CurrentBallot[, 1]) CurrentSize <- length(CurrentNames) CurrentVotes <- VoteExtract(CurrentBallot) Pairwise <- matrix(nrow = CurrentSize, ncol = CurrentSize) Pairwise <- PairCount(CurrentVotes) Winner <- vector(length = CurrentSize) # Check for Condorcet Winner for (i in 1:CurrentSize) { Winner[i] <- sum(Pairwise[i, ] > Pairwise[, i]) == (CurrentSize - 1) } if (sum(Winner == TRUE) == 1) { # Condorcet Winner Exists CurrentWinner <- which(Winner == TRUE) Rankings[CurrentRank, ] <- c(CurrentNames[CurrentWinner], CurrentRank, "Condorcet") } else { # Condorcet Winner does not exist, calculate Schulze beatpaths Pairwise <- Schulze(Pairwise) for (i in 1:CurrentSize) { Winner[i] <- sum(Pairwise[i, ] > Pairwise[, i]) == (CurrentSize - 1) } if (sum(Winner == TRUE) == 1) { # Schwartz set has unique member CurrentWinner <- which(Winner == TRUE) Rankings[CurrentRank, ] <- c(CurrentNames[CurrentWinner], CurrentRank, "Schulze") } } CurrentBallot <- CurrentBallot[-CurrentWinner, ] CurrentRank = CurrentRank + 1 } Rankings <- data.frame(Rankings) names(Rankings) <- c("Name", "Rank", "Method") return(Rankings) }
The pairwise matrix would be:
## [,1] [,2] [,3] [,4] [,5] ## [1,] 0 6 5 6 6 ## [2,] 2 0 3 5 3 ## [3,] 3 5 0 5 7 ## [4,] 2 3 3 0 4 ## [5,] 2 5 1 4 0
The beatpath matrix for rank 1 (all ballots) would be:
## [,1] [,2] [,3] [,4] [,5] ## [1,] 0 6 5 6 6 ## [2,] 0 0 0 5 0 ## [3,] 0 5 0 5 7 ## [4,] 0 0 0 0 0 ## [5,] 0 5 0 5 0
and just for completeness, the full Condorcet ranking (selecting a winner, removing that entry from the ballot, and repeating on the smaller set) would be:
## Name Rank Method ## 1 Albert 1 Condorcet ## 2 Charles 2 Condorcet ## 3 Edward 3 Schulze ## 4 Bruce 4 Condorcet ## 5 David 5 Condorcet
When profiling this code, 81% of the time was spent in the Schulze algorithm, another 12% was spent in the PairCount algorithm, and the remaining 7% was spent on everything else (the actual ranking had multiple steps to handle cases when there was no Schulze winner). To speed up the procedure, I ported the Schulze and PairCount functions to C++ using Rcpp. Below is the C++ code:
#include <Rcpp.h> using namespace Rcpp; // [[Rcpp::export]] IntegerMatrix Schulze_C(IntegerMatrix Pairs) { int nrow = Pairs.nrow(); IntegerMatrix Schulze(nrow, nrow); for (int i = 0; i < nrow; i++) { for (int j = 0; j < nrow; j++) { if (i != j) { if (Pairs(i, j) > Pairs(j, i)) { Schulze(i, j) = Pairs(i, j); } else { Schulze(i, j) = 0; } } } } for (int i = 0; i < nrow; i++) { for (int j = 0; j < nrow; j++) { if (i != j) { for (int k = 0; k < nrow; k++) { if ((i != k) && (j != k)) { Schulze(j, k) = (std::max)(Schulze(j, k), (std::min)(Schulze(j, i), Schulze(i, k))); } } } else { if ((i = j)) { Schulze(i, j) = 0; } } } } return(Schulze); } // [[Rcpp::export]] IntegerMatrix PairCount_C(IntegerMatrix Votes) { int Num_Candidates = Votes.nrow(); int Num_Ballots = Votes.ncol(); IntegerMatrix Pairwise(Num_Candidates, Num_Candidates); for (int CurCand = 0; CurCand < Num_Candidates; CurCand++) { IntegerVector CandRank = Votes(CurCand, _); IntegerMatrix Pref_Cur_Cand(Num_Candidates, Num_Ballots); for (int i = 0; i < Num_Candidates; i++) { for (int j = 0; j < Num_Ballots; j++) { Pref_Cur_Cand(i, j) = Votes(i, j) - CandRank(j); } } for (int i = 0; i < Num_Candidates; i++) { int G0 = 0; for (int j = 0; j < Num_Ballots; j++) { if (Pref_Cur_Cand(i, j) > 0) G0 += 1; } Pairwise(CurCand, i) = G0; } } return(Pairwise); }
First, we need to check that the functions return the same values:
all.equal(PairCount(VoteExtract(Ballot)), PairCount_C(VoteExtract(Ballot))) ## [1] TRUE all.equal(Schulze(PairCount(VoteExtract(Ballot))), Schulze_C(PairCount_C(VoteExtract(Ballot)))) ## [1] TRUE
Now let's compare the functions' speed:
library(microbenchmark) V <- VoteExtract(Ballot) P <- PairCount(V) microbenchmark(PairCount(V), PairCount_C(V), Schulze(P), Schulze_C(P), times = 100L)
## Unit: microseconds ## expr min lq median uq max neval ## PairCount(V) 184.360 189.126 191.657 195.678 267.754 100 ## PairCount_C(V) 5.064 5.958 6.851 7.447 22.637 100 ## Schulze(P) 363.955 369.912 376.762 386.292 1006.682 100 ## Schulze_C(P) 1.490 2.085 2.979 3.277 7.446 100
The PairCount function is over 25 times faster in C++ and the Schulze function is over 120 times as fast! Moreover, the PairCount algorithm reads more logically in C++, as the way R handles vectors and matrices, when subtracting the current rank, the resulting matrix ended up transposed with the candidates across the columns.
So with easy-to-read code that results in speed gains of multiple orders of magnitude, what's not to like?!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.