Pseudo-Random vs. Random Numbers in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Earlier, I found an interesting post from Bo Allen on pseudo-random vs random numbers, where the author uses a simple bitmap (heat map) to show that the rand function in PHP has a systematic pattern and compares these to truly random numbers obtained from random.org. The post’s results suggest that pseudo-randomness in PHP is faulty and, in general, should not be underestimated in practice. Of course, the findings should not be too surprising, as there is a large body of literature on the subtleties, philosophies, and implications of the pseudo aspect of the most common approaches to random number generation. However, it is silly that PHP’s random number generator (RNG) displays such an obvious pattern nowadays because there are several decent, well-studied pseudo-RNG algorithms available as well as numerous tests for randomness. For a good introduction to RNG, I recommend John D. Cook’s discussion on testing a random number generator.
Now, I would never use PHP for any (serious) statistical analysis, partly due to my fondness for R, nor do I doubt the practicality of the RNG in R. But I was curious to see what would happen. So, created equivalent plots in R to see if a rand equivalent would exhibit a systematic pattern like in PHP, even if less severe. Also, for comparison, I chose to use the random package, from Dirk Eddelbuettel, to draw truly random numbers from random.org. Until today, I had only heard of the random package but had never used it.
I have provided the function rand_bit_matrix, which requires the number of rows and columns to display in the plotted bitmap. To create the bitmaps, I used the pixmap package rather than the much-loved ggplot2 package, simply because of how easy it was for me to create the plots. (If you are concerned that I have lost the faith, please note that I am aware of the awesomeness of ggplot2 and its ability to create heat maps.)
It is important to note that there were two challenges that I encountered when using drawing truly random numbers.
- Only 10,000 numbers can be drawn at once from random.org. (This is denoted as max_n_random.org in the function below.)
- There is a daily limit to the number of times the random.org service will provide numbers.
To overcome the first challenge, I split the total number of bits into separate calls, if necessary. This approach, however, increases our number of requests, and after too many requests, you will see the error: random.org suggests to wait until tomorrow. Currently, I do not know the exact number of allowed requests or if the amount of requested random numbers is a factor, but looking back, I would guess about 20ish large requests is too much.
Below, I have plotted 500 x 500 bitmaps based on the random bits from both of R and random.org. As far as I can tell, no apparent patterns are visible in either plot, but from the graphics alone, our conclusions are limited to ruling out obvious systematic patterns, which were exhibited from the PHP code. I am unsure if the PHP folks formally tested their RNG algorithms for randomness, but even if they did, the code in both R and PHP is straightforward and provides a quick eyeball test. Armed with similar plots alone, the PHP devs could have sought for better RNG algorithms — perhaps, borrowed those from R.
“` r library(“plyr”) library(“pixmap”) library(“random”)
rand_bit_matrix <- function(num_rows = 500, num_cols = 500, max_n_random.org = 10000, seed = NULL) { # I have copied the following function directly from help(‘integer’). is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) { abs(x - round(x)) < tol }
# The number of bits to draw at 'random'. n <- num_rows * num_cols if (n <= 0 || !is.wholenumber(n)) { stop("The number of bits 'n' should be a natural number.") } if (!is.null(seed)) { set.seed(seed) } # Create a matrix of pseudo-random bits. bits_R <- replicate(n = num_cols, sample(c(0, 1), size = num_rows, replace = TRUE)) # Because random.org will only return a maximum of 10,000 numbers at a # time, we break this up into several calls. seq_n_random.org <- rep.int(x = max_n_random.org, times = n%/%max_n_random.org) if (n%%max_n_random.org > 0) { seq_n_random.org <- c(seq_n_random.org, n%%max_n_random.org) } bits_random.org <- lapply(seq_n_random.org, function(n) { try_default(randomNumbers(n = n, min = 0, max = 1, col = 1), NA) }) bits_random.org <- matrix(unlist(bits_random.org), nrow = num_rows, ncol = num_cols) list(R = bits_R, random.org = bits_random.org) }
bit_mats <- rand_bit_matrix(num_rows = 500, num_cols = 500, seed = 42)
with(bit_mats, plot(pixmapGrey(data = R, nrow = nrow(R), ncol = ncol(R)), main = “R”)) ```
r
with(bit_mats, plot(pixmapGrey(data = random.org, nrow = nrow(random.org),
ncol = ncol(random.org)), main = "random.org"))
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.