Python and R: Is Python really faster than R?

[This article was first published on Analysis with Programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A friend of mine asked me to code the following in R:
  1. Generate samples of size 10 from Normal distribution with $\mu$ = 3 and $\sigma^2$ = 5;
  2. Compute the $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}}$ using the 95% confidence level;
  3. Repeat the process 100 times; then
  4. Compute the percentage of the confidence intervals containing the true mean.
 So here is what I got,

case <- function(n = 10, mu = 3, sigma = sqrt(5), p = 0.025, rep = 100){
m <- matrix(NA, nrow = rep, ncol = 4)
for(i in 1:rep){
norm <- rnorm(mean = mu, sd = sigma, n = n)
xbar <- mean(norm)
low <- xbar - qnorm(p = 1 - p) * (sigma/sqrt(n))
up <- xbar + qnorm(p = 1 - p) * (sigma/sqrt(n))
if((mu > low) && (mu < up)){
rem <- 1
} else {
rem <- 0
}
m[i, ] <- c(xbar, low, up, rem)
}
inside <- sum(m[, 4])
per <- inside / rep
desc <- paste("There are ", inside, " confidence intervals that contain ",
"the true mean (", mu, "), that is ", per, " percent of the total CIs", sep = "")
return(list(Matrix = m, Decision = desc))
}
view raw pyvsr1.R hosted with ❤ by GitHub
Staying with the default values, one would obtain

$Matrix
[,1] [,2] [,3] [,4]
[1,] 3.8195089 2.4336051 5.205413 1
[2,] 4.6144773 3.2285735 6.000381 0
[3,] 2.0731334 0.6872296 3.459037 1
[4,] 3.6641014 2.2781976 5.050005 1
[5,] 3.8393909 2.4534871 5.225295 1
: : : : :
: : : : :
[95,] 1.9025146 0.5166108 3.288418 1
[96,] 2.4069555 1.0210517 3.792859 1
[97,] 2.3141211 0.9282173 3.700025 1
[98,] 3.2102493 1.8243455 4.596153 1
[99,] 2.7873416 1.4014378 4.173245 1
[100,] 3.1422372 1.7563333 4.528141 1
$Decision
[1] "There are 95 confidence intervals that contain the true mean (3), that is 0.95 percent of the total CIs"
view raw pyvsr2.R hosted with ❤ by GitHub
The output is a list of Matrix and Decision, wherein the first column of the first list (Matrix) is the computed $\bar{x}$; the second and third columns are the lower and upper limits of the confidence interval, respectively; and the fourth column, is an array of ones — if true mean is contained in the interval and zeros — true mean not contained.

Now how fast it would be if I were to code this in Python?

I do have a prior knowledge that Python beats R in terms of speed (confirmed from Nathan’s post), but out of curiosity I wasn’t satisfied with that fact; and leads me to the following Python equivalent,

import numpy as np
import scipy.stats as ss
def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
m = np.zeros((rep, 4))
for i in range(rep):
norm = np.random.normal(loc = mu, scale = sigma, size = n)
xbar = np.mean(norm)
low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
if (mu > low) & (mu < up):
rem = 1
else:
rem = 0
m[i, :] = [xbar, low, up, rem]
inside = np.sum(m[:, 3])
per = inside / rep
desc = "There are " + str(inside) + " confidence intervals that contain " \
"the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs"
return {"Matrix": m, "Decision": desc}
view raw pyvsr3.py hosted with ❤ by GitHub
Computing the elapsed time, we have
  • R
    system.time(case())
    #Output
    user system elapsed
    0.008 0.000 0.008
    view raw pyvsr4.R hosted with ❤ by GitHub
  • Python
    import time
    t0 = time.time()
    case()
    time.time() - t0
    #Output
    0.08859896659851074
    view raw pyvsr5.py hosted with ❤ by GitHub
As you can see, R executes at 0.008 seconds while Python runs at 0.089 seconds. I am surprised by this fact! I mean, what is happening with my Python? Firing up to 100000 repetitions,

system.time(case(rep = 100000))
#Output
user system elapsed
7.076 0.000 7.066
view raw pyvsr6.R hosted with ❤ by GitHub
and Python,

import time
t0 = time.time()
case(rep = 100000)
time.time() - t0
#Output
64.88077402114868
view raw pyvsr7.py hosted with ❤ by GitHub
Gets even worst! 64 seconds over 7 seconds? That’s a huge difference. I don’t know what is happening here, but I did my best to literally translate the R codes to Python, and yet R?

Any thoughts guys, especially to the Python gurus?

To leave a comment for the author, please follow the link and comment on their blog: Analysis with Programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)