Python and R: Is Python really faster than R?
[This article was first published on Analysis with Programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A friend of mine asked me to code the following in R:
Staying with the default values, one would obtain
The output is a list of Matrix and Decision, wherein the first column of the first list (Matrix) is the computed $\bar{x}$; the second and third columns are the lower and upper limits of the confidence interval, respectively; and the fourth column, is an array of ones — if true mean is contained in the interval and zeros — true mean not contained.
Now how fast it would be if I were to code this in Python?
I do have a prior knowledge that Python beats R in terms of speed (confirmed from Nathan’s post), but out of curiosity I wasn’t satisfied with that fact; and leads me to the following Python equivalent,
Computing the elapsed time, we have
and Python,
Gets even worst! 64 seconds over 7 seconds? That’s a huge difference. I don’t know what is happening here, but I did my best to literally translate the R codes to Python, and yet R?
Any thoughts guys, especially to the Python gurus?
- Generate samples of size 10 from Normal distribution with $\mu$ = 3 and $\sigma^2$ = 5;
- Compute the $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}}$ using the 95% confidence level;
- Repeat the process 100 times; then
- Compute the percentage of the confidence intervals containing the true mean.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
case <- function(n = 10, mu = 3, sigma = sqrt(5), p = 0.025, rep = 100){ | |
m <- matrix(NA, nrow = rep, ncol = 4) | |
for(i in 1:rep){ | |
norm <- rnorm(mean = mu, sd = sigma, n = n) | |
xbar <- mean(norm) | |
low <- xbar - qnorm(p = 1 - p) * (sigma/sqrt(n)) | |
up <- xbar + qnorm(p = 1 - p) * (sigma/sqrt(n)) | |
if((mu > low) && (mu < up)){ | |
rem <- 1 | |
} else { | |
rem <- 0 | |
} | |
m[i, ] <- c(xbar, low, up, rem) | |
} | |
inside <- sum(m[, 4]) | |
per <- inside / rep | |
desc <- paste("There are ", inside, " confidence intervals that contain ", | |
"the true mean (", mu, "), that is ", per, " percent of the total CIs", sep = "") | |
return(list(Matrix = m, Decision = desc)) | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
$Matrix | |
[,1] [,2] [,3] [,4] | |
[1,] 3.8195089 2.4336051 5.205413 1 | |
[2,] 4.6144773 3.2285735 6.000381 0 | |
[3,] 2.0731334 0.6872296 3.459037 1 | |
[4,] 3.6641014 2.2781976 5.050005 1 | |
[5,] 3.8393909 2.4534871 5.225295 1 | |
: : : : : | |
: : : : : | |
[95,] 1.9025146 0.5166108 3.288418 1 | |
[96,] 2.4069555 1.0210517 3.792859 1 | |
[97,] 2.3141211 0.9282173 3.700025 1 | |
[98,] 3.2102493 1.8243455 4.596153 1 | |
[99,] 2.7873416 1.4014378 4.173245 1 | |
[100,] 3.1422372 1.7563333 4.528141 1 | |
$Decision | |
[1] "There are 95 confidence intervals that contain the true mean (3), that is 0.95 percent of the total CIs" |
Now how fast it would be if I were to code this in Python?
I do have a prior knowledge that Python beats R in terms of speed (confirmed from Nathan’s post), but out of curiosity I wasn’t satisfied with that fact; and leads me to the following Python equivalent,
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
import scipy.stats as ss | |
def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100): | |
m = np.zeros((rep, 4)) | |
for i in range(rep): | |
norm = np.random.normal(loc = mu, scale = sigma, size = n) | |
xbar = np.mean(norm) | |
low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n)) | |
up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n)) | |
if (mu > low) & (mu < up): | |
rem = 1 | |
else: | |
rem = 0 | |
m[i, :] = [xbar, low, up, rem] | |
inside = np.sum(m[:, 3]) | |
per = inside / rep | |
desc = "There are " + str(inside) + " confidence intervals that contain " \ | |
"the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs" | |
return {"Matrix": m, "Decision": desc} |
- RThis file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
system.time(case()) #Output user system elapsed 0.008 0.000 0.008 - PythonThis file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
import time t0 = time.time() case() time.time() - t0 #Output 0.08859896659851074
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
system.time(case(rep = 100000)) | |
#Output | |
user system elapsed | |
7.076 0.000 7.066 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import time | |
t0 = time.time() | |
case(rep = 100000) | |
time.time() - t0 | |
#Output | |
64.88077402114868 |
Any thoughts guys, especially to the Python gurus?
To leave a comment for the author, please follow the link and comment on their blog: Analysis with Programming.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.