[This article was first published on OpenCPU, and kindly contributed to R-bloggers].
library(brotli)
library(ggplot2)

# Example data: the GPL license text that ships with R
myfile <- file.path(R.home(), "COPYING")
x <- readBin(myfile, raw(), file.info(myfile)$size)

# The usual suspects
y1 <- memCompress(x, "gzip")
y2 <- memCompress(x, "bzip2")
y3 <- memCompress(x, "xz")
y4 <- brotli_compress(x)
# Verify that each algorithm round-trips losslessly
stopifnot(identical(x, memDecompress(y1, "gzip")))
stopifnot(identical(x, memDecompress(y2, "bzip2")))
stopifnot(identical(x, memDecompress(y3, "xz")))
stopifnot(identical(x, brotli_decompress(y4)))
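Before plotting anything, it can be handy to eyeball the raw sizes. A minimal check, using only the objects defined above:

# Compressed sizes in bytes (smaller is better)
sapply(list(gzip = y1, bzip2 = y2, xz = y3, brotli = y4), length)
length(x)  # original size, for reference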
Compression ratio
If we compare compression ratios, we can see that Brotli significantly outperforms the competition for this example.

# Combine data
alldata <- data.frame(
  algo = c("gzip", "bzip2", "xz (lzma2)", "brotli"),
  ratio = c(length(y1), length(y2), length(y3), length(y4)) / length(x)
)

ggplot(alldata, aes(x = algo, fill = algo, y = ratio)) +
  geom_bar(color = "white", stat = "identity") +
  xlab("") + ylab("Compression ratio (less is better)")
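For readers who prefer numbers over bars, the same ratios can be printed directly (a small convenience snippet, not part of the original benchmark):

# Compression ratios rounded to three decimals
setNames(round(alldata$ratio, 3), alldata$algo)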
Decompression speed
Perhaps the most important performance dimension for internet formats is decompression speed. Clients should be able to decompress quickly, even with limited resources, such as in browsers and on mobile devices.

library(microbenchmark)
bm <- microbenchmark(
  memDecompress(y1, "gzip"),
  memDecompress(y2, "bzip2"),
  memDecompress(y3, "xz"),
  brotli_decompress(y4),
  times = 1000
)
alldata$decompression <- summary(bm)$median

ggplot(alldata, aes(x = algo, fill = algo, y = decompression)) +
  geom_bar(color = "white", stat = "identity") +
  xlab("") + ylab("Decompression time (less is better)")
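One caveat with the plot above: summary(bm)$median is reported in whatever time unit microbenchmark picks automatically, so the y-axis carries no fixed unit. If you want numbers that are comparable across runs, you can pass a unit to the summary method, for example milliseconds:

# Medians in milliseconds, regardless of microbenchmark's default unit
summary(bm, unit = "ms")$median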
Compression speed
So far Brotli has shown the best compression ratio, with decompression performance comparable to gzip. But there is no such thing as a free pastry in Switzerland. Here is the caveat: compressing data with Brotli is complex and slow:

bm <- microbenchmark(
  memCompress(x, "gzip"),
  memCompress(x, "bzip2"),
  memCompress(x, "xz"),
  brotli_compress(x),
  times = 20
)
alldata$compression <- summary(bm)$median

ggplot(alldata, aes(x = algo, fill = algo, y = compression)) +
  geom_bar(color = "white", stat = "identity") +
  xlab("") + ylab("Compression time (less is better)")
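That slowness is tunable, though. The brotli_compress() function exposes a quality argument (0 to 11, defaulting to 11, the slowest and most thorough setting), so you can trade ratio for speed. A minimal sketch, reusing x and y4 from above; the cutoff of 5 is an arbitrary choice for illustration:

# Lower quality compresses faster at the cost of a worse ratio
y_fast <- brotli_compress(x, quality = 5)
c(default = length(y4), fast = length(y_fast)) / length(x)

microbenchmark(
  brotli_compress(x, quality = 5),
  brotli_compress(x, quality = 11),
  times = 10
)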