[This article was first published on novyden, and kindly contributed to R-bloggers.]
Two years ago we had a rare family outing to the Dallas Museum of Art (my son is a teenager and he's into sports, after all). It had an excellent exhibition of modern art, and the DMA allowed taking pictures. Two hours and a dozen pictures later my weekend was over, but thanks to Google Photos I just stumbled upon those pictures again. Suddenly, I realized that two of the paintings I captured illustrate one of the most important concepts in big data.
There are many papers, tutorials and web pages about MapReduce, and to truly understand and use it one should study at least a few of them thoroughly. There are also many illustrations of the MapReduce structure and architecture out there.
But art can express more with less, in this case with just two paintings. First, we have Foodscape (1964) by Erró:
It illustrates variety, richness, the potential for insight if consumed properly and, of course, scale. The painting is boundless: the table surface has no end in any of the four directions. If we zoom in (you can find a better-quality image here), it contains many types of food and drink, packaging and presentation, varying in color, texture and origin. All of this represents big data so much better than any kind of flowchart diagram.
The second and final painting is Salads, Sandwiches, and Desserts (1962) by Wayne Thiebaud:
If we think of how MapReduce works, this seemingly infinite table (fittingly resembling a conveyor line) looks like the result of split-apply-combine executed on the Foodscape items. Indeed, each vertical group is a combination of the same type of finished, plated food, ready to serve and combined into variably sized sets (find a better-quality image here).
And again, I'd like to remind you of the importance of taking your kids to the museum.
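For readers who prefer code to canvas, the same idea can be sketched in a few lines. This is only a toy illustration in plain Python, not how Hadoop or any real MapReduce framework is implemented. The `foodscape` list and the function names `map_item`, `shuffle` and `reduce_group` are made up for this example.

```python
from collections import defaultdict

# Hypothetical "Foodscape": unordered (kind, name) items, standing in for raw data.
foodscape = [
    ("salad", "greens"),
    ("dessert", "pie"),
    ("sandwich", "ham"),
    ("salad", "slaw"),
    ("dessert", "cake"),
    ("dessert", "pudding"),
]


def map_item(item):
    """Map (apply): turn one raw item into a (key, value) pair."""
    kind, name = item
    return kind, f"plated {name}"


def shuffle(pairs):
    """Shuffle (split): group the mapped values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce (combine): arrange one group as a single row on the counter."""
    return key, sorted(values)


def map_reduce(items):
    """Run map, shuffle and reduce in sequence over all items."""
    mapped = (map_item(item) for item in items)
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


# The "Thiebaud counter": one tidy, variably sized group per kind of food.
counter = map_reduce(foodscape)
for kind, plates in counter.items():
    print(kind, plates)
```

Each key ends up with its own group of plated items, and the groups differ in size, much like the rows in Thiebaud's painting.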