Exploring The Quick, Draw! Dataset With R: The Mona Lisa
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
All that noise, and all that sound, all those places I have found (Speed of Sound, Coldplay)
Some days ago, my friend Jorge showed me one of the coolest datasets I’ve ever seen: the Google Quick, Draw! dataset. Its GitHub page gives a detailed description of the data. Briefly, it contains around 50 million drawings made by people around the world, stored in .ndjson format. In this experiment, I used the simplified version of the drawings, in which strokes are simplified and resampled with a 1 pixel spacing. Drawings are also aligned to the top-left corner and scaled to have a maximum value of 255. All of this makes the data easier to manage and to plot.
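To make the simplified format more concrete, here is a minimal sketch that parses one record with the jsonlite package and flattens it into one row per point. The record is made up for illustration (it is not taken from the real file), and the helper name strokes_to_df is my own; the only thing assumed from the dataset description is that drawing holds a list of strokes, each one a pair of x and y coordinate vectors.

```r
library(jsonlite)

# A made-up record in the simplified layout: two strokes, each stored as
# a pair of equal-length x and y vectors. Values are illustrative only.
line <- '{"word":"The Mona Lisa","countrycode":"ES","recognized":true,"drawing":[[[10,60,120],[20,5,30]],[[40,45],[200,255]]]}'

# Keep the nested lists as they are so each stroke can be handled separately
record <- fromJSON(line, simplifyVector = FALSE)

# Hypothetical helper: turn the strokes of one drawing into a long data frame
strokes_to_df <- function(drawing) {
  do.call(rbind, lapply(seq_along(drawing), function(i) {
    data.frame(
      stroke = i,                       # stroke index within the drawing
      x = unlist(drawing[[i]][[1]]),    # x coordinates of this stroke
      y = unlist(drawing[[i]][[2]])     # y coordinates of this stroke
    )
  }))
}

points <- strokes_to_df(record$drawing)
print(points)
```

A long layout like this (one row per point, with a stroke index) is convenient because ggplot2 can draw each stroke as a separate path.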
Since .ndjson files may be very large, I used the LaF package to access random lines of the file rather than reading it completely. I wrote a script to explore the file The Mona Lisa.ndjson, which contains more than 120,000 drawings that Google's TensorFlow engine recognized as being The Mona Lisa. It is quite funny to see them. With this script you can:
- Reproduce a random single drawing
- Create a 3×3 mosaic of random drawings
- Create an animation simulating the way the drawing was created
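The random access that makes these steps possible can be sketched as follows. This is not the original script: to keep the example runnable without the real download, it writes a tiny temporary file with five fake records and samples from that. The functions determine_nlines and get_lines come from LaF; everything else (file contents, variable names) is an assumption of the sketch.

```r
library(LaF)

# Stand-in for the real file: a small temporary .ndjson with five fake records
path <- tempfile(fileext = ".ndjson")
writeLines(
  sprintf('{"key_id":"%d","drawing":[[[0,255],[0,%d]]]}', 1:5, 1:5 * 10L),
  path
)

# Count the lines without loading the file into memory
n_lines <- determine_nlines(path)

# Pick three line numbers at random and read only those lines
set.seed(1)
picked <- sort(sample(n_lines, 3))
lines <- get_lines(path, picked)

print(lines)
```

With the real file, path would point to the downloaded The Mona Lisa.ndjson, and each element of lines could then be parsed as JSON.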
I use the ggplot2 package to render the drawings and David Robinson's gganimate package to create the animations.
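As a rough idea of how the rendering can work, the sketch below draws strokes with geom_path, one path per stroke, and reverses the y axis because the drawings are aligned to the top-left corner. The data frame is made up, and its column names (drawing, stroke, x, y) and the helper draw are assumptions of this example, not necessarily what the original script uses. The mosaic is obtained here by faceting on the drawing identifier.

```r
library(ggplot2)

# Made-up points for two tiny drawings, one row per point.
# Column names are assumptions for this sketch.
pts <- data.frame(
  drawing = rep(c("a", "b"), each = 5),
  stroke  = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 2),
  x = c(10, 60, 120, 40, 45, 0, 255, 30, 90, 200),
  y = c(20, 5, 30, 200, 255, 0, 255, 100, 40, 100)
)

# Hypothetical helper: one path per stroke, y axis flipped (origin is top-left)
draw <- function(d) {
  ggplot(d, aes(x, y, group = interaction(drawing, stroke))) +
    geom_path() +
    scale_y_reverse() +
    coord_fixed() +
    theme_void()
}

single <- draw(subset(pts, drawing == "a"))   # one drawing
mosaic <- draw(pts) + facet_wrap(~ drawing)   # one panel per drawing

print(single)
print(mosaic)
```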
This is an example of a single drawing:
This is an example of a 3×3 mosaic:
This is an example of animation:
If you want to try it yourself, you can find the code here.
Note: to work with gganimate, I downloaded the portable version and pointed R to it with the Sys.setenv command, as explained here.
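One possible way to do this kind of pointing, shown only as a sketch: prepend the folder of the portable download to the PATH environment variable for the current R session. The folder below is hypothetical, and whether PATH is the variable set in the linked explanation is an assumption on my part.

```r
# Hypothetical folder holding the portable download; replace with your own
portable_dir <- file.path("C:", "tools", "portable-app")

# Prepend it to PATH for this R session so programs in that folder can be found
Sys.setenv(
  PATH = paste(portable_dir, Sys.getenv("PATH"), sep = .Platform$path.sep)
)

# Check the result
print(Sys.getenv("PATH"))
```

The change only lasts for the running session, so it has to be repeated (or placed in a startup file) each time.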