Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Well I walk upon the river like it’s easier than land
(Love is All, The Tallest Man on Earth)
The Metropolitan Museum of Art provides a dataset with information on more than 450,000 artworks in its collection. You can do anything you want with these data: there are no restrictions on use. Each record contains information about the author, title, type of work, dimensions, date, culture and geography of a particular piece.
I can imagine a bunch of things to do with these data, but since I am a big fan of highcharter, I have made a treemap, which is an artistic (as well as efficient) way to visualize hierarchical data. A treemap is useful for visualizing frequencies, and it can handle several levels, letting you drill down into any category. Here you can find a good example of a treemap.
To read the data I use the fread function from the data.table package. I also use this package for some data wrangling on the dataset. After that, I filter it, looking for the word SPANISH in the columns Artist Nationality and Culture and for the word SPAIN in the column Country. For me, any piece created by a Spanish artist (like this one), coming from Spanish culture (like this one) or from Spain (like this one) is Spanish (this is my very own definition and may not match any academic one). Once this is done, it is easy to extract some interesting figures:
- There are 5,294 Spanish pieces in The Met, which is 1.16% of the collection
- This percentage varies significantly between departments: it rises to 9.01% in The Cloisters and to 4.83% in The Robert Lehman Collection; on the other hand, it falls to 0.52% in The Libraries and to 0.24% in Photographs.
- The Met is home to 1,895 highlights and 44 of them (2.32%) are Spanish; this means that Spanish art is twice as represented among the highlights as could be expected (remember that it makes up 1.16% of the entire collection)
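As a rough illustration of the filtering rule and the percentages above, here is a minimal base R sketch. The sample values are made up for the example and do not come from the Met dataset, and the helper name clean is hypothetical; the full script at the end of this post additionally transliterates accents with iconv before matching.

```r
# Hypothetical sample values, NOT taken from the Met dataset
nationality <- c("Spanish", "French", "spanish [1]", "")
country     <- c("", "Spain", "", "Italy")

# Uppercase and drop bracketed numbers such as "[1]"
clean <- function(x) gsub("\\[[0-9]+\\]", "", toupper(x))

# A piece counts as Spanish if either column matches
is_spanish <- grepl("SPANISH", clean(nationality), fixed = TRUE) |
  grepl("SPAIN", clean(country), fixed = TRUE)

# Share of Spanish pieces in the toy sample, as a percentage
toy_share <- 100 * mean(is_spanish)

# Share of Spanish highlights, using the figures quoted in the post
highlight_share <- round(100 * 44 / 1895, 2)   # 2.32

# Ratio against the 1.16% share of the whole collection
ratio <- round(highlight_share / 1.16, 1)      # 2
```

The last two lines only redo the arithmetic behind the "twice as represented" remark; they do not recompute anything from the real data.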
My treemap represents the distribution of Spanish artworks by department (column Department) and type of work (column Classification). There are two important things to know before doing a treemap with highcharter:
- You have to use the treemap function from the treemap package to create a list with your data frame that will serve as input for the hctreemap function
- hctreemap fails if some category name is the same as any of its subcategories. To avoid this, make sure that all names are distinct.
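The second point can be illustrated with a minimal base R sketch. The data frame below is invented for the example; the fix mirrors what the script at the end of this post does, appending "#" to a subcategory that shares its department's name.

```r
# Hypothetical counts, NOT taken from the Met dataset
df <- data.frame(
  Department     = c("photographs", "photographs", "arms and armor"),
  Classification = c("photographs", "albums", "swords"),
  Objects        = c(3, 2, 5),
  stringsAsFactors = FALSE
)

# Rows where the subcategory has the same name as its category
clash <- df$Department == df$Classification

# Make the clashing subcategory names distinct
df$Classification[clash] <- paste0(df$Classification[clash], "#")

df$Classification
```

After this step no row has a Classification equal to its Department, which is the condition the post's code enforces before calling treemap.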
This is the treemap:
Here you can see a full size version of it.
Several things can be seen at a glance: most of the pieces belong to Drawings and Prints and to European Sculpture and Decorative Arts (specifically, prints and textiles); there is also a large number of costumes; and Arms and Armor is a very fragmented department. I think a treemap is a good way to see what kind of works The Met owns.
My favorite Spanish piece in The Met is the stunning Portrait of Juan de Pareja by Velázquez, which illustrates this post: how nice it would be to see it next to El Primo in the Museo del Prado!
Feel free to use my code to do your own experiments:
library(data.table)
library(dplyr)
library(stringr)
library(highcharter)
library(treemap)

file="MetObjects.csv"

# Download data
if (!file.exists(file)) download.file(paste0("https://media.githubusercontent.com/media/metmuseum/openaccess/master/", file), destfile=file, mode='wb')

# Read data
data=fread(file, sep=",", encoding="UTF-8")

# Modify column names to remove blanks
colnames(data)=gsub(" ", ".", colnames(data))

# Clean columns to prepare for searching
data[,`:=`(Artist.Nationality_aux=toupper(Artist.Nationality) %>% str_replace_all("\\[\\d+\\]", "") %>% iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Culture_aux=toupper(Culture) %>% str_replace_all("\\[\\d+\\]", "") %>% iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Country_aux=toupper(Country) %>% str_replace_all("\\[\\d+\\]", "") %>% iconv(from='UTF-8', to='ASCII//TRANSLIT'))]

# Look for Spanish artworks
data[Artist.Nationality_aux %like% "SPANISH" | Culture_aux %like% "SPANISH" | Country_aux %like% "SPAIN"] -> data_spain

# Count artworks by Department and Classification
data_spain %>%
  mutate(Classification=ifelse(Classification=='', "miscellaneous", Classification)) %>%
  mutate(Department=tolower(Department),
         Classification1=str_match(Classification, "(\\w+)(-|,|\\|)")[,2],
         Classification=ifelse(!is.na(Classification1), tolower(Classification1), tolower(Classification))) %>%
  group_by(Department, Classification) %>%
  summarize(Objects=n()) %>%
  ungroup %>%
  mutate(Classification=ifelse(Department==Classification, paste0(Classification, "#"), Classification)) %>%
  as.data.frame() -> dfspain

# Do treemap without drawing
tm_dfspain <- treemap(dfspain, index = c("Department", "Classification"), draw=F, vSize = "Objects", vColor = "Objects", type = "index")

# Do highcharter treemap
hctreemap(
  tm_dfspain,
  allowDrillToNode = TRUE,
  allowPointSelect = T,
  levelIsConstant = F,
  levels = list(
    list(
      level = 1,
      dataLabels = list (enabled = T, color = '#f7f5ed', style = list("fontSize" = "1em")),
      borderWidth = 1
    ),
    list(
      level = 2,
      dataLabels = list (enabled = F, align = 'right', verticalAlign = 'top',
                         style = list("textShadow" = F, "fontWeight" = 'light', "fontSize" = "1em")),
      borderWidth = 0.7
    )
  )) %>%
  hc_title(text = "Spanish Artworks in The Met") %>%
  hc_subtitle(text = "Distribution by Department") -> plot

plot
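
One step in the code above that may not be obvious is how Classification is shortened: the regular expression keeps the first word that is followed by a hyphen, comma or pipe, and leaves values without such a separator untouched. Here is a minimal base R sketch of that idea, using regexec instead of str_match; the sample values are hypothetical and only meant to show the pattern at work.

```r
# Hypothetical classification strings, NOT taken from the Met dataset
x <- c("Metalwork-Silver", "Prints", "Textiles|Laces")

# First word followed by "-", "," or "|" (same pattern as in the post)
m <- regmatches(x, regexec("(\\w+)(-|,|\\|)", x))

# Take the first capture group, or NA when there is no match
first <- vapply(m, function(v) if (length(v) > 0) v[2] else NA_character_,
                character(1))

# Fall back to the whole value when no separator was found
short <- ifelse(is.na(first), tolower(x), tolower(first))
short
```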