Visualizing the Spanish Contribution to The Metropolitan Museum of Art

@aschinchon

5 years ago

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Well I walk upon the river like it’s easier than land
(Love is All, The Tallest Man on Earth)

The Metropolitan Museum of Art provides here a dataset with information on more than 450.000 artworks in its collection. You can do anything you want with these data: there are no restrictions of use. Each record contains information about the author, title, type of work, dimensions, date, culture and geography of a particular piece.

I can imagine a bunch of things to do with these data but since I am a big fan of highcharter, I have done a treemap, which is an artistic (as well as efficient) way to visualize hierarchical data. A treemap is useful to visualize frequencies. They can handle levels, allowing to navigate to go into detail about any category. Here you can find a good example of treemap.

To read data I use fread function from data.table package. I also use this package to do some data wrangling operations on the data set. After them, I filter it looking for the word SPANISH in the columns Artist Nationality and Culture and looking for the word SPAIN in the column Country. For me, any piece created by an Spanish artist (like this one), coming from Spanish culture (like this one) or from Spain (like this one) is Spanish (this is my very own definition and may do not match with any academical one). Once it is done, it is easy to extract some interesting figures:

There are 5.294 Spanish pieces in The Met, which means a 1,16% of the collection
This percentage varies significantly between departments: it raises to 9,01% in The Cloisters and to 4,83% in The Robert Lehman Collection; on the other hand, it falls to 0.52% in The Libraries and to 0,24% in Photographs.
The Met is home to 1.895 highlights and 44 of them (2,32%) are Spanish; It means that Spanish art is twice as important as could be expected (remember that represents a 1,16% of the entire collection)

My treemap represents the distribution of Spanish artworks by department (column Department) and type of work (column Classification). There are two important things to know before doing a treemap with highcharter:

You have to use treemap function from treemap package to create a list with your data frame that will serve as input for hctreemap function
hctreemap fails if some category name is the same as any of its subcategories. To avoid this, make sure that all names are distinct.

This is the treemap:
< !-- iframe plugin v.4.3 wordpress.org/plugins/iframe/ -->

Here you can see a full size version of it.

There can be seen several things at a glance: most of the pieces are drawings and prints and european sculpture and decorative arts (in concrete, prints and textiles), there is also big number of costumes, arms and armor is a very fragmented department … I think treemap is a good way to see what kind of works owns The Met.

Mi favorite spanish piece in The Met is the stunning Portrait of Juan de Pareja by Velázquez, which illustrates this post: how nice would be to see it next to El Primo in El Museo del Prado!

Feel free to use my code to do your own experiments:

library(data.table)
library(dplyr)
library(stringr)
library(highcharter)
library(treemap)

file="MetObjects.csv"
# Download data
if (!file.exists(file)) download.file(paste0("https://media.githubusercontent.com/media/metmuseum/openaccess/master/", file), 
                                      destfile=file,
                                      mode='wb')
# Read data
data=fread(file, sep=",", encoding="UTF-8")

# Modify column names to remove blanks
colnames(data)=gsub(" ", ".", colnames(data))

# Clean columns to prepare for searching
data[,`:=`(Artist.Nationality_aux=toupper(Artist.Nationality) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Culture_aux=toupper(Culture) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'),
           Country_aux=toupper(Country) %>% str_replace_all("\\[\\d+\\]", "") %>% 
             iconv(from='UTF-8', to='ASCII//TRANSLIT'))]

# Look for Spanish artworks
data[Artist.Nationality_aux %like% "SPANISH" | 
       Culture_aux %like% "SPANISH" | 
       Country_aux %like% "SPAIN"] -> data_spain

# Count artworks by Department and Classification
data_spain %>% 
  mutate(Classification=ifelse(Classification=='', "miscellaneous", Classification)) %>% 
  mutate(Department=tolower(Department),
         Classification1=str_match(Classification, "(\\w+)(-|,|\\|)")[,2],
         Classification=ifelse(!is.na(Classification1), 
                               tolower(Classification1), 
                               tolower(Classification))) %>% 
  group_by(Department, Classification) %>% 
  summarize(Objects=n()) %>% 
  ungroup %>% 
  mutate(Classification=ifelse(Department==Classification, paste0(Classification, "#"), 
                               Classification)) %>% 
  as.data.frame() -> dfspain

# Do treemap without drawing
tm_dfspain <- treemap(dfspain, index = c("Department", "Classification"),
                      draw=F,
                      vSize = "Objects", 
                      vColor = "Objects",
                      type = "index")

# Do highcharter treemap 
hctreemap(
  tm_dfspain,
  allowDrillToNode = TRUE,
  allowPointSelect = T,
  levelIsConstant = F,
  levels = list(
    list(
      level = 1,
      dataLabels = list (enabled = T, color = '#f7f5ed', style = list("Size" = "1em")),
      borderWidth = 1
    ),
    list(
      level = 2,
      dataLabels = list (enabled = F,  align = 'right', verticalAlign = 'top', 
                         style = list("textShadow" = F, "Weight" = 'light', "Size" = "1em")),
      borderWidth = 0.7
    ) 
  )) %>% 
  hc_title(text = "Spanish Artworks in The Met") %>% 
  hc_subtitle(text = "Distribution by Department") -> plot

plot

To leave a comment for the author, please follow the link and comment on their blog: R – Fronkonstin.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.