A classical analysis (Radio Swiss classic program)

[This article was first published on Maëlle, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I am not a classical music expert at all, but I happen to have friends who are, and I am even married to someone who plays the cello (and the ukulele!). I enjoy listening to such music from time to time, in particular Baroque music. A friend introduced me to Radio Swiss Classic, an online radio station that plays classical music all day and all night long, with quite a nice variety, very little talking between pieces and no ads (thank you, funders of the radio!). Besides, the voices announcing which piece has just been played are really soothing, so Radio Swiss Classic is a good one in my opinion.

Today, instead of anxiously waiting for the results of the French presidential election, I decided to download the radio's program for the last few years and have a quick look at it since, after all, the website says that the radio aims at relaxing people.

Scraping the program

My web scraping has become a bit more elegant since I followed the advice of EP alias expersso, who by the way should really start blogging. I downloaded programs starting from September 2008, because that’s when I met the friend who told me about Radio Swiss Classic.

dates <- seq(from = lubridate::ymd("2008-09-01"),
             to = lubridate::ymd("2017-04-22"),
             by = "1 day")


base_url <- "http://www.radioswissclassic.ch/en/music-programme/search/"

# Download and parse the program of a single day;
# returns NULL when the page could not be read
get_one_day_program <- function(date, base_url){
  # in order to see progress
  message(date)
  
  # build URL, e.g. .../search/20080901
  date_as_string <- format(date, "%Y%m%d")
  url <- paste0(base_url, date_as_string)
  
  # read page
  page <- try(xml2::read_html(url),
              silent = TRUE)
  if(inherits(page, "try-error")){
    message("horribly wrong")
    closeAllConnections()
    return(NULL)
  }
  
  # find all times, artists and pieces
  times <- xml2::xml_text(xml2::xml_find_all(page, 
                                             xpath="//span[@class='time hidden-xs']//text()"))
  artists <- xml2::xml_text(xml2::xml_find_all(page, 
                                               xpath="//span[@class='titletag']//text()"))
  pieces <- xml2::xml_text(xml2::xml_find_all(page, 
                                              xpath="//span[@class='artist']//text()"))
  # the last artist and piece are the current ones: drop them
  # (head(x, -1) also copes with empty vectors, unlike x[1:(length(x) - 1)])
  artists <- head(artists, -1)
  pieces <- head(pieces, -1)
  
  # get a datetime from each time, in the radio's local time zone
  timedates <- lubridate::ymd_hm(paste(as.character(date), times))
  timedates <- lubridate::force_tz(timedates, tz = "Europe/Zurich")
  
  # format the output
  tibble::tibble(time = timedates,
                 artist = artists,
                 piece = pieces)
}

programs <- purrr::map(dates, get_one_day_program, 
                       base_url = base_url)

programs <- dplyr::bind_rows(programs)

save(programs, file = "data/radioswissclassic_programs.RData")

There were some days without any program on the website; for those, the website said that something was horribly wrong with the server.

load("data/radioswissclassic_programs.RData")
# number of days with at least one program entry
wegot <- length(unique(lubridate::as_date(programs$time)))
# number of days requested
wewanted <- length(seq(from = lubridate::ymd("2008-09-01"),
                       to = lubridate::ymd("2017-04-22"),
                       by = "1 day"))
# share of days covered
wegot / wewanted

Still, I got a program for approximately 96% of the days.
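To see which days are missing, and not only how many, one could compare the requested dates with the dates that actually appear in the data. The helper below, `find_missing_days()`, is a hypothetical base R sketch, assuming start times are stored as POSIXct in the Europe/Zurich time zone like `programs$time`; the toy timestamps are made up for illustration.

```r
# Hypothetical helper: list the requested days that have no program entry.
# Assumes `times` is a POSIXct vector of start times (like programs$time).
find_missing_days <- function(times, from, to, tz = "Europe/Zurich") {
  wanted <- seq(from = as.Date(from), to = as.Date(to), by = "1 day")
  # local calendar dates on which at least one piece started
  got <- unique(as.Date(times, tz = tz))
  wanted[!(wanted %in% got)]
}

# Made-up timestamps, for illustration only
toy_times <- as.POSIXct(c("2008-09-01 10:00", "2008-09-01 10:07",
                          "2008-09-03 08:15"), tz = "Europe/Zurich")
missing_days <- find_missing_days(toy_times, "2008-09-01", "2008-09-04")
print(missing_days)

# share of requested days that were covered
n_wanted <- length(seq(as.Date("2008-09-01"), as.Date("2008-09-04"), by = "1 day"))
coverage <- 1 - length(missing_days) / n_wanted
print(coverage)
```

On the real data, the call would be `find_missing_days(programs$time, "2008-09-01", "2017-04-22")`.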

Who are the most popular composers?

library("magrittr")
# count plays per composer (columns Var1 = composer, Freq = number of plays)
table(programs$artist) %>%
  as.data.frame() %>%
  dplyr::arrange(dplyr::desc(Freq)) %>%
  head(n = 20) %>%
  knitr::kable()
| Composer | Number of plays |
|:--|--:|
| Wolfgang Amadeus Mozart | 37823 |
| Ludwig van Beethoven | 20936 |
| Joseph Haydn | 18140 |
| Franz Schubert | 15596 |
| Antonio Vivaldi | 14947 |
| Johann Sebastian Bach | 12003 |
| Felix Mendelssohn-Bartholdy | 11541 |
| Antonin Dvorak | 10265 |
| Gioachino Rossini | 9591 |
| Frédéric Chopin | 8470 |
| Piotr Iljitsch Tchaikowsky | 8092 |
| Georg Friedrich Händel | 7935 |
| Tomaso Albinoni | 6175 |
| Gaetano Donizetti | 5945 |
| Giuseppe Verdi | 5639 |
| Johannes Brahms | 5526 |
| Johann Nepomuk Hummel | 5439 |
| Camille Saint-Saëns | 5395 |
| Luigi Boccherini | 5130 |
| Johann Christian Bach | 4976 |

I have to admit that I don’t even know all the composers in this table, but according to my live-in classical music expert they’re all famous. Radio Swiss Classic allows listeners to rate pieces, so the most popular ones are programmed more often, and I guess the person making the programs also tends to pick famous composers quite often.

library("ggplot2")
library("hrbrthemes")
# histogram of the number of plays per composer, on a log scale
table(programs$artist) %>%
  as.data.frame() %>%
  ggplot() +
  geom_histogram(aes(Freq)) +
  scale_x_log10() +
  theme_ipsum(base_size = 14)

(Figure: histogram of the number of plays per composer, x axis on a log10 scale.)

Interestingly, but not that surprisingly given the popularity of, say, Mozart, the number of plays per composer seems to follow a log-normal distribution.
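One rough way to check the log-normal impression is to look at the logarithm of the counts, which should then look normally distributed. The sketch below is only an illustration under assumptions: `check_log_normal()` is a hypothetical helper, it runs on simulated counts rather than the real data, and a Shapiro-Wilk test on the log scale is just one possible check (a QQ plot via `qqnorm(log(counts))` is another). Note that `shapiro.test()` accepts between 3 and 5000 values.

```r
# Hypothetical helper: summarise counts on the log scale, where a
# log-normal distribution becomes a normal one.
check_log_normal <- function(counts) {
  log_counts <- log(counts)
  list(
    meanlog = mean(log_counts),
    sdlog = sd(log_counts),
    # Shapiro-Wilk normality test on the log scale (needs 3 to 5000 values)
    shapiro_p = stats::shapiro.test(log_counts)$p.value
  )
}

# Simulated counts, for illustration only (not the radio data)
set.seed(42)
sim_counts <- round(rlnorm(500, meanlog = 5, sdlog = 1.5)) + 1
res <- check_log_normal(sim_counts)
str(res)
```

On the real data, the call would be `check_log_normal(as.vector(table(programs$artist)))`.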

How long are pieces?

The Radio Swiss Classic website states that pieces are longer in the evening than during the day, which I wanted to check. I estimate the duration of each piece as the time until the next piece starts. Because the online program was not corrected for time changes (i.e. on 25-hour days there are only 24 hours of music according to the online program), I shall only look at pieces whose duration is shorter than 60 minutes, which at the same time takes care of the gaps caused by missing days.
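The 25-hour days mentioned above come from daylight saving time. As a quick base R check, assuming the system's time zone database knows `Europe/Zurich`: in 2016 the clocks went back on 30 October, so that calendar day lasted 25 hours, while 26 March 2017, when the clocks went forward, lasted 23 hours. `day_length_hours()` is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: length of a calendar day in hours, in a given time zone
day_length_hours <- function(day, tz = "Europe/Zurich") {
  start <- as.POSIXct(paste(day, "00:00"), tz = tz)
  end <- as.POSIXct(paste(as.Date(day) + 1, "00:00"), tz = tz)
  as.numeric(difftime(end, start, units = "hours"))
}

day_length_hours("2016-10-30") # end of daylight saving time: 25
day_length_hours("2017-03-26") # start of daylight saving time: 23
day_length_hours("2016-11-15") # ordinary day: 24
```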

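# duration of a piece = time until the next piece starts
programs <- dplyr::arrange(programs, time)
programs <- dplyr::mutate(programs,
                          duration = as.numeric(difftime(dplyr::lead(time, 1),
                                                         time,
                                                         units = "mins")))

# durations above 60 minutes (time changes, missing days) are set to NA
programs <- dplyr::mutate(programs,
                          duration = ifelse(duration > 60,
                                            NA, duration))
programs <- dplyr::mutate(programs,
                          hour = as.factor(lubridate::hour(time)))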

programs %>%
  ggplot() +
  geom_boxplot(aes(hour, duration)) +
  theme_ipsum(base_size = 14)

(Figure: boxplots of piece duration in minutes by hour of the day.)
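The duration logic above can be checked on a handful of start times. The sketch below is a hypothetical base R helper, `piece_durations()`, that mirrors the dplyr steps under the same assumptions: a piece lasts until the next one starts, the last piece has no known duration, and gaps longer than 60 minutes are treated as missing. The timestamps are made up.

```r
# Hypothetical helper mirroring the dplyr steps above: each piece lasts until
# the next one starts; the last piece and gaps above `max_minutes` become NA.
piece_durations <- function(start_times, max_minutes = 60) {
  start_times <- sort(start_times)
  # next start time for each piece (NA for the last one)
  next_start <- start_times[c(seq_along(start_times)[-1], NA)]
  minutes <- as.numeric(difftime(next_start, start_times, units = "mins"))
  minutes[!is.na(minutes) & minutes > max_minutes] <- NA
  minutes
}

# Made-up start times, for illustration only
toy_starts <- as.POSIXct(c("2017-04-01 00:00", "2017-04-01 00:05",
                           "2017-04-01 00:12", "2017-04-01 01:30"),
                         tz = "Europe/Zurich")
durations <- piece_durations(toy_starts)
print(durations)
```

Here the 78-minute gap between 00:12 and 01:30 is dropped, just like a gap caused by a time change or a missing day would be.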

I don’t find the difference between day and night that striking. Maybe I could define day and night explicitly to get a prettier figure, here with night running from 20:00 to 04:59 (but I won’t do any test, as I soon need to go watch TV).

programs %>%
  dplyr::mutate(night = (lubridate::hour(time) <= 4 | lubridate::hour(time) >= 20)) %>%
  ggplot() +
  geom_boxplot(aes(night, duration)) +
  theme_ipsum(base_size = 14)

(Figure: boxplots of piece duration in minutes, night (TRUE) versus day (FALSE).)
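The boxplot could be complemented by a plain numeric summary, without any test. Below is a hypothetical base R sketch, `median_duration_by_period()`, assuming a data frame with a POSIXct `time` column and a numeric `duration` column in minutes (as `programs` has after the steps above) and the same night definition as in the plot; the toy rows are made up. Missing durations are dropped by the default `na.action` of the formula method of `aggregate()`.

```r
# Hypothetical helper: median piece duration (minutes) for day vs night,
# with night = 20:00 to 04:59 as in the boxplot above.
median_duration_by_period <- function(df) {
  # hour of day in the time zone stored in the POSIXct column
  hours <- as.integer(format(df$time, "%H"))
  period <- ifelse(hours <= 4 | hours >= 20, "night", "day")
  aggregate(duration ~ period,
            data = data.frame(period = period, duration = df$duration),
            FUN = median)
}

# Made-up rows, for illustration only
toy <- data.frame(
  time = as.POSIXct(c("2017-04-01 10:00", "2017-04-01 11:00",
                      "2017-04-01 21:00", "2017-04-01 02:00",
                      "2017-04-01 15:00"), tz = "Europe/Zurich"),
  duration = c(4, 6, 12, 20, NA)
)
res <- median_duration_by_period(toy)
print(res)
```

On the real data, the call would be `median_duration_by_period(programs)`.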

Conclusion

The website also states that the pieces are livelier in the morning, but I have no data against which to match the piece titles to investigate that claim; well, I have not even looked for such data. Another extension I would find interesting would be to match each composer’s name to a style and then see how often each style is played. Now I’ll stop relaxing and go stuff my face with food in front of the election results!
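The composer-to-style extension could start from a hand-made lookup table. The sketch below is hypothetical: `composer_eras` covers only five composers from the top-20 table, each assigned to its conventional era (Bach and Vivaldi Baroque, Haydn and Mozart Classical, Brahms Romantic), every other composer falls into an "unknown" group, and a real analysis would need a complete, sourced mapping.

```r
# Hypothetical lookup table: a handful of composers and their usual era
composer_eras <- data.frame(
  artist = c("Johann Sebastian Bach", "Antonio Vivaldi",
             "Joseph Haydn", "Wolfgang Amadeus Mozart", "Johannes Brahms"),
  era = c("Baroque", "Baroque", "Classical", "Classical", "Romantic"),
  stringsAsFactors = FALSE
)

# Count plays per era; composers missing from the lookup become "unknown"
plays_per_era <- function(artists, lookup) {
  era <- lookup$era[match(artists, lookup$artist)]
  era[is.na(era)] <- "unknown"
  sort(table(era), decreasing = TRUE)
}

# Made-up vector of played composers, for illustration only
toy_artists <- c("Antonio Vivaldi", "Johann Sebastian Bach",
                 "Joseph Haydn", "Gioachino Rossini")
era_counts <- plays_per_era(toy_artists, composer_eras)
print(era_counts)
```

On the real data, the call would be `plays_per_era(programs$artist, composer_eras)`.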

To leave a comment for the author, please follow the link and comment on their blog: Maëlle.