RObservations #27: Canadian Prime Minister’s Dataset (my “first” Kaggle submission)

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Introduction

Kaggle has been a platform that has piqued my interest for some time. As I see it, it's an open-source-driven data science "social network". Recently, I decided to check out the platform and contribute to it.

In this post I share and describe the data I collected on Canadian Prime Ministers from Wikipedia, the cleaning script, and a small visual as well. If you want to see the notebook I put on Kaggle, check it out here and be sure to give it an upvote!

The Data

To download the data, go here. The data was collected manually, by keying and copy-pasting from Wikipedia, so it is not really ideal to work with yet. The lubridate and anytime packages prove to be very helpful for transforming the date columns accordingly. The cleaning script for this data is:

library(tidyverse)

prime_ministers <- readr::read_csv('../input/canadian-prime-ministers/Canadian Prime Ministers Dataset.csv', show_col_types = FALSE) %>%
    # Parse the date columns into proper Date objects
    mutate(`Term Start` = anytime::anydate(`Term Start`),
           # Justin Trudeau's term is ongoing, so it runs up to today's date.
           # ifelse() returns plain numbers, so anydate() is applied again to get Dates back.
           `Term End` = ifelse(Name == "Justin Trudeau", lubridate::today(), anytime::anydate(`Term End`)) %>% anytime::anydate())

prime_ministers
No.          Name                         Political Party           Term Start  Term End
<chr>        <chr>                        <chr>                     <date>      <date>
1 (1 of 2)   John A. Macdonald            Liberal-Conservative      1867-07-01  1873-11-05
2            Alexander Mackenzie          Liberal                   1873-11-07  1878-10-08
1 (2 of 2)   John A. Macdonald            Liberal-Conservative      1878-10-17  1891-06-06
3            John Abbott                  Liberal-Conservative      1891-06-16  1891-11-24
4            John Thompson                Liberal-Conservative      1892-12-05  1894-12-12
5            Mackenzie Bowell             Conservative              1894-12-21  1896-04-27
6            Charles Tupper               Conservative              1896-05-01  1896-07-08
7            Wilfrid Laurier              Liberal                   1896-07-11  1911-10-06
8            Robert Borden                Government (Unionist)     1911-10-10  1920-07-10
9 (1 of 2)   Arthur Meighen               Conservative              1920-07-10  1921-12-29
10 (1 of 3)  William Lyon Mackenzie King  Liberal                   1921-12-29  1926-06-28
9 (2 of 2)   Arthur Meighen               Conservative              1926-06-29  1926-09-25
10 (2 of 3)  William Lyon Mackenzie King  Liberal                   1926-09-25  1930-08-07
11           R. B. Bennett                Conservative              1930-08-07  1935-10-23
10 (3 of 3)  William Lyon Mackenzie King  Liberal                   1935-10-23  1948-11-15
12           Louis St. Laurent            Liberal                   1948-11-15  1957-06-21
13           John Diefenbaker             Progressive Conservative  1957-06-21  1963-04-22
14           Lester B. Pearson            Liberal                   1963-04-22  1968-04-20
15 (1 of 2)  Pierre Trudeau               Liberal                   1968-04-20  1979-06-03
16           Joe Clark                    Progressive Conservative  1979-06-04  1980-03-02
15 (2 of 2)  Pierre Trudeau               Liberal                   1980-03-03  1984-06-29
17           John Turner                  Liberal                   1984-06-30  1984-09-16
18           Brian Mulroney               Progressive Conservative  1984-09-17  1993-06-24
19           Kim Campbell                 Progressive Conservative  1993-06-25  1993-11-03
20           Jean Chrétien                Liberal                   1993-11-04  2003-12-11
21           Paul Martin                  Liberal                   2003-12-12  2006-02-05
22           Stephen Harper               Conservative              2006-02-06  2015-11-03
23           Justin Trudeau               Liberal                   2015-11-04  2022-03-27
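
The second anydate() call at the end of the cleaning script is needed because base R's ifelse() strips attributes, so a Date comes back as a plain number of days since 1970-01-01. Here is a minimal base-R sketch of that behaviour. The dates and the is_current flag are made up for illustration and do not come from the CSV:

```r
# Two example end dates; the NA stands in for a term that has not ended yet.
# (Illustrative values only, not read from the dataset.)
term_end <- as.Date(c("2015-11-03", NA))
is_current <- c(FALSE, TRUE)
today <- as.Date("2022-03-27")  # fixed date so the example is reproducible

# ifelse() keeps the underlying values but drops the Date class
raw <- ifelse(is_current, today, term_end)
class(raw)

# Converting back from "days since 1970-01-01" restores the Date class
fixed <- as.Date(raw, origin = "1970-01-01")
class(fixed)
```

If you would rather skip the round trip, dplyr::if_else() keeps the Date class as long as both branches are Dates.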

Visualizing the Data

Now that the data has been cleaned, we can visualize it as a Gantt chart using the vistime library. While it is limited in terms of customizability, vistime is a great way to make a nice Gantt chart without digging into too much ggplot2, plotly or highcharter syntax. For my Kaggle notebook I chose the highcharter output because it gave the nicest-looking chart for the least effort. However, since this blog is on a (currently) free WordPress site that does not accommodate iframes, the workaround here is to use the ggplot2 output instead:

library(vistime)

# Build the data frame that vistime expects: one row per term in office
vistime_data <- data.frame(
    # Label each event as "Name (Party)"
    event = paste0(prime_ministers$Name, " (", prime_ministers$`Political Party`, ")"),
    start = prime_ministers$`Term Start`,
    end = prime_ministers$`Term End`,
    group = "Prime Ministers",
    # One colour per party; any party not listed falls through to amber
    color = dplyr::case_when(
        prime_ministers$`Political Party` == "Conservative" ~ "#0047AB",
        prime_ministers$`Political Party` == "Liberal" ~ "#D22B2B",
        prime_ministers$`Political Party` == "Liberal-Conservative" ~ "#5D3FD3",
        prime_ministers$`Political Party` == "Progressive Conservative" ~ "#6495ED",
        TRUE ~ "#FFBF00"),
    Name = prime_ministers$Name,
    Political_Party = prime_ministers$`Political Party`,
    blank = '')

vistime_data %>%
    gg_vistime(title = "Canadian Prime Ministers",
               col.event = "blank",
               col.group = "Name") +
    scale_fill_manual(name = '',
                      breaks = c('Conservative', 'Liberal', 'Liberal-Conservative', 'Progressive Conservative', 'Government (Unionist)'),
                      values = c('Conservative' = '#0047AB',
                                 'Liberal' = '#D22B2B',
                                 'Liberal-Conservative' = '#5D3FD3',
                                 'Progressive Conservative' = '#6495ED',
                                 'Government (Unionist)' = '#FFBF00')) +
    theme(axis.text.x = element_text(size = 20),
          axis.text.y = element_text(size = 20),
          legend.text = element_text(size = 15),
          plot.title = element_text(size = 30)) +
    guides(fill = guide_legend(override.aes = list(size = 10)))
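
The party colours are spelled out twice in the code above, once for the color column and once for the legend. One way to keep them in sync, sketched here in base R, is a single named vector used as a lookup table. The names party_colours and colour_for are introduced only for this example:

```r
# Named vector mapping each party in the dataset to its hex colour
party_colours <- c(
    "Conservative"             = "#0047AB",
    "Liberal"                  = "#D22B2B",
    "Liberal-Conservative"     = "#5D3FD3",
    "Progressive Conservative" = "#6495ED",
    "Government (Unionist)"    = "#FFBF00"
)

# Look up colours by party name; parties missing from the vector get the
# fallback colour, mirroring the final branch of the nested ifelse()
colour_for <- function(party, fallback = "#FFBF00") {
    out <- unname(party_colours[party])
    out[is.na(out)] <- fallback
    out
}

colour_for(c("Liberal", "Conservative", "Government (Unionist)"))
```

The same vector could then be handed to scale_fill_manual() as values = party_colours and breaks = names(party_colours).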

And there you have it! A brief example of how to use this dataset. Ideally, it can serve as supplemental data for a larger political or economic analysis. Be sure to let me know what you do with the data in the dataset's discussion section!
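
As one small illustration of that kind of follow-up analysis, the sketch below totals days in office per person, combining non-consecutive terms. It uses three rows retyped from the table above rather than the full CSV, and the column names are simplified for the example:

```r
# Three terms copied from the printed table (two people, one with two terms)
terms <- data.frame(
    name  = c("John A. Macdonald", "Alexander Mackenzie", "John A. Macdonald"),
    start = as.Date(c("1867-07-01", "1873-11-07", "1878-10-17")),
    end   = as.Date(c("1873-11-05", "1878-10-08", "1891-06-06"))
)

# Length of each term in days
terms$days <- as.numeric(terms$end - terms$start)

# Total days in office per person, summing across separate terms
tenure <- tapply(terms$days, terms$name, sum)

# Longest-serving first
tenure[order(-tenure)]
```

With the full dataset, the same idea should work by swapping in the `Term Start` and `Term End` columns, and grouping by `Political Party` instead of name would give time in office per party.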

Thank you for checking this out!

