Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
Kaggle is a platform that has piqued my interest for some time. The way I see it, it's an open-source-driven data science "social network". Recently, I decided to check out the platform and contribute to it.
In this post I share and describe the data I collected from Wikipedia on Canadian Prime Ministers, along with the cleaning script and a small visualization. If you want to see the notebook I put on Kaggle, check it out here and be sure to give it an upvote!
The Data
To download the data, go here. The data was collected by hand (keying and copy-paste from Wikipedia), so the raw file is not yet ideal to work with. The lubridate and anytime packages prove very helpful for transforming the date columns. The cleaning script for this data is:
library(tidyverse)

prime_ministers <- readr::read_csv(
  '../input/canadian-prime-ministers/Canadian Prime Ministers Dataset.csv',
  show_col_types = FALSE
) %>%
  # Format dates properly
  mutate(
    `Term Start` = anytime::anydate(`Term Start`),
    # For Justin Trudeau's term, we'll have it run up to today.
    # ifelse() drops the Date class, so anydate() is applied again to restore it.
    `Term End` = ifelse(Name == "Justin Trudeau",
                        lubridate::today(),
                        anytime::anydate(`Term End`)) %>%
      anytime::anydate()
  )

prime_ministers
No. | Name | Political Party | Term Start | Term End |
---|---|---|---|---|
<chr> | <chr> | <chr> | <date> | <date> |
1 (1 of 2) | John A. Macdonald | Liberal-Conservative | 1867-07-01 | 1873-11-05 |
2 | Alexander Mackenzie | Liberal | 1873-11-07 | 1878-10-08 |
1 (2 of 2) | John A. Macdonald | Liberal-Conservative | 1878-10-17 | 1891-06-06 |
3 | John Abbott | Liberal-Conservative | 1891-06-16 | 1891-11-24 |
4 | John Thompson | Liberal-Conservative | 1892-12-05 | 1894-12-12 |
5 | Mackenzie Bowell | Conservative | 1894-12-21 | 1896-04-27 |
6 | Charles Tupper | Conservative | 1896-05-01 | 1896-07-08 |
7 | Wilfrid Laurier | Liberal | 1896-07-11 | 1911-10-06 |
8 | Robert Borden | Government (Unionist) | 1911-10-10 | 1920-07-10 |
9 (1 of 2) | Arthur Meighen | Conservative | 1920-07-10 | 1921-12-29 |
10 (1 of 3) | William Lyon Mackenzie King | Liberal | 1921-12-29 | 1926-06-28 |
9 (2 of 2) | Arthur Meighen | Conservative | 1926-06-29 | 1926-09-25 |
10 (2 of 3) | William Lyon Mackenzie King | Liberal | 1926-09-25 | 1930-08-07 |
11 | R. B. Bennett | Conservative | 1930-08-07 | 1935-10-23 |
10 (3 of 3) | William Lyon Mackenzie King | Liberal | 1935-10-23 | 1948-11-15 |
12 | Louis St. Laurent | Liberal | 1948-11-15 | 1957-06-21 |
13 | John Diefenbaker | Progressive Conservative | 1957-06-21 | 1963-04-22 |
14 | Lester B. Pearson | Liberal | 1963-04-22 | 1968-04-20 |
15 (1 of 2) | Pierre Trudeau | Liberal | 1968-04-20 | 1979-06-03 |
16 | Joe Clark | Progressive Conservative | 1979-06-04 | 1980-03-02 |
15 (2 of 2) | Pierre Trudeau | Liberal | 1980-03-03 | 1984-06-29 |
17 | John Turner | Liberal | 1984-06-30 | 1984-09-16 |
18 | Brian Mulroney | Progressive Conservative | 1984-09-17 | 1993-06-24 |
19 | Kim Campbell | Progressive Conservative | 1993-06-25 | 1993-11-03 |
20 | Jean Chrétien | Liberal | 1993-11-04 | 2003-12-11 |
21 | Paul Martin | Liberal | 2003-12-12 | 2006-02-05 |
22 | Stephen Harper | Conservative | 2006-02-06 | 2015-11-03 |
23 | Justin Trudeau | Liberal | 2015-11-04 | 2022-03-27 |
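The second call to anydate() in the cleaning script is there because base R's ifelse() strips attributes, so dates come back as plain numbers. Here is a minimal base-R sketch of that behaviour with two ways around it. The toy vectors `term_end` and `is_incumbent` are illustrative stand-ins, not columns from the dataset.

```r
# Toy stand-ins for the `Term End` column and the incumbent flag (illustrative only)
term_end     <- as.Date(c("2006-02-05", "2015-11-03", NA))
is_incumbent <- c(FALSE, FALSE, TRUE)
today        <- Sys.Date()

# ifelse() drops the Date class: the result is days since 1970-01-01 as numbers
dropped <- ifelse(is_incumbent, today, term_end)
class(dropped)

# Option 1: convert the numbers back to dates
restored <- as.Date(dropped, origin = "1970-01-01")

# Option 2: avoid ifelse() and assign by logical index, which keeps the class
fixed <- term_end
fixed[is_incumbent] <- today

restored
fixed
```

Both options should give the same dates. dplyr::if_else() is another alternative, since it preserves the type of its inputs.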
Visualizing the Data
Now that the data has been cleaned, we can visualize it as a Gantt chart using the vistime library. While it is limited in terms of customizability, vistime is a great way to make a nice Gantt chart without digging into too much ggplot2, plotly or highcharter syntax. For my Kaggle notebook I chose highcharts because it gave the nicest theme for the least effort. However, since this blog is on a (currently) free WordPress site that does not accommodate iframes, the workaround here is to use the ggplot2 version:
library(vistime)

vistime_data <- data.frame(
  # Event label in the form "Name (Political Party)"
  event = mapply(function(x, y) { paste0(x, " (", y, ")") },
                 prime_ministers$Name,
                 prime_ministers$`Political Party`),
  start = prime_ministers$`Term Start`,
  end = prime_ministers$`Term End`,
  group = "Prime Ministers",
  # One colour per party; any party not listed falls through to the last value
  color = ifelse(prime_ministers$`Political Party` == "Conservative", "#0047AB",
          ifelse(prime_ministers$`Political Party` == "Liberal", "#D22B2B",
          ifelse(prime_ministers$`Political Party` == "Liberal-Conservative", "#5D3FD3",
          ifelse(prime_ministers$`Political Party` == "Progressive Conservative", "#6495ED",
                 "#FFBF00")))),
  Name = prime_ministers$Name,
  Political_Party = prime_ministers$`Political Party`,
  blank = ''
)

vistime_data %>%
  gg_vistime(title = "Canadian Prime Ministers",
             col.event = "blank",
             col.group = "Name") +
  scale_fill_manual(name = '',
                    breaks = c('Conservative', 'Liberal', 'Liberal-Conservative',
                               'Progressive Conservative', 'Government (Unionist)'),
                    values = c('Conservative' = '#0047AB',
                               'Liberal' = '#D22B2B',
                               'Liberal-Conservative' = '#5D3FD3',
                               'Progressive Conservative' = '#6495ED',
                               'Government (Unionist)' = '#FFBF00')) +
  theme(axis.text.x = element_text(size = 20),
        axis.text.y = element_text(size = 20),
        legend.text = element_text(size = 15),
        plot.title = element_text(size = 30)) +
  guides(fill = guide_legend(override.aes = list(size = 10)))
And there you have it! A brief example of using this dataset. Ideally, it can serve as supplemental data for a larger political or economic analysis. Be sure to let me know what you do with the data in the discussion section of this dataset!
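As one possible starting point for that kind of analysis, term lengths can be computed directly from the two date columns. The sketch below is base R only and uses four rows copied from the table above. The data frame `pm` and its column names are assumptions made for illustration, not part of the Kaggle notebook.

```r
# Four rows copied from the table above; `pm` and its column names are hypothetical
pm <- data.frame(
  name  = c("Brian Mulroney", "Kim Campbell", "Paul Martin", "Stephen Harper"),
  party = c("Progressive Conservative", "Progressive Conservative",
            "Liberal", "Conservative"),
  start = as.Date(c("1984-09-17", "1993-06-25", "2003-12-12", "2006-02-06")),
  end   = as.Date(c("1993-06-24", "1993-11-03", "2005-02-05", "2015-11-03"))
)

# Correct the illustrative row so it matches the table (Paul Martin left office in 2006)
pm$end[pm$name == "Paul Martin"] <- as.Date("2006-02-05")

# Subtracting two Date vectors gives a difference in days
pm$days_in_office <- as.numeric(pm$end - pm$start)

# Total days in office per party for this small subset
by_party <- aggregate(days_in_office ~ party, data = pm, FUN = sum)

pm[order(-pm$days_in_office), c("name", "party", "days_in_office")]
by_party
```

The same two steps should carry over to the full prime_ministers data frame by swapping in the `Term Start` and `Term End` columns.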
Thank you for checking this out!