Salaries by alma mater – an interactive visualization with R and plotly
Based on an interesting dataset from the Wall Street Journal, I made the above visualization of the median starting salary for US college graduates from different undergraduate institutions (I have also looked at the mid-career salaries and the salary increase, but more on that later). However, I thought that it would be a lot more informative if it were interactive. At the very least, I wanted to be able to see the school names when hovering over or clicking on the points with the mouse.
Luckily, this kind of interactivity can be easily achieved in R with the library `plotly`, especially due to its excellent integration with `ggplot2`, which I used to produce the above figure. In the following, I describe how exactly this can be done.
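To give a rough idea of what this integration looks like, here is a minimal sketch, not the code behind the figure. It assumes the `ggplot2` and `plotly` packages are installed; the data frame `toy`, its column names, and all values in it are made up for illustration. The `text` aesthetic is passed along only so that `ggplotly()` can show it in the tooltip.

```r
library(ggplot2)
library(plotly)

# Made-up toy data: names, coordinates and amounts are hypothetical
toy <- data.frame(
  school       = c("School A", "School B", "School C"),
  lon          = c(-71.1, -122.2, -87.6),
  lat          = c(42.4, 37.4, 41.8),
  start_salary = c(60000, 65000, 55000)
)

# Static ggplot2 scatter plot; 'text' carries the label for the hover tooltip
p <- ggplot(toy, aes(x = lon, y = lat, colour = start_salary, text = school)) +
  geom_point()

# Convert the ggplot object into an interactive plotly widget
# and show only the 'text' aesthetic when hovering over a point
p_interactive <- ggplotly(p, tooltip = "text")
p_interactive
```

Restricting `tooltip` to `"text"` keeps the hover label short; leaving the argument out would show all mapped aesthetics instead.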
Before I show you the interactive visualizations, a few words on the data preprocessing, and on how the map and the points are plotted with `ggplot2`:
- I generally use functions from the tidyverse R packages.
- I save the data in the data frame `salaries`, and transform the given amounts to proper floating point numbers, stripping the dollar signs and extra whitespace.
- The data provide school names. However, I need to find out the exact geographical coordinates of each school to put it on the map. This can be done very conveniently with the `geocode` function from the `ggmap` R package:

      school_longlat <- geocode(salaries$school)
      school_longlat$school <- salaries$school
      salaries <- left_join(salaries, school_longlat)
- The rows in `salaries` that belong to schools in Alaska and Hawaii can be easily determined with a `grep` search:

      grep("alaska", salaries$school, ignore.case = TRUE)
      # [1] 206
      grep("hawaii", salaries$school, ignore.case = TRUE)
      # [1] 226
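The cleaning of the amounts and the `grep` lookup can be tried out on a small made-up data frame. This is only a sketch in base R under a few assumptions: the school names and amounts are hypothetical placeholders, the helper `parse_dollars` is my own name for illustration, and I assume the amounts may also contain thousands separators, which the original data may or may not have.

```r
# Made-up toy data; school names and amounts are hypothetical placeholders
salaries <- data.frame(
  school = c("Example College of Alaska",
             "Sample University",
             "Hawaii Example Institute"),
  start_salary = c(" $45,000.00 ", "$52,300.00", "$48,750.00 "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop dollar signs, commas and whitespace,
# then convert the remaining digits to a floating point number
parse_dollars <- function(x) {
  as.numeric(gsub("[$,[:space:]]", "", x))
}
salaries$start_salary <- parse_dollars(salaries$start_salary)

# Case-insensitive search; grep() returns the indices of the matching rows
alaska_rows <- grep("alaska", salaries$school, ignore.case = TRUE)
hawaii_rows <- grep("hawaii", salaries$school, ignore.case = TRUE)
```

Because `grep` returns row indices, the result can be used directly for subsetting, for example `salaries[alaska_rows, ]` to inspect the matches.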