Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
D Kelly O’Day did a great post on charting NASA’s Goddard Institute for Space Studies (GISS) temperature anomaly data, but it sticks with base R for data munging & plotting. While there’s absolutely nothing wrong with base R operations, I thought a modern take on the chart using dplyr
, magrittr
& tidyr
for data manipulation and ggplot2
for formatting would be helpful for the scores of new folk learning R this year (our little language is becoming all the rage, it seems). I also really enjoy working with weather data.
Before further exposition, here’s the result:
I made liberal use of the “piping” idiom encouraged magrittr
, dplyr
and other new R packages, including the forward assignment operator ->
(which may put some folks off a bit). That also meant using magrittr
‘s aliases for [
and [[
, which are more readable in pipes.
I don’t use library(tidyr)
since tidyr
‘s extract
conflicts with magrittr
‘s, but you’ll see a tidyr::gather
in the code for wide-to-long data shaping.
I chose to use the monthly temperature anomaly data as a base layer in the chart as a contrast to the monthly- and annual-anomaly means. I also marked the hottest annual- and annual-mean anomalies and framed the decades with vertical markers.
There are no hardcoded years or decades anywhere in the ggplot2
code, so this should be quite reusable as the data source gets updated.
As I come back to the chart, I think there may be a bit too much “chart junk” on it, but you can tweak it to your own aesthetic preferences (if you do, drop a note in the comments with a link to your creation).
The code is below and in this gist.
library(httr) library(magrittr) library(dplyr) library(ggplot2) # data retrieval ---------------------------------------------------------- # the user agent string was necessary for me; YMMV pg <- GET("http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt", user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A")) # extract monthly data ---------------------------------------------------- content(pg, as="text") %>% strsplit("n") %>% extract2(1) %>% grep("^[[:digit:]]", ., value=TRUE) -> lines # extract column names ---------------------------------------------------- content(pg, as="text") %>% strsplit("n") %>% extract2(1) %>% extract(8) %>% strsplit(" +") %>% extract2(1) -> lines_colnames # make data frame --------------------------------------------------------- data <- read.table(text=lines, stringsAsFactors=FALSE) colnames(data) <- lines_colnames # transform data frame ---------------------------------------------------- data %>% tidyr::gather(month, value, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>% # wide to long mutate(value=value/100) %>% # convert to degree Celcius change select(year=Year, month, value) %>% # only need these fields mutate(date=as.Date(sprintf("%d-%d-%d", year, month, 1)), # make proper dates decade=year %/% 10, # calc decade start=decade*10, end=decade*10+9) %>% # calc decade start/end group_by(decade) %>% mutate(decade_mean=mean(value)) %>% # calc decade mean group_by(year) %>% mutate(annum_mean=mean(value)) %>% # calc annual mean ungroup -> data # start plot -------------------------------------------------------------- gg <- ggplot() # decade vertical markers ------------------------------------------------- gg <- gg + geom_vline(data=data %>% select(end), aes(xintercept=as.numeric(as.Date(sprintf("%d-12-31", end)))), size=0.5, color="#4575b4", linetype="dotted", alpha=0.5) # monthly data ------------------------------------------------------------ gg <- gg + geom_line(data=data, aes(x=date, y=value, color="monthly anomaly"), size=0.35, alpha=0.25) gg <- gg + geom_point(data=data, aes(x=date, y=value, color"monthly anomaly"), size=0.75, alpha=0.5) # decade mean ------------------------------------------------------------- gg <- gg + geom_segment(data=data %>% distinct(decade, decade_mean, start, end), aes(x=as.Date(sprintf("%d-01-01", start)), xend=as.Date(sprintf("%d-12-31", end)), y=decade_mean, yend=decade_mean, color="decade mean anomaly"), linetype="dashed") # annual data ------------------------------------------------------------- gg <- gg + geom_line(data=data %>% distinct(year, annum_mean), aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean, color="annual mean anomaly"), size=0.5) gg <- gg + geom_point(data=data %>% distinct(year, annum_mean), aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean, color="annual mean anomaly"), size=2) # additional annotations -------------------------------------------------- # max annual mean anomaly horizontal marker/text gg <- gg + geom_hline(yintercept=max(data$annum_mean), alpha=0.9, color="#d73027", linetype="dashed", size=0.25) gg <- gg + annotate("text", x=as.Date(sprintf("%d-12-31", mean(range(data$year)))), y=max(data$annum_mean), color="#d73027", alpha=0.9, hjust=0.25, vjust=-1, size=3, label=sprintf("Max annual mean anomaly %2.1fºC", max(data$annum_mean))) gg <- gg + geom_hline(yintercept=max(data$value), alpha=0.9, color="#7f7f7f", linetype="dashed", size=0.25) # max annual anomaly horizontal marker/text gg <- gg + annotate("text", x=as.Date(sprintf("%d-12-31", mean(range(data$year)))), y=max(data$value), color="#7f7f7f", alpha=0.9, hjust=0.25, vjust=-1, size=3, label=sprintf("Max annual anomaly %2.1fºC", max(data$value))) gg <- gg + annotate("text", x=as.Date(sprintf("%d-12-31", range(data$year)[2])), y=min(data$value), size=3, hjust=1, label="Data: http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt") # set colors -------------------------------------------------------------- gg <- gg + scale_color_manual(name="", values=c("#d73027", "#4575b4", "#7f7f7f")) # set x axis limits ------------------------------------------------------- gg <- gg + scale_x_date(expand=c(0, 1), limits=c(as.Date(sprintf("%d-01-01", range(data$year)[1])), as.Date(sprintf("%d-12-31", range(data$year)[2])))) # add labels -------------------------------------------------------------- gg <- gg + labs(x=NULL, y="GLOBAL Temp Anomalies in 1.0ºC", title=sprintf("GISS Land and Sea Temperature Annual Anomaly Trend (%d to %d)n", range(data$year)[1], range(data$year)[2])) # theme/legend tweaks ----------------------------------------------------- gg <- gg + theme_bw() gg <- gg + theme(panel.grid=element_blank()) gg <- gg + theme(panel.border=element_blank()) gg <- gg + theme(legend.position=c(0.9, 0.2)) gg <- gg + theme(legend.key=element_blank()) gg <- gg + theme(legend.background=element_blank()) gg |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.