Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Version 0.6.0 of the hrbrthemes
theme_ft_rc() since it is an homage to the wonderful new chart theme developed by the @ft_data crew over at the Financial Times (you can see examples from their work here).
While there was nothing stopping folks from using the GitHub version, the CRAN release makes it more widely available. There are still intermittent issues with fonts for some folks which I’ll be working on for the next release.
Since you’ve already seen lots of examples of these charts I won’t just make a gratuitous example using the theme. I will, however, make some charts based on a new data package dubbed iceout.
The iceout package was originally conceived by Ben Tupper from the Bigelow Laboratory for Ocean Sciences. I keep an eye on fellow Mainer repositories and I did not realize (but should have known) that researchers keep track of when inland bodies of water freeze and thaw. The package name is derived from the term used for the thaw measurements (“ice-out” or “ice-off”).
Before I became obsessed with this data and got the package to its current state, the original codebase worked off of a USGS Lake Ice-Out Data for New England dataset that focused solely on New England and only went up to 2005. Some digging turned up that:
- Maine’s Department of Agriculture and Forestry maintains online records since 2003; and,
- Minnesota’s Department of Natural Resources maintains a comprehensive database of records going back to the 1800’s.
But I hit the jackpot after discovering the U.S. National Snow & Ice Data Center’s Global Lake and River Ice Phenology dataset which:
So, I converted the original package to a data package containing all four of those datasets plus some interactive functions for pulling “live” data and a set of “builders” to regenerate the databases. Let’s take a quick look at what’s in the NSIDC data and the global coverage area:
library(iceout) # github/hrbrmstr/iceout
library(hrbrthemes)
library(ggplot2)
library(dplyr)

data("nsidc_iceout")

glimpse(nsidc_iceout)
## Observations: 35,918
## Variables: 37
## $ lakecode <chr> "ARAI1", "ARAI1", "ARAI1", "ARAI1", "ARAI1", "ARAI1", "ARAI1…
## $ lakename <chr> "Lake Suwa", "Lake Suwa", "Lake Suwa", "Lake Suwa", "Lake Su…
## $ lakeorriver <chr> "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", "L", …
## $ season <chr> "1443-44", "1444-45", "1445-46", "1446-47", "1447-48", "1448…
## $ iceon_year <dbl> 1443, 1444, 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, …
## $ iceon_month <dbl> 12, 11, 12, 12, 11, 12, 12, 12, 12, 11, 12, 12, 12, 12, 12, …
## $ iceon_day <dbl> 8, 23, 1, 2, 30, 8, 13, 8, 23, 28, 3, 5, 1, 5, 6, 20, 10, 15…
## $ iceoff_year <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ iceoff_month <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ iceoff_day <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ duration <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ latitude <dbl> 36.15, 36.15, 36.15, 36.15, 36.15, 36.15, 36.15, 36.15, 36.1…
## $ longitude <dbl> 138.08, 138.08, 138.08, 138.08, 138.08, 138.08, 138.08, 138.…
## $ country <chr> "Japan", "Japan", "Japan", "Japan", "Japan", "Japan", "Japan…
## $ froze <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
## $ obs_comments <chr> "calendar correction for ice_on: -30 days of original data; …
## $ area_drained <dbl> 531, 531, 531, 531, 531, 531, 531, 531, 531, 531, 531, 531, …
## $ bow_comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ conductivity_us <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ elevation <dbl> 759, 759, 759, 759, 759, 759, 759, 759, 759, 759, 759, 759, …
## $ filename <chr> "ARAI", "ARAI", "ARAI", "ARAI", "ARAI", "ARAI", "ARAI", "ARA…
## $ initials <chr> "ARAI", "ARAI", "ARAI", "ARAI", "ARAI", "ARAI", "ARAI", "ARA…
## $ inlet_streams <chr> "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", …
## $ landuse_code <chr> "UAFO", "UAFO", "UAFO", "UAFO", "UAFO", "UAFO", "UAFO", "UAF…
## $ largest_city_population <dbl> 52000, 52000, 52000, 52000, 52000, 52000, 52000, 52000, 5200…
## $ max_depth <dbl> 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, 7.6, …
## $ mean_depth <dbl> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, …
## $ median_depth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ power_plant_discharge <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ secchi_depth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ shoreline <dbl> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, …
## $ surface_area <dbl> 12.9, 12.9, 12.9, 12.9, 12.9, 12.9, 12.9, 12.9, 12.9, 12.9, …
## $ state <chr> "Nagano Prefecture", "Nagano Prefecture", "Nagano Prefecture…
## $ iceon_date <date> 1443-12-08, 1444-11-23, 1445-12-01, 1446-12-02, 1447-11-30,…
## $ iceon_doy <dbl> 342, 328, 335, 336, 334, 343, 347, 342, 357, 333, 337, 339, …
## $ iceout_date <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ iceout_doy <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

maps::map("world", ".", exact = FALSE, plot = FALSE, fill = TRUE) %>%
  fortify() -> wrld

ggplot() +
  ggalt::geom_cartogram(
    data = wrld, map = wrld, aes(long, lat, map_id=region),
    fill="#3B454A", color = "white", size = 0.125
  ) +
  geom_point(
    data = distinct(nsidc_iceout, lakeorriver, longitude, latitude),
    aes(longitude, latitude, fill = lakeorriver),
    size = 1.5, color = "#2b2b2b", stroke = 0.125, shape = 21
  ) +
  scale_fill_manual(
    name = NULL, values = c("L"="#fdbf6f", "R"="#1f78b4"), labels=c("L" = "Lake", "R" = "River")
  ) +
  ggalt::coord_proj("+proj=wintri", ylim = range(nsidc_iceout$latitude, na.rm = TRUE)) +
  labs(title = "NSIDC Dataset Coverage") +
  theme_ft_rc(grid="") +
  theme(legend.position = c(0.375, 0.1)) +
  theme(axis.text = element_blank(), axis.title = element_blank())
W00t! Lots of data (though not all of the extra features are populated for all readings/areas)!
I think the reason the ice-out data garnered my obsession was how it can be used as another indicator that we are indeed in the midst of a climate transformation. Let’s look at the historical ice-out information for Maine inland bodies of water:
filter(nsidc_iceout, country == "United States", state == "ME") %>%
  # we want the Y axis formatted as month-day so we choose a leap year
  # to ensure we get leap dates (if any)
  mutate(iceout_date = as.Date(format(iceout_date, "2020-%m-%d"))) %>%
  ggplot(aes(iceoff_year, iceout_date)) +
  geom_point(aes(color = lakename), size = 0.5, alpha=1/4) +
  geom_smooth(aes(color = lakename), se=FALSE, method = "loess", size=0.25) +
  scale_y_date(date_labels = "%b-%d") +
  labs(
    x = NULL, y = "Ice-out Month/Day", color = NULL,
    title = "Historical Ice-out Data/Trends for Maine Inland Bodies of Water"
  ) +
  theme_ft_rc(grid="XY")
You can follow that code-pattern to look at other states. It’s also fun to look at the ice-out date distributions by latitude grouping:
filter(nsidc_iceout, !is.na(latitude) & !is.na(longitude) & !is.na(iceout_date)) %>%
  filter(country == "United States") %>%
  mutate(iceout_date = as.Date(format(iceout_date, "2020-%m-%d"))) %>%
  mutate(lat_grp = cut(latitude, scales::pretty_breaks(5)(latitude), ordered_result = TRUE)) %>%
  arrange(desc(iceoff_year)) %>%
  ggplot() +
  ggbeeswarm::geom_quasirandom(
    aes(lat_grp, iceout_date, fill = iceoff_year), groupOnX = TRUE,
    shape = 21, size =1, color = "white", stroke = 0.125, alpha=1/2
  ) +
  scale_y_date(date_labels = "%b-%d") +
  viridis::scale_fill_viridis(name = "Year", option = "magma") +
  labs(
    x = "Latitude Grouping", y = "Ice-out Month/Day",
    title = "U.S. Ice-out Historical Day/Month Distributions by Latitude Grouping"
  ) +
  theme_ft_rc(grid="Y")
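Both charts lean on two small data-prep steps: re-anchoring every ice-out date to a single leap year so only the month and day vary, and binning latitude into ordered groups. Here is a minimal base-R sketch of those two steps on their own. The toy data frame is made up purely for illustration (it is not from the NSIDC data), and base pretty() is used as a stand-in for scales::pretty_breaks(), which I am assuming computes roughly the same breaks.

```r
# Toy records; these values are invented for illustration only.
toy <- data.frame(
  latitude    = c(43.1, 44.6, 45.9, 47.2, 46.4),
  iceout_date = as.Date(c("1996-02-29", "1985-04-17", "2003-05-02",
                          "1950-05-11", "2012-04-20"))
)

# Re-anchor every date to the same year so only month/day differ.
# 2020 is a leap year, so a Feb 29 observation survives the conversion.
toy$md_leap <- as.Date(format(toy$iceout_date, "2020-%m-%d"), format = "%Y-%m-%d")

# The same conversion with a non-leap year cannot represent Feb 29.
toy$md_nonleap <- as.Date(format(toy$iceout_date, "2019-%m-%d"), format = "%Y-%m-%d")

# Bin latitudes into ordered groups (pretty() is an assumed stand-in
# for scales::pretty_breaks(5)).
toy$lat_grp <- cut(toy$latitude, breaks = pretty(toy$latitude, n = 5),
                   ordered_result = TRUE)

print(toy)
```

The first row is the one to watch: it keeps its date in the leap-year column and becomes NA in the non-leap one, which is why the charts use 2020 as the anchor year.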
If you want to focus on individual lakes there’s a Shiny app for that (well, one for the U.S. anyway).
After loading the package, just enter explore_us() at an R console and you’ll see something like this:
The leaflet view will zoom to each new lake selected and the graph will be updated as well.
Other Package News
The sergeant
I’ve finally updated the Java library dependencies in pdfboxjars
pdfbox
There’s a new package dubbed reapr
curl + httr + rvest. Fundamentally, it provides some coder-uplift when scraping data. The README has examples but here’s what you get on an initial scrape of this blog’s index page:
reapr::reap_url("http://rud.is/b")
## Title: rud.is | "In God we trust. All others must bring data"
## Original URL: http://rud.is/b
## Final URL: https://rud.is/b/
## Crawl-Date: 2019-01-17 19:51:09
## Status: 200
## Content-Type: text/html; charset=UTF-8
## Size: 50 kB
## IP Address: 104.236.112.222
## Tags: body[1], center[1], form[1], h2[1], head[1], hgroup[1], html[1],
## label[1], noscript[1], section[1], title[1],
## aside[2], nav[2], ul[2], style[5], img[6],
## input[6], article[8], time[8], footer[9], h1[9],
## header[9], p[10], li[19], meta[20], div[31],
## script[40], span[49], link[53], a[94]
## # Comments: 17
## Total Request Time: 2.093s
The reap_url() function:
- Uses httr::GET() to make web connections and retrieve content which enables it to behave more like an actual (non-javascript-enabled) browser. You can pass anything httr::GET() can handle to ... (e.g. httr::user_agent()) to have as much granular control over the interaction as possible.
- Returns a richer set of data. After the httr::response object is obtained many tasks are performed including:
  - timestamping of the URL crawl
  - extraction of the asked-for URL and the final URL (in the case of redirects)
  - extraction of the IP address of the target server
  - extraction of both plaintext and parsed (xml_document) HTML
  - extraction of the plaintext webpage <title> (if any)
  - generation of a dynamic list of tags in the document which can be fed directly to HTML/XML search/retrieval functions (which may speed up node discovery)
  - extraction of the text of all comments in the HTML document
  - inclusion of the full httr::response object with the returned object
  - extraction of the time it took to make the complete request
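To make that list a bit more concrete, here is a rough sketch of how a few of those post-fetch steps could be done by hand with httr and xml2. This is not reapr's actual implementation: digest_html() and reap_sketch() are hypothetical names invented for this example, the IP address lookup is left out, and the shape of the returned list is my own assumption.

```r
# Hypothetical helper (not part of reapr): summarise an HTML string.
digest_html <- function(html_text) {
  doc <- xml2::read_html(html_text)
  list(
    # plaintext <title>; NA when the page has none
    title    = xml2::xml_text(xml2::xml_find_first(doc, "//title")),
    # count of every element name in the document, least common first
    tags     = sort(table(xml2::xml_name(xml2::xml_find_all(doc, "//*")))),
    # text of all HTML comments
    comments = xml2::xml_text(xml2::xml_find_all(doc, "//comment()")),
    # parsed xml_document for later node searches
    parsed   = doc
  )
}

# Hypothetical wrapper: fetch with httr::GET(), then attach crawl metadata.
reap_sketch <- function(url, ...) {
  res <- httr::GET(url, ...)  # extras such as httr::user_agent() pass through
  html_text <- httr::content(res, as = "text", encoding = "UTF-8")
  c(
    list(
      crawl_date   = Sys.time(),
      original_url = url,
      final_url    = res$url,  # differs from `url` after redirects
      status       = httr::status_code(res),
      request_time = unname(res$times["total"]),
      html         = html_text,
      response     = res       # keep the full httr response object
    ),
    digest_html(html_text)
  )
}
```

Calling something like reap_sketch("http://rud.is/b", httr::user_agent("my-crawler")) should hand back one list holding both the crawl metadata and the parsed document, and digest_html() can be tried offline on any HTML string.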
I’m still wrestling with the API so definitely file issues with suggestions (wherever you’re most comfortable socially coding).
Speaking of IP addresses (bullet 3 above), I finally got some time to study the gdns
clandnstine
There’s also a toy package forecequotes
cli & crayon packages” than anything else. But if you like Star Wars, random quote APIs and want to integrate richer command line interface output into your work, then definitely give it a peek.
Finally, I haven’t used R’s direct C interface in a while (since Rcpp is addictive and handy) and wanted to keep those skills fresh, so I made a wrapper to an old (in internet years) IP address trie C library. The underlying library is much slower than what we use in iptools but it works, does a bit more than its iptools counterpart and covers data marshaling, external pointer handling, and attribute/class setting so it may be a half-decent reference package for using the R<->C bridge.
FIN
If you know of more/better ice-out data please drop an issue in the Bigelow Labs’ iceout repo and I’ll get it integrated. And, if you do your own ice-out exploration definitely blog about it, tell R Weekly and drop a note in the comments.
Here are links to all the mentioned packages grouped by social coding platform (so you can interact/collaborate wherever you feel most comfortable working):
sr.ht
GitLab
GitHub