Introducing the PWFSLSmoke Package
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Mazama Science has just released the PWFSLSmoke package. Source code is available on GitHub. Here is the package description:
Utilities for working with air quality monitoring data with a focus on small particulates (PM2.5) generated by wildfire smoke. Functions are provided for downloading available data from the United States Environmental Protection Agency (US EPA) and it’s AirNow air quality site. Additional sources of PM2.5 data made accessible by the package include: AIRSIS (password protected), the Western Regional Climate Center (WRCC) and the open source site OpenAQ.
In this post we discuss the reasons for creating this package and provide examples of its use.
For several years now, Mazama Science has been working with the AirFire team at The USFS Pacific Wildland Fire Sciences Lab. The PWFSLSmoke R package is being developed for PWFSL to help modelers and scientists more easily work with PM2.5 data from monitoring locations across North America. The package makes it easier to obtain data, perform analyses and generate reports. It includes functionality to:
- download and easily work with regulatory PM2.5 data from AirNow
- download and quality control raw monitoring data from other sources
- convert between UTC and local timezones
- apply various algorithms to the data (nowcast, rolling means, aggregation, etc.)
- provide interactive timeseries and maps through RStudio’s Viewer pane
- create a variety of publication ready maps and timeseries plots
Data Model
The PWFSLSmoke package is designed to provide a compact, full-featured suite of utilities for working with PM 2.5 data used to monitor wildfire smoke. A uniform data model provides consistent data access across monitoring data available from different agencies. The core data model in this package is defined by the ws_monitor object used to store data associated with groups of individual monitors.
To work efficiently with the package it is important to understand the structure of this data object and which functions operate on it. Package functions that begin with monitor_
, expect objects of class ws_monitor as their first argument. (Note that ‘ws_’ stands for ‘wildfire smoke’.)
Monitoring data will typically be obtained from an agency charged with archiving data acquired at monitoring sites. For wildifre smoke, the primary pollutant is PM 2.5 and the sites archiving this data include AirNow, WRCC and AIRSIS.
The data model for monitoring data consists of an R list
with two dataframes: data
and meta
.
The data
dataframe contains all hourly measurements organized with rows (the ‘unlimited’ dimension) as unique timesteps and columns as unique monitors. The very first column is always named datetime
and contains the POSIXct
datetime in Coordinated Universal Time (UTC).
The meta
dataframe contains all metadata associated with monitoring sites and is organized with rows as unique sites and columns as site attributes. The following columns are guaranteed to exist in the meta
dataframe:
monitorID
– unique ID for each monitoring site (i.e. each instrument deployment)longitude
– decimal degrees Eastlatitude
– decimal degrees Northelevation
– meters above sea leveltimezone
– Olson timezonecountryCode
– ISO 3166-1 alpha-2 codestateCode
– ISO 3166-2 alpha-2 code
(The MazamaSpatialUtils package is used to assign timezones and state and country codes.)
Additional columns may be available in the meta
dataframe and these will depend on the source of the data.
It is important to note that the monitorID
acts as a unique key that connects data with metadata. The monitorID
is used for column names in the data
dataframe and for row names in the meta
dataframe. So the following will always be true:
rownames(ws_monitor$meta) == ws_monitor$meta$monitorID colnames(ws_monitor$data) == c('datetime',ws_monitor$meta$monitorID)
File Size Reduction
Most providers of monitoring data store the data as it comes in, retaining station metadata with every single datum. However, individual monitors rarely move. Thus, replication of metadata vastly inflates the size of data in this “as-it-comes-in” data management strategy. The ws_monitor data model reflects the reality of monitors that rarely move.
Our BigData presentation describes how we used this data model to convert unwieldy EPA annual datasets into bite sized .RData files without losing any information. In one example we reduced a 665 Mb ASCII CSV file into a 3.3 Mb .RData file — 1/200th the size of the original.
Reduced file sizes make working with data from multiple monitors much easier and the remainder of this post demonstrates the package capabilities by looking at smoke generated from the 2015 wildfire season in the Pacific Northwest
Pacific Northwest 2015 Wildfires
In the summer of 2015 Washington state had several catastrophic wildfires that led to heavy smoke in eastern Washington and northern Idaho for quite a few days. We will show how the mapping and timeseries plotting functions in the PWFSLSmoke package can help us visualize the spatial and temporal extent of wildfire smoke during these events.
To begin, let’s have a broader look at AirNow ambient monitoring data for the Pacific Northwest – Washington, Oregon and Idaho – from June 1 through October 31, 2015. First, we create a 24-hr rolling mean for each monitor:
library(PWFSLSmoke) PacNW <- Northwest_Megafires # To work with AirNow data directly, uncomment the next two lines #N_M <- airnow_load(startdate=20150531, enddate=20151101) #PacNW <- monitor_subset(airnow, stateCodes=c("WA", "OR", "ID")) PacNW_24 <- monitor_rollingMean(PacNW, width=24)
Now we can create a map where each monitor is color coded by the maximum value of this 24-hr rolling mean. By default, AQI colors and labels are used but arguments to monitorMap()
and addAQILegend()
allow users to specify their own.
monitorMap(PacNW_24, slice=max) addAQILegend(title="Max AQI", cex=0.7)
The map shows that many areas of the Pacific NW had days with unhealthy air but a cluster of sites in Idaho were particularly bad.
(Note that this is not the regulatory midnight-to-midnight AQI but a continuous 24-hr rolling mean.)
We can us an interactive “leaflet” map to zoom in and get more information:
# Commented out for the blog post #monitorLeaflet(PacNW_24, slice=max)
The monitorMap()
and monitorLeaflet()
plots show us pretty much the same information, except that the leaflet plot allows you to get monitor-specific metadata by clicking on a monitor. In this manner, we can assemble a list of monitorIDs in and around the Nez Perce Reservation in northern Idaho and generate a multi-monitor timeseries plot showing the terrible smoke in late August.
NezPerceIDs <- c("160571012","160690012","160690013","160690014","160490003","160491012") NezPerce <- monitor_subset(PacNW, monitorIDs=NezPerceIDs) monitorPlot_timeseries(NezPerce, style='gnats')
At this point it is clear that August is the month of interest so we’ll subset all of our existing ws_monitor
objects to cover the month of August with full days according to West Coast time.
PacNW <- monitor_subset(PacNW, tlim=c(20150801,20150831), timezone="America/Los_Angeles") PacNW_24 <- monitor_subset(PacNW_24, tlim=c(20150801,20150831), timezone="America/Los_Angeles") NezPerce <- monitor_subset(NezPerce, tlim=c(20150801,20150831), timezone="America/Los_Angeles")
We can use the monitorPlot_dailyBarplot()
function to look at official, midnight-to-midnight AQI levels for each monitor during the month of August:
layout(matrix(seq(6))) par(mar=c(1,1,1,1)) for (monitorID in NezPerceIDs) { siteName <- NezPerce$meta[monitorID,'siteName'] monitorPlot_dailyBarplot(NezPerce, monitorID=monitorID, main=siteName, axes=FALSE) } par(mar=c(5,4,4,2)+.1) layout(1)
We could also take a more automated approach and directly calculate the location with the worst acute smoke (worst hourly value) during this time period:
data <- PacNW$data[,-1] # omit 'datetime' column maxPM25 <- apply(data, 2, max, na.rm=TRUE) worstAcute <- names(sort(maxPM25, decreasing=TRUE))[1:6] intersect(worstAcute, NezPerceIDs) ## [1] "160571012" "160490003" "160491012" PacNW$meta[worstAcute[1],c('siteName','countyName','stateCode')] ## siteName countyName stateCode ## 160571012 Juliaetta E-sampler LATAH ID
We see that three of the sites we identified “by hand” also had the worst acute smoke of any sites in the Pacific NW. The site with the highest measured value of PM 2.5 for the month of august was Julietta in Latah county, Idaho.
Let’s do a similar analysis for chronic smoke. In this case we will calculate the number of days at or above the AQI “unhealthy” level:
PacNW_dailyAvg <- monitor_dailyStatistic(PacNW, FUN=mean, minHours=20) ## Warning in monitor_dailyStatistic(PacNW, FUN = mean, minHours = 20): Found ## 2 timezones. Only the first will be used data <- PacNW_dailyAvg$data[,-1] unhealthyDays <- apply(data, 2, function(x){ sum(x >= AQI$breaks_24[4], na.rm=TRUE) }) worstChronic <- names(sort(unhealthyDays, decreasing=TRUE))[1:6] intersect(worstChronic, NezPerceIDs) ## [1] "160490003" "160571012" "160690014" PacNW$meta[worstChronic[1],c('siteName','countyName','stateCode')] ## siteName countyName stateCode ## 160490003 Kamiah-ID IDAHO ID
The area around the Nez Perce Reservation again had three of the worst sites for chronic smoke with the worst being Kamiah in Idaho county, Idaho.
Latah and Idaho counties were unfortunately downwind of several extraordinarily large Washington state wildfires in 2015 with the following ignition dates:
- Aug 11 – Kettle Complex
- Aug 13 – Grizzly Bear Complex
- Aug 13 – North Star
- Aug 14 – Chelan Complex
- Aug 15 – Okanogan Complex
A little googling and we can obtain a set of coordinates for these fires to use in a new “Google” map of the August daily average maxima with fire icons at fire locations:
fireLons <- c(-118.461,-117.679,-120.039,-119.002,-119.662) fireLats <- c(48.756,46.11,47.814,48.338,48.519) gmap <- monitorGoogleMap(PacNW_dailyAvg, zoom=7, centerLon=-118, centerLat=47, slice=max) addIcon('redFlame', fireLons, fireLats, map=gmap, expansion=0.2) addAQILegend(cex=0.8) title("August, 2015", line=-1.5, cex.main=2)
From this map we can see that the monitor in Omak on the Colleville reservation also registered a daily AQI level of “extreme” as it was in-between the two largest fires. The overall spatial pattern, however, shows that the worst impact from these large fires resulted from smoke drifting SE with the prevailing winds and bunching up in the valleys and plains just upwind of the Bitterroot mountains of northern Idaho.
We will now take a closer look at two tribal monitors: Omak for the Colleville tribe and Kamiah for the Nez Perce.
Omak <- monitor_subset(PacNW, monitorIDs="530470013") Kamiah <- monitor_subset(PacNW, monitorIDs="160490003") layout(matrix(seq(2))) monitorPlot_dailyBarplot(Omak, main="August 2015 Daiy AQI -- Omak, WA", labels_x_nudge=0.8, labels_y_nudge=250) monitorPlot_dailyBarplot(Kamiah, main="August Daily AQI -- Kamiah, ID", labels_x_nudge=0.8, labels_y_nudge=250) layout(1)
We can also examine the diurnal cycle during the very worst days:
layout(matrix(seq(2))) par(mar=c(3,4,4,2)) monitorPlot_timeOfDaySpaghetti(Omak, title="Aug 23-26 Diurnal Smoke -- Omak, WA", xlab='', ylab='', tlim=c(20150823,20150826)) monitorPlot_timeOfDaySpaghetti(Kamiah, title="Aug 23-26 Diurnal Smoke -- Kamiah, ID", xlab='', ylab='', tlim=c(20150823,20150826)) par(mar=c(5,4,4,2)+.1) layout(1)
While both sites had horrible air all day, in Omak it was somewhat less horrible in the evenings. This is in contrast to Kamiah where the least horrible conditions were encountered in the mornings.
This ends the spatio-temporal exploration of smoke from Pacific NW mega-fire in 2015. We hope this inspires you to harness the mapping plotting functionality available in the PWFSLSmoke package.
Best Wishes for Healthy Air!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.