Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Apparent Reason, my new monthly podcast, is a boisterous and non-technical discussion of economics and statistics. In that format I don’t have the luxury of showing charts and graphs to complement my discussion, so I use the playitbyr package to represent the data as sound. (Apparently February is a great month to start R-related podcasts! We even chose the same Creative Commons license. Chalk it up to the zeitgeist.)
The first episode is a meditation on the concept of economic value, a remarkably weird concept. We can assign dollar values to people and to oil tankers, but what do they actually mean? Thinkers from Copernicus to Adam Smith puzzled over what they called the paradox of value: water is necessary for life, but it’s darn cheap in many places, whereas most of us could live fine without diamonds, but they’re worth a lot. Adam Smith (falsely) concluded that this was because market price was only a function of how much labor it cost to get a good.
I discuss these issues more in the episode over some rousing techno beats, but I want to share here the suspenseful sonic showdown between the prices of industrial diamonds and New York City water! Which will be higher, 1980 to 2009?
First, the countdown to the match. Here are the audio axes: each octave represents another three powers of 10 on a logarithmic scale, so the four tones mark one, a thousand, a million, and a billion dollars.
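The axis arithmetic can be made concrete with a small R sketch. It assumes the pitch scale is linear in the logarithm of the price between the data limits and pitch limits passed to scale_pitch_exp() later in the post (dmin = .001, dmax = 1e+12, min = 6, max = 11). The helper value_to_octave() is hypothetical and not part of playitbyr. It is a guess at the mapping that reproduces the "three powers of 10 per octave" rule, not the package's actual code.

```r
# Hypothetical helper (not part of playitbyr): map a dollar value onto an
# octave-style pitch number, ASSUMING the pitch scale is linear in log10(value)
# between the data limits (dmin, dmax) and the pitch limits (min, max).
value_to_octave <- function(x, min = 6, max = 11, dmin = 0.001, dmax = 1e12) {
  # How far x sits between dmin and dmax, measured in powers of 10 (0 to 1)
  frac <- (log10(x) - log10(dmin)) / (log10(dmax) - log10(dmin))
  min + frac * (max - min)
}

# 15 powers of 10 spread over 5 octaves gives 3 powers of 10 per octave
(log10(1e12) - log10(0.001)) / (11 - 6)

# The four reference tones: $1, $1,000, $1 million, $1 billion
value_to_octave(c(1, 1e3, 1e6, 1e9))
```

Under this assumption the four reference prices land exactly one octave apart, starting one octave above the bottom of the range.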
Now FIGHT!
The short pokey sounds are the diamonds, and the long washy sounds are the water. This sonification displays the wildly shocking fact that diamonds are way more expensive than water (even though diamonds have gotten relatively much cheaper over the time period). In 2009, a metric ton of water was 80 cents, while the same weight in industrial diamond was $1.2 million! Who knew Copernicus was actually right about something?
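Here is a back-of-the-envelope check on the 2009 figures above. It computes the price ratio and the pitch gap that ratio implies on the same three-powers-of-10-per-octave axis. The prices are the ones quoted in this post; the variable names are only for illustration.

```r
# 2009 prices quoted in the text, in dollars per metric ton
water_2009   <- 0.80
diamond_2009 <- 1.2e6

# How many times more expensive industrial diamonds were, by weight
ratio <- diamond_2009 / water_2009
ratio

# With 3 powers of 10 per octave, the pitch gap between the two sounds
octave_gap <- log10(ratio) / 3
octave_gap
```

The result is a ratio of about 1.5 million, which puts the two sounds roughly two octaves apart.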
How do I make the sounds? Hopefully, over time, with less and less code, and thankfully with much less than this craziness. playitbyr uses ggplot2-inspired syntax to sonify data and is available on CRAN, but we're in the midst of a major overhaul of the package, targeting a CRAN release in mid-March. In this post I'll be working from the development version on GitHub, and I'll update the code once we release. Creating the podcast helped me discover some functionality that really should be in the package. Since I'm mimicking Hadley Wickham's interface, I also want the function names to follow his style guide more closely.
Setting up
The dev version of playitbyr requires the csound and audio packages, and the devtools package is handy for installing packages from GitHub:
```r
install.packages(c("csound", "audio", "devtools"))
```
To make the csound package actually work, you need to install the cross-platform, open-source synthesizer Csound, which we'll be using to generate the sounds. After you've done this, you can use install_github() to install playitbyr:
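```r
library(devtools)
# Older devtools releases took the repo and user as separate arguments,
# install_github("playitbyr", "statisfactions"); newer ones expect "user/repo"
install_github("statisfactions/playitbyr")
library(playitbyr)
```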
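Because playitbyr relies on a separate Csound installation, a quick check from R can save some head-scratching. csound_available() below is a hypothetical helper and is not part of playitbyr or the csound package. It only uses base R's Sys.which() to look for a csound command-line program on your PATH. The csound R package may locate Csound differently, so treat FALSE as a hint rather than proof.

```r
# Hypothetical helper (not part of playitbyr or csound): look for a Csound
# command-line program on the PATH as a rough proxy for a working install.
csound_available <- function() {
  path <- Sys.which("csound")  # returns "" when nothing is found
  nzchar(path[[1]])
}

if (csound_available()) {
  message("Found Csound at: ", Sys.which("csound"))
} else {
  message("Csound not found on the PATH; install it before loading playitbyr.")
}
```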
From value to sound
Now to get the data. I got the NYC water rates from the NYC water board and the diamond rates from US Geological Survey with some non-reproducible scraping methods, but I’ve thrown them up here as CSV files you can grab. Here’s how to download and prep the data:
```r
water <- read.csv("http://statisfactions.com/wp-content/uploads/2012/02/nycwater.csv")
diamonds <- read.csv("http://statisfactions.com/wp-content/uploads/2012/02/ngsdiamond.csv")

water <- water[nrow(water):4, ]  # reverse order and cut off unused years
rownames(water) <- NULL

## Water rates are per 748 gallons; convert to metric tons
water$rate <- water$rate / 2.83148801

## Two values for 1990; average 'em and remove one
water$rate[water$year == 1990] <- mean(water$rate[water$year == 1990])
water <- water[-(which(water$year == 1990)[1]), ]

## Combine; assumes the two files line up year by year
paradox <- cbind(water, diamonds$value)
names(paradox) <- c("year", "water", "diamonds")
```
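Two steps in that script are worth checking by hand: the 2.83148801 divisor and the 1990 de-duplication. The sketch below recomputes the divisor from the definition of the US gallon (exactly 3.785411784 liters). A cubic meter of water weighs roughly one metric ton, which is why dividing by cubic meters gives an approximate price per ton. It then runs the same reverse, average, and drop steps on a tiny data frame whose rates are invented for illustration and are not the NYC figures.

```r
# 748 US gallons expressed in cubic meters (1 US gallon = 3.785411784 liters)
gallons_per_unit <- 748
m3_per_unit <- gallons_per_unit * 3.785411784 / 1000
m3_per_unit  # about 2.831488, the divisor used above

# Invented toy data: newest year first, with a duplicated 1990 row
toy <- data.frame(year = c(1992, 1991, 1990, 1990, 1989),
                  rate = c(1.30, 1.20, 1.00, 1.10, 0.90))

toy <- toy[nrow(toy):1, ]           # oldest year first (no years cut here)
rownames(toy) <- NULL
toy$rate <- toy$rate / m3_per_unit  # per 748 gallons -> roughly per metric ton

# Average the two 1990 values, then drop the first 1990 row
dup <- toy$year == 1990
toy$rate[dup] <- mean(toy$rate[dup])
toy <- toy[-which(dup)[1], ]
toy
```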
Now we can see playitbyr’s ggplot2 influence in action when we create the sonification:
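```r
sonify(paradox, sonaes(time = year)) +
  # Prices span 0.001 to 1e12 dollars; spread them over pitches 6 to 11
  scale_pitch_exp(min = 6, max = 11, dmin = .001, dmax = 1e+12) +
  # Squeeze the years 1980-2009 into 0 to 3 on the time axis
  scale_time_linear(min = 0, max = 3) +
  # Water layer: long notes (dur = 1.2), panned to one side
  shape_scatter(mapping = sonaes(pitch = water, dur = 1.2, attkp = 0.5,
                                 decayp = 0.5, indx = 3, pan = 0, vol = 0.4)) +
  # Diamond layer: short, quieter notes (dur = 0.3), panned to the other side
  shape_scatter(mapping = sonaes(pitch = diamonds, dur = 0.3, indx = 10,
                                 pan = 1, vol = 0.2))
```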
sonaes() is just playitbyr's version of ggplot2's aes() function, and shape_scatter() is the sonic equivalent of ggplot2's geom_point(), creating an audio scatterplot. I map the data column "year" to the sound parameter "time", and in the layers I map the data columns "water" and "diamonds" to the sound parameter "pitch". The rest of the parameters you hear are aesthetic tweaks that differentiate the two sounds. shape_scatter() is based on the simple FM instrument that comes with the csound package, which you can read more about in ?scoreMatrices.
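The idea of mapping data columns to sound parameters can be shown without Csound. Here is a plain-R sketch of the kind of note list the two shape_scatter() layers describe: one row per data point per layer, with a start time, pitch, duration, pan, and volume. This is a hypothetical illustration and not how playitbyr actually builds its Csound score. The function names are invented, the scales are assumed to be linear (time) and log-linear (pitch) as in the sonify() call, and the FM settings (indx, attkp, decayp) are left out.

```r
# Hypothetical illustration, NOT playitbyr's internals.
# Linear rescale of x from its own range onto the interval `to`
rescale_linear <- function(x, to) {
  to[1] + (x - min(x)) / (max(x) - min(x)) * (to[2] - to[1])
}
# Rescale linear in log10(x) from the interval `from` onto `to`
rescale_log10 <- function(x, from, to) {
  to[1] + (log10(x) - log10(from[1])) / (log10(from[2]) - log10(from[1])) *
    (to[2] - to[1])
}

# Build one note per year per layer, mimicking the two layers above
make_notes <- function(df) {
  start <- rescale_linear(df$year, to = c(0, 3))
  layer <- function(price, dur, pan, vol, label) {
    data.frame(layer = label, start = start,
               pitch = rescale_log10(price, from = c(0.001, 1e12), to = c(6, 11)),
               dur = dur, pan = pan, vol = vol)
  }
  rbind(layer(df$water,    dur = 1.2, pan = 0, vol = 0.4, label = "water"),
        layer(df$diamonds, dur = 0.3, pan = 1, vol = 0.2, label = "diamonds"))
}

# Made-up round numbers, chosen only to give easy-to-read pitches
toy <- data.frame(year = c(1980, 1995, 2009),
                  water = c(1, 10, 100),
                  diamonds = c(1e3, 1e6, 1e9))
notes <- make_notes(toy)
notes[order(notes$start), ]
```

Each layer contributes one note per year. Because the two layers share the same time mapping, a water note and a diamond note start together, and their difference in pitch carries the comparison.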
Although I didn't use R to create the rest of the music and sound for this podcast episode, it's all done with free and open-source software! I am incredibly grateful to Rosegarden and Audacity for enabling me to do sequencing, recording, and sound editing on Linux.