Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The latest version of playitbyr now offers more options and combinations for data sonification (exploring data through sound), with a ggplot2-inspired syntax. See the website for examples and how to get started.
The recent Sonification Handbook has a chapter devoted to exploratory data analysis with sonification. With some help from Sam Ferguson, one of the chapter’s authors, I’ve made it easy to implement those techniques using R. The following are recreations of the chapter’s sound examples, all exploring Edgar Anderson’s iris data.
Auditory dotplot
The auditory dotplot gives a quick univariate view of the measured petal lengths in the iris data by mapping each value onto a click in time. Earlier clicks represent shorter petals.
library(playitbyr)
sonify(data = iris, mapping = sonaes(time = Petal.Length)) +
  shape_dotplot(jitter = 0.3) +
  scale_time_continuous(soundlimits = c(0, 15))
First, we specify the basic aspects of the sonify object, analogous to a ggplot object: the data is iris, and we’re mapping Petal.Length onto time. We then add the shape_dotplot layer, with a bit of noise via jitter to avoid overplotting, and use scale_time_continuous to specify that we want the output scaled to the range of 0 to 15 seconds.
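To make the mapping concrete, here is a minimal base-R sketch of the arithmetic behind an auditory dotplot: each petal length becomes one click onset, rescaled into the 0 to 15 second window. It is not playitbyr’s implementation. The jitter model (uniform noise of up to 0.3 cm from base R’s jitter()) and the helper rescale() are assumptions for illustration only, and no sound is produced.

```r
# Hypothetical sketch, not playitbyr's internals: turn petal lengths into
# click onset times (in seconds) inside soundlimits = c(0, 15).
set.seed(1)  # make the random jitter reproducible

x <- iris$Petal.Length

# Assumption: jitter adds uniform noise of up to +/-0.3 data units (cm)
# before scaling; playitbyr's own definition of jitter may differ.
x_jit <- jitter(x, amount = 0.3)

# Linearly map the data range onto the target time range.
rescale <- function(v, to) {
  to[1] + (v - min(v)) / (max(v) - min(v)) * (to[2] - to[1])
}
onsets <- rescale(x_jit, to = c(0, 15))

# One click per observation, played in time order:
# the earliest clicks come from the shortest petals.
clicks <- sort(onsets)
head(clicks)
```

Because the rescaling is linear, the spacing between clicks preserves the spacing between data values, so gaps and clusters in the data become audible as silences and bursts.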
Auditory histogram
The auditory histograms are one of the most creative touches in the chapter. They give a sense of the frequency distribution by repeatedly sampling from the data (without replacement) and mapping each sampled value to a pitch. This can be played indefinitely to convey the distribution’s shape: lots of notes in the middle of the range indicate a distribution concentrated around its center, for instance.
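The sampling scheme can be sketched in base R as follows. This is an illustration under stated assumptions, not playitbyr’s code: the helper draw_cycled() (which reshuffles once every value has been used, so the stream can continue indefinitely), the MIDI pitch range of 48 to 84, and the 90-note length are all hypothetical choices.

```r
# Hypothetical sketch of the auditory-histogram idea, not playitbyr's code.
set.seed(42)

x <- iris$Sepal.Length

# Draw n values without replacement; once every value has been used,
# reshuffle and keep going, so the stream can run as long as needed.
draw_cycled <- function(v, n) {
  passes <- ceiling(n / length(v))
  idx <- unlist(lapply(seq_len(passes), function(i) sample.int(length(v))))
  v[idx[seq_len(n)]]
}

notes <- draw_cycled(x, n = 90)

# Assumption: map values linearly onto MIDI note numbers 48 (C3) to 84 (C6),
# then convert to frequency in Hz (A4 = MIDI 69 = 440 Hz).
midi <- 48 + (notes - min(x)) / diff(range(x)) * (84 - 48)
freq_hz <- 440 * 2^((midi - 69) / 12)
```

Values that occur often in the data are drawn often, so their pitches dominate the stream; that is what gives the listener a sense of the distribution’s shape.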
This design is especially effective in combination with sonfacet, for comparing different values of a categorical variable. For instance, we can listen to Sepal.Length faceted by the three iris species (setosa, versicolor, and virginica).
You can hear how each species clusters in a different part of the pitch range, and get a sense of how spread out each one is. Here’s the code:
sonify(iris, sonaes(pitch = Sepal.Length)) +
  sonfacet(Species) +
  shape_histogram(length = 3, tempo = 1800)
We again choose iris as the data set, but now map Sepal.Length to pitch. Then we facet by Species; faceting means we simply split the data by levels(Species), create the sonification for each level, and then play the sonifications one after another. Finally, we add shape_histogram: each sonification plays for 3 seconds (length = 3), and samples are drawn at a rate of 1800 beats per minute (tempo = 1800).
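The timing implied by these settings is easy to check by hand: 1800 beats per minute is 30 notes per second, so each 3-second facet contains 90 notes, and the three facets together last 9 seconds. The sketch below builds that note schedule in base R. It only mimics the behaviour described above; draw_cycled() and the layout of the schedule data frame are hypothetical, not playitbyr internals.

```r
# Hypothetical sketch of faceting plus histogram timing; not playitbyr code.
set.seed(7)

length_sec <- 3     # mirrors shape_histogram(length = 3)
tempo_bpm  <- 1800  # mirrors shape_histogram(tempo = 1800)

# 1800 beats per minute = 30 notes per second, so 90 notes per facet
notes_per_facet <- tempo_bpm / 60 * length_sec
note_gap <- 60 / tempo_bpm  # seconds between successive notes

# Faceting: split the data by levels(iris$Species)
facets <- split(iris$Sepal.Length, iris$Species)

# Facets are played one after another: 0-3 s, 3-6 s, 6-9 s
facet_start <- setNames((seq_along(facets) - 1) * length_sec, names(facets))

# Draw without replacement, reshuffling once a facet's values run out
draw_cycled <- function(v, n) {
  idx <- unlist(lapply(seq_len(ceiling(n / length(v))),
                       function(i) sample.int(length(v))))
  v[idx[seq_len(n)]]
}

# One row per note: which facet it belongs to, when it starts, its value
schedule <- do.call(rbind, lapply(names(facets), function(sp) {
  data.frame(
    species = sp,
    onset   = facet_start[[sp]] + (seq_len(notes_per_facet) - 1) * note_gap,
    value   = draw_cycled(facets[[sp]], notes_per_facet)
  )
}))
```

Since each species has only 50 observations, a 90-note facet necessarily passes through the data more than once, which is why the sketch reshuffles rather than stopping.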
Audio boxplot
shape_boxplot works on a similar principle. The same sampling occurs, but now in three phases: first from the entire range of the data, then only from the interquartile range (the 25th to 75th percentile), and finally just the median. This conveys both the center and the spread of a variable. We’ll again look at Sepal.Length, faceted by Species:
sonify(iris, sonaes(pitch = Sepal.Length)) +
  sonfacet(Species) +
  shape_boxplot(length = 1, tempo = 1800)
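The code is the same as for the histogram except for the shape: shape_boxplot replaces shape_histogram, and its length parameter sets the length of each of the three boxplot segments rather than of the whole facet (so each facet here lasts 3 seconds).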
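A base-R sketch of the three phases, for a segment length of 1 second at 1800 beats per minute (30 notes per phase). It approximates the description above rather than reproducing playitbyr’s code: the helpers draw_cycled() and boxplot_phases() are hypothetical, and the quartiles use R’s default quantile() definition (type 7), which may not match the package’s choice.

```r
# Hypothetical sketch of the three boxplot phases; not playitbyr's code.
set.seed(3)

seg_sec   <- 1     # mirrors shape_boxplot(length = 1): seconds per phase
tempo_bpm <- 1800  # mirrors shape_boxplot(tempo = 1800)
n_seg <- tempo_bpm / 60 * seg_sec  # 30 notes per phase

# Draw without replacement, reshuffling once the pool is exhausted
draw_cycled <- function(v, n) {
  idx <- unlist(lapply(seq_len(ceiling(n / length(v))),
                       function(i) sample.int(length(v))))
  v[idx[seq_len(n)]]
}

# Phase 1 samples the full range, phase 2 only values inside the
# interquartile range (25th to 75th percentile), phase 3 only the median.
boxplot_phases <- function(v, n) {
  q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)  # R's default type 7
  list(
    full   = draw_cycled(v, n),
    iqr    = draw_cycled(v[v >= q[1] & v <= q[3]], n),
    median = rep(q[2], n)
  )
}

# One set of phases per species, played facet after facet
phases <- lapply(split(iris$Sepal.Length, iris$Species),
                 boxplot_phases, n = n_seg)
```

Narrowing the sampling pool from phase to phase is what lets the listener hear the spread collapse toward the center.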
I hope to incorporate speech, easier-to-set-up audio integration, many more sounds, and other goodies in future versions; you can view and fork the code on GitHub.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.