Toronto Data Science Group – A Survey of Data Visualization Techniques and Practice

[This article was first published on everyday analytics, and kindly contributed to R-bloggers].

Recently I spoke at the Toronto Data Science Group. The folks at Mozilla were kind enough to record the talk and put it on Air Mozilla, so here it is for your viewing pleasure (and critique):


Overall it was quite well received. Aside from the usual “omg, does my voice really sound like that??” reaction, which is to be expected, a few thoughts on the business of giving presentations were quite salient here:

  • Talk slower and enunciate
  • Gesture, but not too much
  • Tailor sizing and colouring of visuals, depending on projection & audience size
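
As a rough illustration of the last point, the sketch below draws a scatterplot with text and plotting symbols scaled up for a projector. The helper name `plot_for_projection`, the `scale` factor of 1.5, the page size, and the output file name are all assumptions made for this example, not settings used in the talk.

```r
# Hypothetical helper: render a scatterplot with larger text and symbols,
# as one might for a projected slide. All sizes here are illustrative guesses.
plot_for_projection <- function(x, y, file, scale = 1.5) {
  pdf(file, width = 10, height = 7.5)   # 4:3 page, a common slide shape
  on.exit(dev.off())                    # always close the device
  par(cex = scale, mar = c(5, 5, 2, 1)) # cex scales text and symbols together
  plot(x, y, pch = 16, col = rgb(0, 0, 0, 0.3), xlab = "x", ylab = "y")
  invisible(file)
}

set.seed(1)
out <- plot_for_projection(rnorm(500), rnorm(500),
                           file.path(tempdir(), "slide.pdf"))
```

For a larger room or a dimmer projector, a bigger `scale` and a higher-contrast colour than semi-transparent black would probably be worth trying.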

I’ve reproduced below the R code used to create the figures (including the bubble chart example, which uses code and data from FlowingData); regrettably, I neglected to save the original code at the time:

# Toronto Data Science Group Talk plots
# Myles Harrison
# http://www.everydayanalytics.ca/2014/02/toronto-data-science-group-talk.html
library(hexbin)
library(RColorBrewer)
# Create random data
x <- rnorm(5000, mean=1000)
y <- rnorm(5000, mean=2000)
# Scatterplot
plot(x, y, pch=16, col='black')
# Scatter with transparency
plot(x, y, pch=16, col=rgb(0,0,0,0.1))
# Smaller plotting symbols
plot(x, y, pch=16, col='black', cex=0.1)
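# Approximate a density colouring using each point's Euclidean distance to the means
# (this is a stand-in for, not an actual, kernel density estimate)
d <- sqrt((x-mean(x))^2 + (y-mean(y))^2)
# Select palette and rescale distances to palette indices 1..32
# (scaling by 32 would give the farthest point index 33, which wraps back to colour 1)
palette(rainbow(32))
d <- d/max(d)*31+1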
# Plot
plot(x, y, pch=16, col=d)
# Hexbinning
h <- hexbin(x,y)
plot(h)
# With color
plot(h, colramp=BTY)
# DISTRIBUTION
# Regular histogram
hist(x, breaks=125, col='red', xlab='x', main='Histogram of x')
# Boxplot
x2 <- rnorm(1000, mean=1000)
boxplot(x2)
# Add jitter
stripchart(x2, vertical=TRUE, method="jitter", jitter=0.1, add=TRUE, pch=16,
           cex=0.1, col=rgb(0,0,0,0.5))
# Bubble chart demo
# Data from Flowing Data: http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState2005.tsv", header=TRUE, sep="\t")
radius <- sqrt( crime$population/ pi )
plot(crime$murder, crime$burglary, col='red', pch=16, ylab='Burglary Rate', xlab='Murder Rate', xlim=c(0, 10), ylim=c(150, 1250))
symbols(crime$murder, crime$burglary, circles=radius, inches=0.35, fg="black", bg="red", xlab="Murder Rate", ylab="Burglary Rate", xlim=c(0, 10), ylim=c(150, 1250))
# Category-like colouring
palette(colorRampPalette(c("red", "green", "darkgreen"))(3))
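# Random category (1 to 3) for each row; named `cats` to avoid masking base::c()
cats <- sample(1:3, nrow(crime), replace=TRUE)
symbols(crime$murder, crime$burglary, circles=radius, inches=0.35, fg="black", bg=cats, xlab="Murder Rate", ylab="Burglary Rate", xlim=c(0, 10), ylim=c(150, 1250))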
# Quantitative-like colouring
palette(colorRampPalette(c("lightblue", "yellow", "red"), interpolate='spline')(11))
# Scale population to palette indices 1..11; named `pop_col` to avoid masking base::c()
pop_col <- crime$population/max(crime$population)*10+1
symbols(crime$murder, crime$burglary, circles=radius, inches=0.35, fg="black", bg=pop_col, xlab="Murder Rate", ylab="Burglary Rate", xlim=c(0, 10), ylim=c(150, 1250))
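
The script above maps a continuous value onto palette indices by rescaling it by hand. A possible alternative, sketched here under my own assumptions, is to let `cut()` bin the values and then index directly into a vector of colours. The helper name `value_to_colour` is hypothetical and not part of the original script.

```r
# Hypothetical helper: assign each value one of the supplied colours
# using equal-width bins over the range of the data.
value_to_colour <- function(values, colours) {
  n <- length(colours)
  # labels = FALSE returns integer bin codes in 1..n
  bins <- cut(values, breaks = n, labels = FALSE, include.lowest = TRUE)
  colours[bins]
}

# Same style of random data as in the script above
set.seed(42)
x <- rnorm(5000, mean = 1000)
y <- rnorm(5000, mean = 2000)
d <- sqrt((x - mean(x))^2 + (y - mean(y))^2)

point_cols <- value_to_colour(d, rainbow(32))
plot(x, y, pch = 16, col = point_cols)
```

Passing colour strings rather than palette indices avoids changing the global `palette()` and sidesteps any off-by-one at the top of the range.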
The visuals are also available on Slideshare.

Lessons learned: talk slower, always save your code, and Google stuff before starting – because somebody’s probably already done it before you.
