Visual Illusions: Google vs Facebook vs Yahoo
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Recently, Guerrilla alumnus, Scott J. pointed me at this Chart of the Day showing how Google revenue growth was outpacing both Facebook and Yahoo, when compared 7 years after launching the respective companies.
Clearly, this chart is intended to be an attention getter for the Silicon Alley Insider website but, it looks about right and normally I might have just accepted the claim without giving it anymore thought. The notion that Google growth is dominating, is also consistent with a lot of other things one sees. No surprises there.
Exponential doubling period
In this particular case, however, I was struck by the shape of the data and curious to find out if the growth of GOOG and FB revenue follows an exponential trend or not. Exponential growth is not unexpected because it’s the continuous analog of compound interest. If they are growing exponentially, I can compare their doubling periods numerically and determine by how their growth will look in the future.The doubling period is an analysis technique that I use in Chapter 8 of my Guerrilla Capacity Planning book to determine the traffic growth of major websites. In section 8.7.5 the doubling time t2 is defined as:
t2 = Ln(2) / A
where A is the growth parameter of the fitted exponential curve (the rate at which it bends upward) and Ln(2) is the natural logarithm of 2 (2 for doubling). The only fly in the ointment is that I don’t have the actual numeric values used in the histogram chart, but that need not be a showstopper. There are only a half dozen data points for each company, so I can estimate them visually. Then, I can use R to fit the exponential models and calculate the respective doubling times.
Analysis in R
First, we read the data (as eyeballed from the online chart) into R. Since the amount of data is small, I simply use the textConnection trick to write the data in situ, rather than using an external file.gd <- read.table(textConnection("Year GOOG FB\tYAH 1 0.001 0.002 0.001 2 0.01 0.02 0.01 3 0.1 0.2 0.1 4 0.5 0.45 0.3 5 1.5 0.75 0.6 6 3.2 2.0 1.1 7 6.1 4.0 0.75"), header=TRUE,sep="\t") closeAllConnections()I can now plot those estimated data points and compare them with the original chart.
plot(gd$Year,gd$GOOG,type="b",col="green",lwd=2,lty="dashed", main="Annual revenues for GOOG (green), FB (blue), YAH (red)", xlab="Years after launch", ylab="$ billions") points(gd$Year,gd$FB,type="b",col="blue",lwd=2,lty="dashed") points(gd$Year,gd$YAH,type="b",col="red",lwd=2,lty="dashed")The result looks like this: The dashed lines simply connect related points together. The two solid lines are produced by performing the corresponding exponential fits to the GOOG and FB data.
# x-values for continuous exp curves x<-seq(from=1, to=7, by=0.1) ggfit<-nls(gd$GOOG ~ g0*exp(g1*gd$Year),data=gd,start=list(g0=1,g1=1)) gc<-coef(ggfit) lines(x,y=gc[1]*exp(gc[2]*x)) fbfit<-nls(gd$FB ~ f0*exp(f1*gd$Year),data=gd,start=list(f0=1,f1=1)) fc<-coef(fbfit) lines(x,y=fc[1]*exp(fc[2]*x)) # report the doubling periods text(1,5.0,sprintf("%2s doubling time: %4.2f months", names(gd)[2],12*log(2)/gc[2]),adj=c(0,0)) text(1,4.5,sprintf("%2s doubling time: %4.2f months", names(gd)[3],12*log(2)/fc[2]),adj=c(0,0))From the R analysis we see that the doubling period for Google (t2 = 11.39 months) is slightly longer than that for Facebook (t2 = 10.94 months). Despite the banner claim made by Silicon Alley Insider, based on these estimated data, Google is growing revenue at a slightly slower rate than Facebook. How can that be?
Conclusion
In the original histogram chart, it looks like Google is growing faster than Facebook. Well, looks can be deceiving. Your brain can be fooled (easily) by optical illusions. That's why we need to do analysis in the first place. Viewed uncritically, your brain can easily be led astray.To resolve this paradox, let's do two things:
- Project the growth models out further than the 7 years associated with the data
- Plot the projected curves on log-linear axes (for reasons that will become clear shortly)
The left-hand plot shows that the two curves cross somewhere between 7 years out and 40 years out. Whereas green (Google) is currently on top, according to the data, blue (Facebook) eventually ends up on top according to the exponential models; assuming nothing else changes in the future. The right-hand plot uses a log-scaled y-axis to reveal more clearly that the crossover occurs at t = 23.9 years. Once again, if you rely purely on visuals, you might think the crossover doesn't occur until after 30 years (what looks like a "knee" in the left-hand plot), but you'd be misled. It occurs almost 10 years earlier.
If, for example, you were only interested in short-term gains (as Wall St is wont to do), the original visual (histogram) is correct. If, on the other hand, you are in your 20s and investing longer term, e.g., for your retirement, you might get a surprise.
By now, you might be thinking that these projections are not very accurate, and I wouldn't completely disagree with you. But what is accurate here? The original data in the histogram (even the really real actual data) probably aren't very accurate either; we really can't know without deeper investigation. And that's my point: independent of the accuracy of the data, the numerical analysis can cause you to pay attention to, and possibly ask questions about, something you might otherwise have taken for granted on purely visual grounds.
I'm a big fan of data visualization, but not to the exclusion of numerical analysis. We need both and we need both to be easily accessible.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.