Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It always strikes me as curious that some posts get a lot of love on Twitter, while others get many more shares on Facebook.
What accounts for this difference? Some of it is surely site-dependent: maybe one blogger has a Facebook page but not a Twitter account, while another has these roles reversed. But even on sites maintained by a single author, tweets-to-likes ratios can vary widely from post to post.
So what kinds of articles tend to be more popular on Twitter, and which spread more easily on Facebook? To take a stab at an answer, I scraped data from a couple of websites over the weekend.
tl;dr Twitter is still for the techies: articles whose tweets greatly outnumber their FB likes tend to revolve around software companies and programming. Facebook, on the other hand, appeals to everyone else: not only to the masses, but to non-software technical folks in general as well.
FlowingData
The first site I looked at was Nathan Yau’s popular FlowingData website on data visualization. To see which articles are more popular on Facebook and which are more popular on Twitter, let’s sort all the FlowingData articles by their # tweets / # likes ratio.
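For the curious, here's a minimal Python sketch of that ranking step (the post doesn't show its own code). It assumes each post is a dict with hypothetical `title`, `tweets`, and `likes` fields, and the counts below are made up. Posts with zero likes would make the ratio undefined, so the sketch adds 1 to both counts; that smoothing is my assumption, not necessarily what the original analysis did.

```python
def tweets_to_likes(post):
    """Tweets-per-like ratio; +1 smoothing (an assumption) keeps it finite when likes == 0."""
    return (post["tweets"] + 1) / (post["likes"] + 1)


def rank_by_ratio(posts, n=10):
    """Return (most Facebook-leaning titles, most Twitter-leaning titles)."""
    ordered = sorted(posts, key=tweets_to_likes)  # lowest ratio first
    fb_leaning = [p["title"] for p in ordered[:n]]
    tw_leaning = [p["title"] for p in reversed(ordered[-n:])]  # highest ratio first
    return fb_leaning, tw_leaning


# Made-up counts, for illustration only.
posts = [
    {"title": "A", "tweets": 40, "likes": 400},
    {"title": "B", "tweets": 120, "likes": 10},
    {"title": "C", "tweets": 60, "likes": 60},
]
fb, tw = rank_by_ratio(posts, n=2)
print(fb)  # ['A', 'C']
print(tw)  # ['B', 'C']
```

Sorting ascending puts the most Facebook-leaning posts first; reading the same ordering from the end gives the most Twitter-leaning ones.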
Here are the 10 posts with the lowest tweets-to-likes ratio (i.e., the posts that were especially popular with Facebook users):
- What your state is the worst at – United States of shame
- Plush statistical distribution pillows
- Are gas prices really that high?
- Hey Jude flowchart
- Women’s dress sizes demystified
- America is not the best at everything
- What you need to get together
- Dexter’s victims through season five
- Correlating dog
- Valentine’s Day importance
And here are the 10 posts with the highest tweets-to-likes ratio (i.e., the posts especially popular with Twitter users):
- Delicious mass exodus
- Pew Research raw survey data now available
- Stock market predictions with Twitter
- Growth and usage of foursquare in 2010
- Sunlight Labs opens up Real Time Congress API
- Open-source Data Science Toolkit
- Explore your LinkedIn network visually with InMaps
- Perceived vs. actual country rankings
- See what you and others tweet about with the Topic Explorer
- History of detainees at Guantánamo
Notice any differences between the two?
- Instant gratification infographics, cuteness, comics, and pop culture get liked on Facebook.
- APIs, datasets, visualizations related to techie sites (Delicious, foursquare, Twitter, LinkedIn), and picture-less articles get tweeted instead.
- (Interestingly, it looks like the colors in the Facebook articles tend toward the red end of the spectrum, while the colors in the Twitter articles tend toward the blue end, though it’s hard to say without looking at more data.)
In fact, if we compare articles with pictures to those without, we see that 32% of articles with at least one picture have more shares on Facebook than on Twitter, compared to only 4% of articles with no picture at all.
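The picture comparison boils down to two percentages. Here's a Python sketch, assuming each post carries hypothetical `tweets`, `likes`, and `has_picture` fields (the post doesn't describe how pictures were detected). Ties count as *not* having more Facebook shares, and the records are made up.

```python
def pct_more_facebook(posts):
    """Percentage of posts with strictly more likes than tweets."""
    if not posts:
        return float("nan")
    wins = sum(1 for p in posts if p["likes"] > p["tweets"])
    return 100 * wins / len(posts)


def compare_by_picture(posts):
    """(percentage for posts with a picture, percentage for posts without one)."""
    with_pic = [p for p in posts if p["has_picture"]]
    without_pic = [p for p in posts if not p["has_picture"]]
    return pct_more_facebook(with_pic), pct_more_facebook(without_pic)


# Made-up records, for illustration only.
posts = [
    {"tweets": 30, "likes": 50, "has_picture": True},
    {"tweets": 80, "likes": 20, "has_picture": True},
    {"tweets": 45, "likes": 5, "has_picture": False},
    {"tweets": 12, "likes": 0, "has_picture": False},
]
print(compare_by_picture(posts))  # (50.0, 0.0)
```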
We can also break down the percentage of articles with more Facebook shares by category.
(I filtered the categories so that each category in the plot above contains at least 5 articles.)
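A sketch of the per-category breakdown, including the five-article filter, might look like this in Python. It assumes a hypothetical `categories` list on each post and lets a post count toward every category it's tagged with; whether the original analysis handled multi-category posts this way isn't stated. The data is made up.

```python
from collections import defaultdict


def pct_facebook_by_category(posts, min_articles=5):
    """Percentage of posts with more likes than tweets, per category."""
    by_cat = defaultdict(list)
    for p in posts:
        for cat in p["categories"]:  # a post may sit in several categories (assumption)
            by_cat[cat].append(p)
    result = {}
    for cat, members in by_cat.items():
        if len(members) < min_articles:
            continue  # drop categories with too few articles
        wins = sum(1 for p in members if p["likes"] > p["tweets"])
        result[cat] = 100 * wins / len(members)
    # Most Facebook-leaning categories first.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


# Made-up posts, for illustration only.
posts = (
    [{"categories": ["Software"], "tweets": 50, "likes": 5} for _ in range(5)]
    + [{"categories": ["Infographics"], "tweets": 10, "likes": 40} for _ in range(3)]
    + [{"categories": ["Infographics"], "tweets": 40, "likes": 10} for _ in range(2)]
    + [{"categories": ["Rare"], "tweets": 1, "likes": 9} for _ in range(2)]
)
print(pct_facebook_by_category(posts))  # {'Infographics': 60.0, 'Software': 0.0}
```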
What do we find?
- Articles in the Software, Online Applications, News, and Data sources categories (yawn) get 100% of their shares from Twitter.
- Articles tagged with Data Underload (which seems to contain short and sweet visualizations of everyday things), Miscellaneous (which contains lots of comics or comic-like visualizations), and Infographics get the most shares on Facebook.
- This category breakdown matches precisely what we saw in the top 10 examples above.
New Scientist
When looking at FlowingData, we saw that compared to Facebook users, Twitter users are much bigger on sharing technical articles. But is this true for technical articles in general, or only for programming-related posts? (In my experience with Twitter, I haven’t seen many people from math and the non-computer sciences.)
To answer, I took articles from the Physics & Math and Technology sections of New Scientist, and
- Calculated the percentage of shares each article received on Twitter (i.e., # tweets / (# tweets + # likes)).
- Grouped articles by their number of tweets rounded to the nearest multiple of 25 (bin #1 contains articles close to 25 tweets, bin #2 contains articles close to 50 tweets, etc.).
- Calculated the median percentage of shares on Twitter for each bin.
Here’s a graph of the result:
Notice that:
- The technology articles get consistently more shares from Twitter than the physics and math articles do.
- Twitter accounts for the majority of the technology shares.
- Facebook accounts for the majority of the physics and math shares.
So this suggests that Twitter really is for computer technology in particular, not technical matters in general (though it would be nice to look at areas other than physics and math as well).
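The three binning steps used for the New Scientist comparison translate fairly directly into code. This Python sketch assumes hypothetical `tweets` and `likes` fields and skips posts with no shares at all, since their Twitter share is undefined; you'd run it once per section (Technology, Physics & Math) and compare the outputs. Python's `round` rounds halves to even, but an integer tweet count divided by 25 never lands exactly on a half, so that quirk doesn't matter here. The counts are made up.

```python
from collections import defaultdict
from statistics import median


def twitter_share(post):
    """Fraction of shares from Twitter: tweets / (tweets + likes)."""
    return post["tweets"] / (post["tweets"] + post["likes"])


def median_share_by_bin(posts, width=25):
    """Median Twitter share per bin; bin k holds posts with tweets closest to k * width."""
    bins = defaultdict(list)
    for p in posts:
        if p["tweets"] + p["likes"] == 0:
            continue  # share is undefined for posts nobody shared
        k = round(p["tweets"] / width)
        bins[k].append(twitter_share(p))
    return {k: median(shares) for k, shares in sorted(bins.items())}


# Made-up counts, for illustration only.
articles = [
    {"tweets": 24, "likes": 8},
    {"tweets": 27, "likes": 27},
    {"tweets": 30, "likes": 10},
    {"tweets": 52, "likes": 48},
]
print(median_share_by_bin(articles))  # {1: 0.75, 2: 0.52}
```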
XKCD
Finally, let’s take a look at which XKCD comics are especially popular on Facebook vs. Twitter.
Here are the 10 comics with the highest likes-to-tweets ratio (i.e., the comics especially popular on Facebook):
- Dental Nerve
- Wisdom Teeth
- Trapped
- Complex Conjugate
- Learning to Cook
- Serious
- Explorers
- Mu
- Los Alamos
- Magic School Bus
Here are the 11 comics with the highest tweets-to-likes ratio (i.e., the comics especially popular on Twitter):
- Server Attention Span
- Illness
- Nanobots
- Manual Override
- The Cloud
- Constructive
- Future Timeline
- Good Code
- Golden Hammer
- Advertising Discovery
- Online Communities 2
Note that the XKCD comics popular on Facebook have more of a layman flavor, while the XKCD comics popular on Twitter are much more programming-related:
- Of the XKCD comics popular on Twitter, one’s about server attention spans, another’s about IPv6 addresses, a third is about GNU info pages, another deals with cloud computing, a fifth talks about Java, and the last is about a bunch of techie sites. (This is just like what we saw with the FlowingData visualizations.)
- Facebook, on the other hand, gets Ke$ha and Magic School Bus.
- And while both lists contain a flowchart, the one popular on FB is about cooking, while the one popular on Twitter is about code!
- What’s more, if we look at the few technical-ish comics that are more popular on Facebook (the complex conjugate, mu, and Los Alamos comics), we see that they’re about physics and math, not programming (which matches our findings from the New Scientist articles).
What’s Next?
The three websites I looked at are all fairly tech-oriented, so it would be nice to gather data from other kinds of websites as well.
And now that we have an idea of how Twitter and Facebook compare, the next burning question is surely: what do people share on Google+?!
Addendum
Let’s consider the following thought experiment. Suppose you come across the most unpopular article ever written. What will its FB vs. Twitter shares look like? Although no real person will ever share this article, I think Twitter has many more spambots (which tweet out any and every link) than FB does, so maybe unpopular articles will have more tweets than likes by default. Conversely, suppose you come across the most popular article ever written, which everybody wants to share. Then, since FB has many more users than Twitter does, maybe popular articles will tend to have more likes than tweets anyway.
Thus, in order to find out which types of articles are especially popular on FB vs. Twitter, instead of looking at tweets-to-likes ratios directly, we could try to remove this baseline popularity effect. (Taking ratios instead of raw numbers of tweets or likes is one kind of normalization; this is another.)
So does this scenario (or something similar to it) actually play out in practice?
Here I’ve plotted the overall popularity of a post (the total number of shares it received on either Twitter or FB) against the percentage of shares on Facebook alone, and we can see that as a post’s popularity grows, more and more shares do indeed tend to come from Facebook rather than Twitter.
Also, see the posts at the lower end of the popularity scale that are only getting shares on Twitter? Let’s take a look at the five most unpopular of these:
- Flowing Data is brought to you by… (March 2011 edition) (11 tweets, 0 likes)
- Flowing Data is brought to you by… (July 2011 edition) (14 tweets, 0 likes)
- Flowing Data is brought to you by… (June 2011 edition) (17 tweets, 0 likes)
- Flowing Data is brought to you by… (May 2011 edition) (18 tweets, 0 likes)
- Flowing Data is brought to you by… (May 2011 edition) (12 tweets, 1 like)
Notice that they’re all shoutouts to FlowingData’s sponsors! There’s pretty much no reason any real person would share these on Twitter or Facebook, and indeed, checking Twitter to see who actually tweeted out these links, we see that the tweeters are bots:
- https://twitter.com/#!/myVisualization/status/77685824224894976
- https://twitter.com/#!/InfographicTwts/status/6766861514245734
- https://twitter.com/#!/guysgoogle/status/77644902510493696
- https://twitter.com/#!/WhereIsYourData/status/77631743292735488
Now let’s switch to a slightly different view of the above scenario, where I plot number of tweets against number of likes:
We see that as popularity on Twitter increases, so too does popularity on Facebook, but at a slightly faster rate. (The form of the blue line plotted is roughly …)
So, to figure out which articles are popular on FB vs. Twitter, instead of looking at the ratios above we could look at the residuals from the fitted line in the plot above. Posts with large positive residuals would be especially popular on FB, and posts with large negative residuals would be especially popular on Twitter.
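Here's one way to compute such residuals, as a Python sketch. The post doesn't spell out how the blue line was fit, so this assumes an ordinary least-squares line of log(1 + likes) on log(1 + tweets); the log scale and the +1 offset (which keeps zero counts finite) are my assumptions. The `title`, `tweets`, and `likes` fields and the counts are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residuals(posts):
    """Residual of log1p(likes) against the fitted line, keyed by title.

    Positive means more likes than the tweet count predicts (Facebook-leaning).
    """
    xs = [math.log1p(p["tweets"]) for p in posts]
    ys = [math.log1p(p["likes"]) for p in posts]
    a, b = fit_line(xs, ys)
    return {p["title"]: y - (a + b * x) for p, x, y in zip(posts, xs, ys)}


# Made-up counts, for illustration only.
posts = [
    {"title": "A", "tweets": 10, "likes": 100},
    {"title": "B", "tweets": 10, "likes": 5},
    {"title": "C", "tweets": 100, "likes": 300},
    {"title": "D", "tweets": 100, "likes": 50},
]
res = residuals(posts)
ranked = sorted(res, key=res.get, reverse=True)
print(ranked)  # ['A', 'C', 'D', 'B']
```

Sorting by residual, highest first, lists the Facebook-leaning posts at the top and the Twitter-leaning ones at the bottom.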
In practice, however, there wasn’t much difference between looking at residuals vs. ratios directly when using the datasets I had, so to keep things simple in the main discussion above, I stuck to ratios alone. Still, it’s another option which might be useful when looking at different questions or different sources of data, so just for completeness, here’s what the FlowingData results look like if we use residuals instead.
The 10 articles with the highest residuals (i.e., the articles most popular on Facebook):
- What you need to get together
- Valentine’s Day importance
- What your state is the worst at – United States of shame
- Plush statistical distribution pillows
- Hitler learns topology
- Dexter’s victims through season five
- Access to education where you live
- Watching the growth of Costco warehouses
- Are gas prices really that high?
- Flight safety-esque beer pong guide
The 10 articles with the lowest residuals (i.e., the articles most popular on Twitter):
- Pew Research raw survey data now available
- Explore your LinkedIn network visually with InMaps
- Stock market predictions with Twitter
- Delicious mass exodus
- Open-source Data Science Toolkit
- Business intelligence vs. infotainment
- See what you and others tweet about with the Topic Explorer
- Growth and usage of foursquare in 2010
- Flash vs. HTML5
- Gender and time comparisons on Twitter
Here’s a density plot of article residuals, split by whether the article has a visualization or not (residuals of picture-free articles are clearly shifted towards the negative end):
Here are the mean residuals per category (again, we see that the miscellaneous, data underload, data art, and infographics categories tend to be more popular on Facebook, while the data sources, software, online applications, and news categories tend to be more popular on Twitter):
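Given per-post residuals like the ones described above, the per-category means are a simple group-by. This Python sketch assumes hypothetical `categories` and `residual` fields; reusing the five-article cutoff from the earlier category plot is an assumption, since the post doesn't say whether it applied here. The residuals are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_category(posts, min_articles=5):
    """(category, mean residual) pairs, most Facebook-leaning first."""
    groups = defaultdict(list)
    for p in posts:
        for cat in p["categories"]:
            groups[cat].append(p["residual"])
    means = {c: mean(r) for c, r in groups.items() if len(r) >= min_articles}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Made-up residuals, for illustration only.
posts = (
    [{"categories": ["Data Underload"], "residual": r} for r in (0.5, 0.25, 0.75, 0.5, 0.5)]
    + [{"categories": ["Software"], "residual": -0.25} for _ in range(5)]
    + [{"categories": ["Rare"], "residual": 2.0}]
)
print(mean_residual_by_category(posts))  # [('Data Underload', 0.5), ('Software', -0.25)]
```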
And that’s it! In the spirit of these findings, I hope this article gets liked a little and tweeted lots and lots.