Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Todd Schneider's blog post on solving the traveling salesman problem with R hit the front page of reddit.com. This is a big deal: front-page placement on the popular social news site can drive a ton of traffic (in Todd's case, 1.3 million pageviews). But what factors determine which of reddit's contributed links make it to the front page? (There are 25 front-page slots, but more than 100,000 reddit posts on an average day.)
Todd set out to answer this question using the statistical language R, and reported his results on Mashable. He collected 6 weeks of data including 1.2 million rankings for about 15,000 posts, and looked for commonalities amongst those posts that made the top 25.
Now, you might expect that a post's front page ranking is determined by its score (the number of times it has been "liked" by a reddit user, most likely after having seen it in the "subreddit" special topic area where it was posted), and how long since it was posted (reddit's front page generally contains recent posts). But it turns out that not all subreddits are treated equally. Todd discovered that there are three different types of subreddits when it comes to how posts are promoted to the front page:
- "Viral Candy" subreddits like funny, gifs and todayilearned. Posts from this category dominate page one.
- "Page Two" subreddits, which include Documentaries, Fitness and personalfinance. As the name suggests, posts in these subreddits almost never make it to page 1, but are often promoted to page 2.
- "The Rest", which includes food, LifeProTips, and sports. Todd's post was in this category, in the subreddit dataisbeautiful. Posts in these subreddits make up a small but significant fraction of page 1 posts.
It seems that reddit's front page (along with pages 2, 3 and 4, which follow it) contains a well-defined mix of posts from each of the three categories, as you can see in the chart below:
Starting from the left of the chart above, you can see the #1 post (on page 1) is from one of the "Viral Candy" subreddits about 97% of the time, but that a "The Rest" post does occasionally make top billing. By contrast, posts from the "Page Two" subreddits almost never appear above #10, but dominate page two (ranks 26-50). There's a pretty consistent mix on pages 3 and 4: about 65% "viral candy", about 15% "page twos" and about 25% "the rest".
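To make the idea behind the chart concrete, here is a minimal sketch of how the share of each category on each page could be tallied from a list of (rank, subreddit) observations. It is written in Python for illustration only and is not Todd's code (his analysis is in R). The `CATEGORY` mapping, the function names and the toy observations are all assumptions made up for this example; only the category names and the 25-posts-per-page layout come from the discussion above.

```python
from collections import Counter, defaultdict

# Hypothetical lookup from subreddit to category, using the examples named above.
CATEGORY = {
    "funny": "Viral Candy", "gifs": "Viral Candy", "todayilearned": "Viral Candy",
    "Documentaries": "Page Two", "Fitness": "Page Two", "personalfinance": "Page Two",
    "food": "The Rest", "LifeProTips": "The Rest", "sports": "The Rest",
    "dataisbeautiful": "The Rest",
}


def page_of(rank, per_page=25):
    """Map a 1-based rank to its page number (ranks 1-25 are page 1, 26-50 page 2)."""
    return (rank - 1) // per_page + 1


def category_share_by_page(observations):
    """Return {page: {category: fraction of observed slots}} for (rank, subreddit) pairs."""
    counts = defaultdict(Counter)
    for rank, subreddit in observations:
        counts[page_of(rank)][CATEGORY[subreddit]] += 1
    shares = {}
    for page, counter in counts.items():
        total = sum(counter.values())
        shares[page] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up observations for illustration; these are NOT Todd's data.
toy = [
    (1, "funny"), (2, "gifs"), (3, "food"), (4, "todayilearned"),
    (26, "Fitness"), (27, "Documentaries"), (28, "funny"), (30, "personalfinance"),
]
shares = category_share_by_page(toy)
for page in sorted(shares):
    print(page, shares[page])
```

Grouping by individual rank instead of by page (that is, keying on `rank` rather than `page_of(rank)`) would give the per-slot breakdown that the chart describes.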
As for post scores, Todd noted that posts from "Viral Candy" and "The Rest" subreddits need high scores to get on page 1: about 3500-4500 and 3000-4000 respectively for the top slot. By contrast, posts in "Page Two" subreddits only need scores in the 500-1500 range to hit the lower ranks of page 1 (but are much more likely to appear on page 2).
If you're interested in the details of what gets a post on reddit's front page, Todd's blog post has lots more information. And if you're an R user and want to do a similar analysis, Todd's data and R code are available on github.
Todd W Schneider: The reddit Front Page is Not a Meritocracy