
Explore smartphone market share with Nanocubes

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

Back in May, Twitter's Miguel Rios created some beautiful data visualizations to show that with enough geotagged tweets (billions of them), you can reveal the geography of where people live, work and commute. Now, a new interactive visualization of 210 million geotagged tweets by AT&T Research Labs reveals the market share of iPhone, Android and Windows smartphones down to the smallest geographic levels.

Simon Urbanek (known to R users not least as a member of the R Core Team) explained in his Web-Based Interactive Graphics talk at JSM 2013 today that the visualization uses 32 TB of Twitter data, yet runs smoothly and interactively on a single machine with 16 GB of RAM. When you first start the application, it shows a view of the population density of the USA, as you might expect from millions of geotagged tweets. But this application also lets you explore the mobile device used to send each tweet. Across the USA (and excluding tweets with a device type of "none" or "iPad", the latter of which was hardly used), the proportion of tweets sent using each device was:

  • 60.7% of 150.6M tweets were sent using an iPhone,
  • 36.4% with an Android phone,
  • and 2.8% with a Windows phone.

More interestingly, you can use the app to zoom in to specific geographic areas and see the device market share just for that region. In the NYC area, the iPhone dominates, with 74.1% of tweets from known smartphones.

However, in the Atlanta region, Android phones fare much better than in the USA as a whole, being the source of 45.9% of smartphone tweets.

Windows phones appear to be best represented in the Seattle-Tacoma region, home to Microsoft HQ.

Despite the massive number of data points and the beauty and complexity of the real-time data visualization, it runs impressively quickly. The underlying data structure is based on Nanocubes, a fast data structure for in-memory data cubes. The basic idea is that nanocubes aggregate data hierarchically, so that as you zoom in and out of the interactive application, each pixel on the screen maps to a single pre-aggregated value that summarizes the many data points sitting "behind" that pixel. Learn more about nanocubes, and try out the application yourself (modern browser required) at the link below.
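
To make the idea of hierarchical aggregation concrete, here is a minimal Python sketch. It is not the nanocube data structure itself (which is considerably more compact), only a plain pyramid of per-tile device counts, pre-computed at every zoom level. The function names, the equirectangular tiling, the `MAX_LEVEL` depth and the sample points are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

MAX_LEVEL = 8  # assumed depth of the tile pyramid for this sketch


def tile_of(lon, lat, level):
    """Map a longitude/latitude pair to an (x, y) tile index at a zoom level.

    Uses a simple equirectangular grid with 2**level tiles per axis.
    """
    n = 2 ** level
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def build_pyramid(points, max_level=MAX_LEVEL):
    """Pre-aggregate device counts for every tile at every zoom level."""
    pyramid = defaultdict(Counter)
    for lon, lat, device in points:
        for level in range(max_level + 1):
            pyramid[(level, *tile_of(lon, lat, level))][device] += 1
    return pyramid


def device_share(pyramid, level, x, y):
    """Return each device's share of the tweets aggregated in one tile."""
    counts = pyramid.get((level, x, y), Counter())
    total = sum(counts.values())
    return {device: n / total for device, n in counts.items()} if total else {}


# Hypothetical sample data: (longitude, latitude, device) per tweet.
points = [
    (-74.0, 40.7, "iPhone"),
    (-74.0, 40.7, "iPhone"),
    (-73.9, 40.8, "Android"),
    (-122.3, 47.6, "Windows"),
]
pyramid = build_pyramid(points)

# Zoomed all the way out, a single tile summarizes every tweet.
print(device_share(pyramid, 0, 0, 0))

# Zoomed in, the tile around the first point only counts nearby tweets.
print(device_share(pyramid, MAX_LEVEL, *tile_of(-74.0, 40.7, MAX_LEVEL)))
```

Because every tile's counts are computed once up front, answering "what is the device share here?" at any zoom level is a single dictionary lookup rather than a scan over all tweets. The trade-off in this naive version is memory: each point is stored once per level, which is the kind of redundancy a more compact structure would need to avoid.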

Nanocubes: Fast Visualization of Large Spatiotemporal Datasets
