O’Reilly at OSBC: The future’s in the data
Tim O’Reilly’s keynote talk at OSBC this evening was thought-provoking to say the least. The title of the talk was “The Real Open Source Opportunity”, and the surprise for me was that he wasn’t talking about Open Source software. Tim’s insight, and it’s a profound one, is that the next frontier for freedom and openness — and indeed, the way we’ll live our lives — lies with data.
Why? The world in which open source software was born is very different from the world we're heading toward. Less than a decade ago, a major concern of the computing world was that much of the capability and innovation was locked up in closed software held by major corporations like Microsoft. Open-source software addressed that. But look where innovation lies today: companies like Google (and a few others), built on the backs of open-source technology, mind you, can now perform tasks that not long ago were the realm of science fiction. Today you can speak a question into a tiny handheld gadget and find out where to get good pizza. But think for a moment about how Google can do this reliably and quickly: it's their data. They've amassed a massive, proprietary database of search queries, written text, and voice samples that allows the Google Voice Search app on the iPhone (and the algorithms on Google's cloud servers) to distinguish "pizza" (said on a noisy street in a Jersey accent) from "piece of" or the city "Pisa". Tim was careful to point out that it's not closed algorithms that make this work. Peter Norvig of Google has said it himself: Google doesn't have better algorithms than everyone else. They just have more data.
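The "more data, not better algorithms" point can be made with a toy sketch. Here, a crude unigram model picks the most likely intended word among phonetically confusable candidates simply by counting occurrences in a query log. The log below is a hypothetical handful of entries, standing in for the billions Google actually holds; the point is that the scoring logic is trivial and the data does the work.

```python
from collections import Counter

# Hypothetical query-log sample; a real system would have billions of entries.
query_log = [
    "pizza near me", "best pizza", "pizza delivery",
    "flights to pisa", "piece of cake recipe", "pizza",
]

# Count how often each word appears across all logged queries.
word_counts = Counter(w for q in query_log for w in q.split())

def disambiguate(candidates):
    """Pick the candidate the logs make most likely (a crude unigram prior)."""
    return max(candidates, key=lambda w: word_counts[w])

print(disambiguate(["pizza", "pisa", "piece"]))  # "pizza" dominates the log
```

With a larger log, the same three lines of scoring logic get better at telling "pizza" from "Pisa" without any change to the algorithm, which is exactly Norvig's point.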
Tim asked the audience a question: "Could anyone in the Open Source community build the infrastructure to deliver Google Voice Search?" The response: a stony silence. The implication? Vendor lock-in is no longer about proprietary source code. It's about massive, hard-to-replicate data sets, making Google a potential Microsoft of the next decade. The corollary? The future will be about who has the most data, and who is able to extract meaning from it and deliver it in real time.
So how can we avoid data lock-in in the years to come? Tim suggests that it may be the underdogs of these new cloud-based tools that become the allies of open source data applications. Ironically, it may be Microsoft, lagging today in the domains of search, maps, and speech recognition, that becomes the biggest ally in making the associated data services openly available. Google certainly has no motivation to do so; on the contrary, in areas like local search, where they once linked to third parties like Yelp, they're now providing their own data exclusively. Another opening likely comes from open standards for data sharing, like the Gov2.0 initiative.
The implications are profound, not just in terms of lock-in but also in the area of privacy. (An interesting privacy implication: did you know that it's possible to identify a specific appliance, like a Kenwood dishwasher, from the "DNA" of its power draw signal? Consider that when your power company tracks your power usage with e-meters.) And when the operating system of the future is the entire internet, the question of which open source license you use seems like small beans compared to the open data issue.
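To see why a power-draw "DNA" is enough to identify an appliance, consider a minimal sketch: match an observed wattage trace against a library of known signatures by nearest distance. The signature values and appliance names below are invented for illustration; real load-disaggregation systems use far richer features, but the principle is the same.

```python
import math

# Hypothetical per-interval wattage signatures for a few appliances.
signatures = {
    "dishwasher":   [1200, 1300, 200, 1250, 100],
    "refrigerator": [150, 150, 150, 150, 150],
    "kettle":       [2000, 2000, 0, 0, 0],
}

def identify(trace):
    """Return the appliance whose signature is closest (Euclidean) to the trace."""
    def dist(sig):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(trace, sig)))
    return min(signatures, key=lambda name: dist(signatures[name]))

observed = [1180, 1290, 210, 1240, 120]  # a noisy meter reading
print(identify(observed))  # closest signature: dishwasher
```

Once a utility has a large enough library of signatures, a stream of e-meter readings becomes a log of what's running in your home, which is the privacy concern in a nutshell.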
Update: The slides from Tim O’Reilly’s keynote are now available: Open Source in the Cloud Computing Era