By Paulin Shek
Despite being only a one-day conference, the event covered a wide range of topics across many sectors, with a very long list of impressive speakers.
This event was a great opportunity to see which technologies and architectures work for companies such as Betfair and Excelian. The questions asked were intelligent and thought-provoking, and gave me even more insight into the technologies that other companies are using or looking into.
Other talks had a different focus: techniques, methods and processes. Martin Trencseni from Prezi gave a good overview of A/B testing and what not to do, ending his talk with book recommendations and further reading! Martin Goodson (Skimlinks) and Emma Martin (ESI Media) gave an engaging talk describing the algorithms used to decide which adverts a specific person may be interested in, given their past purchases, browsing history, etc.
There was also a series of breakout sessions, the highlight of which was a presentation by TfL's Roland Major about the TfL hackathon, which was held jointly with Data Science London a few weeks ago. Attendees of the hackathon were given access to TfL’s data, and the aim of the event was to glean more insight from the data. I was delighted to hear that one of the main outcomes of this event is that TfL are keen to share more data and get involved with the data science community. This echoed the sentiments of both Jim Anning’s and Ioana Hrenincuic’s keynotes about making more data open and accessible.
The conference ended with a “How to be an Effective Data Scientist” workshop. This was a panel discussion interspersed with 3 presentations by the panel members. The main topic was how to characterise a “Data Scientist” as a set of skills. Marc Warner presented a “Data Science pyramid”. (See Mango Solutions’ Data Science Radar for our perspective on this topic!) An attribute that stood out for me on the pyramid was “curiosity”. I completely agree with the importance of having curious Data Scientists, and I think that curiosity is indeed the main driver of most of the data scientists that I know. In the same discussion, Just Giving’s Mike Bugembe provided a great example of this: “I wrote an algorithm to predict what mood my wife was in”.
Another point that was made both in the workshop and in Jim Anning’s keynote was the difference between a “Data Scientist” and a “Data Engineer”. This was something that I had not considered before. It made me appreciate the importance of both roles and think about the different personality types needed for each. It also mirrors the two categories that the technical talks can be split into. A “Data Engineer” may be more drawn to the big data technology talks, for example to learn about deploying Big Data technologies and micro-services. A “Data Scientist”, however, may be more interested in the talks about experiment design, models and statistics.
To summarise, there were many themes that came up again and again over the day. The most interesting points that I took away from the conference are the following:
1. The idea of the separation between the concepts “Data Scientist” and “Data Engineer”.
2. There are a lot of technologies available for all aspects of data science, from mining and storing to visualisation and processing.
3. Open data is great, and lots of good can come from it!