Notes from the Kölner R meeting, 26 June 2015

[This article was first published on mages' blog, and kindly contributed to R-bloggers.]

Last Friday the Cologne R user group came together for the 14th time, and for the first time we met at Startplatz, a start-up incubator venue. The venue was excellent: they provided us not only with a much larger room, but also with the whole infrastructure, including table football and drinks. Many thanks to Kirill for organising all of this!

Photo: Günter Faes

We had two excellent advanced talks. Both were very informative and well presented.

Data Science at the Command Line

Kirill Pomogajko showed us how he uses various command line tools to pre-process log-files for further analysis with R.

Photo: Günter Faes

Imagine you have several servers that generate large data sets with no standard delimiters, like the example below.

Race Age Military Color Speed Car Month State
Other 24 Navy Aquamarine4 63 "Lotus Europa" June Michigan
White 26 Army Palegreen4 53 "Merc 450SE" Florida
White 26 Army Burlywood3 45 "Porsche 914-2" May Wisconsin
Hispanic 28 Navy Burlywood3 56 "Mazda RX4 Wag" April Florida
White 22 Marine Corps Salmon1 51 "Hornet 4 Drive" California
White 31 Army Lightyellow2 46 "Cadillac Fleetwood" Arizona
Black 23 Air Force Gray27 58 "Datsun 710" October South Carolina
White 22 Navy Violetred4 53 "Toyota Corona" California
White 31 Marine Corps Sienna4 48 "Toyota Corolla" December Oklahoma
White 24 Navy Rosybrown4 59 "Merc 280C" March New Jersey
White 27 Army Rosybrown4 61 "Pontiac Firebird" Florida
Hispanic 20 Army Lightyellow2 37 "Mazda RX4 Wag" July Pennsylvania
Hispanic 32 Navy Lightyellow2 63 "Volvo 142E" Michigan
Hispanic 34 Navy Sienna4 36 "Merc 280C September" Nevada
Hispanic 29 Air Force Aquamarine4 56 "Toyota Corona" Mississippi
White 28 Air Force Lightyellow2 73 "Honda Civic" November West Virginia
Asian 26 Army Aquamarine4 64 "Fiat X1-9" March Missouri
White 23 Army Rosybrown4 53 "Duster 360" May Tennessee
White 28 Marine Corps Palegreen4 52 "Chrysler Imperial" California
At first glance the columns appear to be separated by blanks, but the third column (Military) contains multi-word strings such as “Air Force”, as does the State column. Furthermore, the Month column has missing values and the Car column uses speech marks. Thus, the data is messy and difficult to read into R.

To solve the problem Kirill developed a Makefile that uses tools such as scp, sed and awk to download and clean the server files.
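
As an illustration of the kind of clean-up involved, here is a minimal sketch using awk only. It is not Kirill's Makefile: the function name clean_messy, the ";" output separator and the three-row sample are assumptions made for this example. The idea is to split each line on the speech marks first, so that the car name becomes its own field, and then to treat the text before and after it separately.

```shell
#!/bin/sh
# Hypothetical helper (not from the talk): rewrite the space-delimited
# log as ';'-separated rows with exactly eight fields per row.
clean_messy() {
  awk -F'"' '
    BEGIN {
      OFS = ";"
      # Month names, used to decide whether the optional Month field is present.
      n = split("January February March April May June July August September October November December", m, " ")
      for (i = 1; i <= n; i++) is_month[m[i]] = 1
    }
    # The header row has no speech marks, so a fixed header is written instead.
    NR == 1 { print "Race", "Age", "Military", "Color", "Speed", "Car", "Month", "State"; next }
    NF >= 3 {
      # $1 = text before the car name, $2 = car name, $3 = optional month plus state.
      k = split($1, a, " ")
      # Race and Age come first, Color and Speed last; the rest is Military.
      military = a[3]
      for (i = 4; i <= k - 2; i++) military = military " " a[i]
      t = split($3, b, " ")
      month = ""; first = 1
      if (t > 1 && (b[1] in is_month)) { month = b[1]; first = 2 }
      state = b[first]
      for (i = first + 1; i <= t; i++) state = state " " b[i]
      print a[1], a[2], military, a[k - 1], a[k], $2, month, state
    }
  ' "$@"
}

# Three rows taken from the sample above, plus the header.
sample='Race Age Military Color Speed Car Month State
Other 24 Navy Aquamarine4 63 "Lotus Europa" June Michigan
White 22 Marine Corps Salmon1 51 "Hornet 4 Drive" California
Black 23 Air Force Gray27 58 "Datsun 710" October South Carolina'

cleaned=$(printf '%s\n' "$sample" | clean_messy)
printf '%s\n' "$cleaned"
```

The output could then be read into R with read.table(file, sep = ";", header = TRUE). Note that this sketch leaves a quoted value untouched even when it contains a month name, as in the Nevada row of the sample.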

Kirill’s tutorial files are available via GitHub.

An Introduction to RStan and the Stan Modelling Language


Paul Viefers gave a great introduction to Stan and RStan, with a focus on explaining the differences from other MCMC packages such as JAGS.

Photo: Günter Faes

Stan is a probabilistic programming language for Bayesian inference. One of the major challenges in Bayesian analysis is that there is often no analytical solution for the posterior distribution. Hence, the posterior distribution is approximated via simulation, for example by Gibbs sampling in JAGS. Stan, on the other hand, uses Hamiltonian Monte Carlo (HMC), an algorithm that proposes jumps more carefully: it exploits more of the structure of the posterior by translating the sampling problem into the framework of Hamiltonian mechanics.
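
As a rough sketch of what this translation means, here is the usual textbook formulation of HMC. The notation is an assumption for this note and is not taken from Paul's slides: θ denotes the parameters, r an auxiliary momentum variable, M a mass matrix and ε the step size.

```latex
\begin{align*}
% Potential energy: the negative log posterior, up to an additive constant
U(\theta) &= -\log\bigl[\,p(y \mid \theta)\, p(\theta)\,\bigr] \\
% Hamiltonian: potential energy plus kinetic energy of the momentum r
H(\theta, r) &= U(\theta) + \tfrac{1}{2}\, r^{\top} M^{-1} r \\
% One leapfrog step of size \epsilon
r_{1/2} &= r - \tfrac{\epsilon}{2}\, \nabla U(\theta) \\
\theta' &= \theta + \epsilon\, M^{-1} r_{1/2} \\
r' &= r_{1/2} - \tfrac{\epsilon}{2}\, \nabla U(\theta') \\
% Metropolis acceptance probability for the proposal (\theta^{*}, r^{*})
\alpha &= \min\bigl\{1,\ \exp\bigl(H(\theta, r) - H(\theta^{*}, r^{*})\bigr)\bigr\}
\end{align*}
```

A proposal (θ*, r*) is obtained by drawing a fresh momentum r and chaining several leapfrog steps. Because the steps follow the gradient of the log posterior, proposals can travel far from the current point and still be accepted with high probability.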

Paul ended his talk by walking us through the various building blocks of a Stan script, using a hierarchical logistic regression example.

You can access Paul’s slides on Dropbox.

Drinks and Networking

No Cologne R user group meeting is complete without Kölsch and networking. Afterwards some of us ended up in a fancy burger place.

Next Kölner R meeting

The next meeting will be scheduled in September. Details will be published on our Meetup site. Thanks again to Revolution Analytics for their sponsorship.
