Articles by Andrew Collier

flipsideR: Support for ASX Option Chain Data

February 8, 2016 | Andrew Collier

I previously wrote about some ad hoc R code for downloading Option Chain data from Google Finance. I finally wrapped it up into a package called flipsideR, which is now available via GitHub. Since I last wrote on this topic I've also added support for downloading option data from the ... [Read more...]

Kaggle: Santa’s Stolen Sleigh

January 22, 2016 | Andrew Collier

This morning I read Wendy Kan's interesting post on Creating Santa's Stolen Sleigh. I hadn't really thought too much about the process of constructing an optimisation competition, but Wendy gave some interesting insights on the considerations involved in designing a competition which was both fun and challenging but still computationally ... [Read more...]

Casting a Wide (and Sparse) Matrix in R

January 19, 2016 | Andrew Collier

I routinely use melt() and cast() from the reshape2 package as part of my data munging workflow. Recently I've noticed that the data frames I've been casting are often extremely sparse. Stashing these in a dense data structure just feels wasteful. And the dismal drone of page thrashing is unpleasant. ... [Read more...]

Kaggle: Walmart Trip Type Classification

January 15, 2016 | Andrew Collier

Walmart Trip Type Classification was my first real foray into the world of Kaggle and I'm hooked. I previously dabbled in What's Cooking but that was as part of a team and the team didn't work out particularly well. As a learning experience the competition was second to none. My ... [Read more...]

Review: Learning Shiny

January 5, 2016 | Andrew Collier

I was asked to review Learning Shiny (Hernán G. Resnizky, Packt Publishing, 2015). I found the book to be useful, motivating and generally easy to read. I'd already spent some time dabbling with Shiny, but the book helped me graduate from paddling in the shallows to wading out into the ... [Read more...]

Making Sense of Logarithmic Loss

December 14, 2015 | Andrew Collier

Logarithmic Loss, or simply Log Loss, is a classification loss function often used as an evaluation metric in Kaggle competitions. Since success in these competitions hinges on effectively minimising the Log Loss, it makes sense to have some understanding of how this metric is calculated and how it should be ... [Read more...]

Installing XGBoost on Ubuntu

December 9, 2015 | Andrew Collier

XGBoost is the flavour of the moment for serious competitors on Kaggle. It was developed by Tianqi Chen and provides a particularly efficient implementation of the Gradient Boosting algorithm. Although there is a CLI implementation of XGBoost, you'll probably be more interested in using it from either R or Python. ... [Read more...]

Graph from Sparse Adjacency Matrix

November 12, 2015 | Andrew Collier

I spent a decent chunk of my morning trying to figure out how to construct a sparse adjacency matrix for use with graph.adjacency(). I'd have thought that this would be rather straightforward, but I tripped over a few subtle issues with the Matrix package. My biggest problem (which ... [Read more...]

LIBOR and Bond Yields

November 6, 2015 | Andrew Collier

I've just been looking at the historical relationship between the London Interbank Offered Rate (LIBOR) and government bond yields. LIBOR data can be found at Quandl and comes in CSV format, so it's pretty simple to digest. The bond data can be sourced from the US Department of the Treasury. ... [Read more...]

Review: Beautiful Data

October 15, 2015 | Andrew Collier

I've just finished reading Beautiful Data (published by O'Reilly in 2009), a collection of essays edited by Toby Segaran and Jeff Hammerbacher. The 20 essays from 39 contributors address a diverse array of topics relating to data and how it's collected, analysed and interpreted. Since this is a collection of essays, the writing ... [Read more...]

#MonthOfJulia Day 17: Datasets from R

September 17, 2015 | Andrew Collier

R has an extensive range of built-in datasets, which are useful for experimenting with the language. The RDatasets package makes many of these available within Julia. We'll see another way of accessing R's datasets in a couple of days' time too. In the meantime though, check out the documentation for ... [Read more...]

urlshorteneR: A package for shortening URLs

September 14, 2015 | Andrew Collier

This is a small package I put together quickly to satisfy an immediate need: generating abbreviated URLs in R. As it happens I require this functionality in a couple of projects, so it made sense to have a package to handle the details. It's not perfect but it does the ... [Read more...]

Constructing a Word Cloud for ICML 2015

July 10, 2015 | Andrew Collier

Word clouds have become a bit cliché, but I still think that they have a place in giving a high level overview of the content of a corpus. Here are the steps I took in putting together the word cloud for the International Conference on Machine Learning (2015). Extract the hyperlinks ... [Read more...]

Review: Machine Learning with R Cookbook

July 3, 2015 | Andrew Collier

"Machine Learning with R Cookbook" by Chiu Yu-Wei is nothing more or less than it purports to be: a collection of 110 recipes for applying Data Analysis and Machine Learning techniques in R. I was asked by the publishers to review this book and found it to be an interesting and ... [Read more...]

Amazon EC2: Upgrading R

June 19, 2015 | Andrew Collier

After installing R and Shiny on my EC2 instance I discovered that the default version of R was a little dated and I wanted to update to R 3.2.0. It's not terribly complicated, but here are the steps I took. First, become root. Remove the old version of R. Edit /etc/... [Read more...]

R Recipe: RStudio and UNC Paths

June 4, 2015 | Andrew Collier

RStudio does not like Uniform Naming Convention (UNC) paths. This can be a problem if, for example, you install it under Citrix. The solution is to create a suitable environment file. This is what worked for me: I created a .Renviron file in my Documents folder on the Citrix remote ... [Read more...]