I meant to blog about the R/Finance conference during a lull, but there weren't many of those. Unlike many conferences I've been to, the structure of R/Finance was simple: one room and one speaker at a time. Relying on each speaker to bring something of value to a pretty diverse audience is risky, but it obviates two problems I see with larger conferences. First, it takes the tyranny of choice right off the table. There was no chance I could sit in on the "wrong" session and miss out on something valuable. Second, the constraint of a single presenter at any time forces the organizers to be very picky about content and tone. With 12 rooms you could shuffle marginal presentations into a small conference room and be happy that, at least in theory, your decision was a net positive. Here, even with the lightning talks (5-7 minutes each), every presenter matters.
I want to pick back up on a few threads in later posts, but two presentations really stood out to me. The first was Robert Gramacy's talk on multivariate inference with missing data (accompanied by the CRAN package monomvn). The second was David Matteson's Independent Component Analysis via Distance Covariance. I understood the former better than the latter, but that is not meant to disparage the authors!
Specifically, the monomvn package is fascinating to me as a tool to recover full series where only a small subset of the data are missing. For the social sciences, imputation can be very tricky because we want to ask whether or not data are missing for a reason. The classic example for labor economists is missing or zero hours worked in the labor force. Because there may be a distinct structural model behind why people enter or exit the labor force versus why they choose to work X hours, we have to take great care in recovering any missing data on hours worked: missingness is meaningful. For Gramacy's example of a portfolio, an investment manager isn't necessarily interested in the broader social meaning of a missing data point for a commodity; her priority is to recover a consistent estimate of what the data might be in order to use the series as a whole.
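To make that concrete, here is a minimal sketch of the kind of thing monomvn does: estimate the full mean vector and covariance matrix when later columns have shorter histories (a monotone missingness pattern). The simulated data, dimensions, and numbers below are my own illustration, not anything from Gramacy's talk.

```r
library(monomvn)
library(MASS)   # mvrnorm, for simulating multivariate normal data

set.seed(42)
p <- 5; n <- 200
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))          # AR(1)-style covariance
y <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)

## Impose a monotone pattern: later columns have shorter histories,
## like assets that began trading later, so their early rows are NA.
miss <- c(0, 20, 50, 90, 140)
for (j in 1:p) if (miss[j] > 0) y[1:miss[j], j] <- NA

## Maximum likelihood estimates of the complete-data mean and covariance
fit <- monomvn(y)
fit$mu   # estimated means, including the short-history columns
fit$S    # estimated covariance matrix
```

With the mean and covariance in hand, a missing stretch of any one series can be filled in with the usual conditional-normal mean given the observed columns, which is all the portfolio problem really needs.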
Matteson's presentation was broader and more challenging. At its core, he is interested in distribution-free measures of covariance between variables. Rather than use principal component analysis (PCA) to determine the linear dependence of variables on one another, he starts with independent component analysis (ICA) in order to capture higher-order dependence. He can then compute the distance covariance along the component spaces and recover a measure of correlation that does not depend on many strong assumptions. Neat stuff. I don't pretend to understand it completely, and any errors in explaining it are my own.
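For a rough feel of the ingredients only (this is not Matteson's estimator, which, as I understand it, searches for the unmixing that minimizes distance covariance between components), here is a sketch that extracts components with the fastICA package and then uses the energy package's distance correlation to check how much dependence remains between them. The packages and toy data are my own stand-ins.

```r
library(fastICA)
library(energy)   # dcor: distance correlation

set.seed(1)
n <- 500
s1 <- runif(n, -1, 1)          # two independent, non-Gaussian sources
s2 <- rexp(n) - 1
X <- cbind(s1 + 0.6 * s2,      # observed series are linear mixtures
           0.4 * s1 - s2)

ica <- fastICA(X, n.comp = 2)  # estimate the unmixing
S <- ica$S                     # recovered components

## Distance correlation is zero if and only if its arguments are
## independent, with no linearity or distributional assumptions.
dcor(X[, 1], X[, 2])   # the raw mixtures are clearly dependent
dcor(S[, 1], S[, 2])   # should be near zero for a good unmixing
```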
It was also a gas to hear John Bollinger talk about his early interactions with computing on the Intel 8080; I figured I would never hear someone mention that line of CPUs favorably again in my lifetime. All in all, a tremendous conference (cheap, too! Thanks to UIC for that, as well as to the sponsors) and well worth the trip.