Evaluating Quandl Data Quality – part II

Posted on December 2, 2013 by The R Trader in R bloggers

[This article was first published on The R Trader » R, and kindly contributed to R-bloggers.]

This post is a more in-depth analysis of Quandl futures data vs. Bloomberg data. Since my last post, Quandl has expanded its futures database from 68 contracts to 200+. For practical reasons, I limit myself here to the initial list of 60+ contracts. I’m still comparing the “Front Month” contract between the two sources. When evaluating the differences, I want the following:

Evaluate the scale of the differences
Evaluate the time localization of the differences (if any)
A single number that captures both features above
A measure that is comparable across instruments

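After a bit of thinking, I came up with the metric below, which expresses the price difference at each date in ticks so that it is comparable across instruments: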

$D_t = \frac{P(\text{Quandl})_t - P(\text{Bloomberg})_t}{\text{Tick Size}}$

As an example, below is the chart of the above formula over time for the E-mini S&P 500 contract.

I plotted the same chart for each of the 60 contracts in the list of my previous post. Interested readers can find all the charts here.
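The tick-normalized difference defined above can be sketched in a few lines. This is only an illustration, not the code used to produce the charts: the function name `tick_differences`, the dictionary layout, and the prices are all made up for the example. The only real-world input is the E-mini S&P 500 tick size of 0.25 index points. Dates missing from either source are skipped here, which is an assumption about how gaps should be handled.

```python
from datetime import date


def tick_differences(quandl, bloomberg, tick_size):
    """Return {date: D_t} for the dates present in both price series.

    D_t = (P(Quandl)_t - P(Bloomberg)_t) / tick size
    """
    common = sorted(set(quandl) & set(bloomberg))
    return {d: (quandl[d] - bloomberg[d]) / tick_size for d in common}


# Made-up prices, for illustration only (not real market data).
quandl_px = {
    date(2013, 11, 25): 1800.00,
    date(2013, 11, 26): 1801.25,
    date(2013, 11, 27): 1805.50,
}
bloomberg_px = {
    date(2013, 11, 25): 1800.00,
    date(2013, 11, 26): 1800.75,
    date(2013, 11, 27): 1805.50,
    date(2013, 11, 29): 1804.00,  # no Quandl price on this date, so it is skipped
}

ES_TICK = 0.25  # E-mini S&P 500 minimum price increment, in index points

d_t = tick_differences(quandl_px, bloomberg_px, ES_TICK)
for day, diff in d_t.items():
    print(day, diff)
```

In this toy series the two sources disagree on one date only, by 0.50 index points, which is a difference of 2 ticks.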

From my perspective, there are essentially two main sources of differences: first, plainly wrong data points that are far from the actual prices, and second, a difference in the data building process (i.e. the construction methodology for the front month contract). A mix of both is very likely here. In order to quantify this, I defined one additional metric: Mean Absolute Differences (MAD).

$MAD = \frac{\sum_{t=1}^{n} |D_t|}{n}$ for $D_t \neq 0$
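As a rough sketch of the MAD calculation, the snippet below averages the absolute tick differences. It reads the condition on D_t as "average only over the dates where the two sources disagree", so n is the number of non-zero differences; that reading is an assumption, and dividing by the full number of dates would give a smaller figure. The function name and the sample values are hypothetical.

```python
def mean_absolute_difference(diffs):
    """Average |D_t| over the observations where D_t != 0.

    Returns 0.0 when the two sources agree on every date.
    """
    nonzero = [abs(d) for d in diffs if d != 0]
    if not nonzero:
        return 0.0
    return sum(nonzero) / len(nonzero)


# Hypothetical tick-normalized differences for one contract.
example_d = [0.0, 2.0, 0.0, -4.0, 0.0, 0.0]

mad = mean_absolute_difference(example_d)
print(mad)  # (2 + 4) / 2 = 3.0
```

Under this reading, MAD measures how large the disagreements are when they occur, not how often they occur.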

Instrument	Quandl Symbol	Bloomberg Ticker	MAD
Soybean Oil	OFDP/FUTURE_BO1	BO1 Comdty	12254897
Russian Ruble	OFDP/FUTURE_RU1	RU1 Curncy	29653
DJ-UBS Commodity Index	OFDP/FUTURE_AW1	DNA Index	3041
S&P500 Volatility Index	OFDP/FUTURE_VX1	UX1 Index	2453
Cocoa	OFDP/FUTURE_CC1	CC1 Comdty	1552
Lean Hogs	OFDP/FUTURE_LN1	LH1 Comdty	391

Ranking the 60+ contracts by MAD immediately identifies the largest differences: Soybean Oil, Russian Ruble, DJ-UBS Commodity Index, S&P500 Volatility Index, Cocoa, and Lean Hogs. Those are the obvious candidates for immediate checking.
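The ranking step itself is a plain sort on MAD. The sketch below uses the six MAD values from the table, entered in arbitrary order. The variable names are made up for the example, and in practice the dictionary would hold all 60+ contracts.

```python
# MAD values copied from the table in this post, deliberately unsorted.
mad_by_contract = {
    "Lean Hogs": 391,
    "Cocoa": 1552,
    "Soybean Oil": 12254897,
    "S&P500 Volatility Index": 2453,
    "Russian Ruble": 29653,
    "DJ-UBS Commodity Index": 3041,
}

# Largest MAD first: these are the contracts to check first.
ranked = sorted(mad_by_contract.items(), key=lambda kv: kv[1], reverse=True)

for name, value in ranked:
    print(f"{name:<26}{value:>12,}")
```

Sorting in descending order puts Soybean Oil first, several orders of magnitude above the next contract. A gap of that size looks more like a scaling or unit issue than a handful of bad prints, although that is only a guess until the data are checked.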

I put together what I think is the basis for a systematic data checking approach. It can obviously be refined in many ways, but those refinements largely depend on what one wants to do with the data and which contracts are relevant to the analyst. As an example, I assume that it is more relevant for most people to have accurate data for the E-mini S&P 500 contract than for the Milk contract.

As usual, any comments are welcome.
