
New qeML Plotting Function


I’ve added a new function to qeML 1.2, qeMittalGraph, based on an idea by my student Aditya Mittal. Below is an example that I think is rather compelling.

The basic idea is quite simple (and not necessarily new, just something I had not seen before): instead of comparing several curves directly, plot each curve's growth relative to its initial baseline value. If, for example, X is time, then all curves start from the common point X = 0, Y = 1. Viewing the curves in this manner may make comparisons more insightful.
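
To make this concrete, here is a minimal sketch of the transformation (a toy illustration, not qeMittalGraph's internal code): divide each curve's Y values by its value at the first X point, so every curve begins at Y = 1.

# two hypothetical curves on very different scales
x <- 0:10
y1 <- 100 * exp(0.05*x)
y2 <- 5 * (1 + 0.08*x)
# divide each by its baseline value, so both start at (0,1)
plot(x, y1/y1[1], type='l', ylab='growth relative to baseline')
lines(x, y2/y2[1], lty=2)

On their original scales the two curves would be hard to compare; after rescaling, their relative growth is directly visible.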

As an example, we’ll use the currency dataset included in qeML, consisting of data on five pre-euro-era currencies.

> data(currency)
> head(currency)
  Can..dollar Ger..mark Fr..franc UK.pound J..yen
1          19       580     4.763       29    602
2          18       609     4.818       44    609
3          20       618     4.806       66    613
4          46       635     4.825       79    607
5          42       631     4.796       77    611
6          45       635     4.818       74    610
curr <- cbind(1:nrow(currency),currency)  # prepend a week index
names(curr)[1] <- 'weeknum'

OK, let’s graph the raw values:

z <- reshape2::melt(curr,id.vars='weeknum')  # long format: one row per (weeknum, currency, value)
qePlotCurves(z,1,3,2)  # x = col 1 (weeknum), y = col 3 (value), one curve per col 2 (currency)

Now with qeMittalGraph:

names(z)[2:3] <- c('country','rate')  # descriptive names for the melted columns
qeMittalGraph(z,'weeknum','rate','country')

We immediately see two clusters, franc/mark/yen and Canadian dollar/pound, potentially a significant insight. Some economic context may be needed to interpret it, but clearly this view could be of great interest.

Note that ‘loess’ smoothing is on by default, which has caused one of the curves not to pass quite through (0,1). Setting this option to FALSE would fix that, but at the cost of jagged curves.
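
For instance, assuming the smoothing argument is simply named loess (inferred from the option name above, so treat this as a sketch rather than the exact call):

qeMittalGraph(z,'weeknum','rate','country',loess=FALSE)  # raw, unsmoothed curves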

Another class of use cases is graphing the effect of a hyperparameter, say the minimum leaf size X in random forests, across several different datasets, with Y = Mean Absolute Prediction Error.
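
As a rough sketch of that pattern (the qeRF call, its minNodeSize argument, the testAcc component, and the mlb and pef datasets are assumptions used for illustration, not taken from the example above): build a long-format data frame of results, one row per (leaf size, dataset) pair, and hand it to qeMittalGraph.

# hedged sketch: record Mean Absolute Prediction Error at several minimum
# leaf sizes, for each of two datasets; each qeRF call uses a random holdout
# set, so results will vary from run to run
library(qeML)
leafSizes <- c(1,5,10,25,50,100)
oneDataset <- function(dta,yName,dsName) {
   mapes <- sapply(leafSizes,
      function(ls) qeRF(dta,yName,minNodeSize=ls)$testAcc)
   data.frame(leafSize=leafSizes,MAPE=mapes,dataset=dsName)
}
data(mlb); data(pef)
res <- rbind(oneDataset(mlb,'Weight','mlb'),
             oneDataset(pef,'wageinc','pef'))
qeMittalGraph(res,'leafSize','MAPE','dataset')

Each dataset's curve would then show how its error grows or shrinks relative to its value at the smallest leaf size, making the hyperparameter's effect comparable across datasets of very different error scales.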
