
July(ish) Update

[This article was first published on HighlandR, and kindly contributed to R-bloggers.]
A belated review of things I’ve been up to but not blogged about until now.

So, merely days (ok, weeks) after I suggested that I never have problems coming up with snazzy blog post titles, here I am with what has to be the most uninspired post title ever.

And yet, it will have to do, as this is a sneaky way of getting a post in before the month of July passes me by.

In terms of R, this month I have mostly been:

Basically this should just be taking a group of functions that work well in interactive dplyr, and creating one super function to run them in order. Somewhere along the line it’s become more complex, but I will get there in the end.

In the last few weeks or so I’ve had a look at seplyr – which aims to reduce the complexity of tidy evaluation. Of course, this is very much opinion-based – I’ve seen a couple of posts recently questioning the need for the package. My opinion, as a lone R user, is that this package will be beneficial to those who:

Unless the dplyr gurus have any plans to rock up to Inverness any time soon (please do, the weather’s hit and miss but we’re a friendly bunch), I am reliant on R wizards on the internet, or my own brain power, to work out what I need to do. There are a few more posts starting to seep through, so hopefully I will be able to suss out some of the more complicated stuff.

It’s quite hard to pick this stuff up when constantly switching between tools. As an example, I had to update a report on Monday morning. This involved:

Then repeat for another data set.

Now on a good day, with a cup of coffee and a headwind, I can get this done in 10 mins or so, but it’s taken a while to get this slick, and a lot of that was down to the switching costs of tackling each problem with the relevant tool. Anything that makes switching between different tools easier / reduces cognitive burden is a good thing, even if it only turns out to be a temporary crutch to free up time to master the deeper theories. So, I approve of seplyr. At the end of the day, it’s getting stuff done that counts – right?

data.table

Wow, this is fast! It also dramatically reduces the amount of code required. I’ve shied away from it in the past because it looked a bit incomprehensible IMHO, but I know now that if you can immerse yourself in it for a few days it does all come together. As someone who uses SQL, I annoyingly still didn’t quite get the whole “i, j, by” implementation, at least not initially, but it does make sense as you get further into it. I was testing on 2.4M rows of data and it did what I wanted in 0.4 secs (using the microbenchmark package). dplyr was typically clocking in at a not too shabby 2.5 secs, and actual SQL was so slow I couldn’t justify running the code more than once as a comparison.
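For anyone else coming at this from SQL, here is a minimal sketch of how the three parts line up. The table and column names (`dt`, `ward`, `year`, `admissions`) are made up purely for illustration and are not the data from the timing test above.

```r
library(data.table)

# Hypothetical toy data, invented for this example
dt <- data.table(
  ward       = c("A", "A", "B", "B", "C"),
  year       = c(2016, 2017, 2016, 2017, 2017),
  admissions = c(10, 12, 7, 9, 4)
)

# General form: dt[i, j, by]
#   i  = which rows to keep      (roughly SQL's WHERE)
#   j  = what to compute         (roughly SQL's SELECT)
#   by = how to group the result (roughly SQL's GROUP BY)
#
# Rough SQL equivalent:
#   SELECT ward, SUM(admissions) AS total
#   FROM dt WHERE year = 2017 GROUP BY ward
res <- dt[year == 2017, .(total = sum(admissions)), by = ward]

print(res)
```

Read left to right, it is "take these rows, compute this, for each group", which is the same information as the SQL statement in a different order.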

I am struggling with one operation, where I just want “i, by” and no ‘j’ – I have seen a couple of posts on the of all knowledge and wizardry, so I just need to get some clear head space and get to work. As a last word on data.table, I found this little beauty courtesy of Steph Locke to be super helpful in getting me started. If you are thinking about using data.table or are just ‘DT curious’, then you should take a look at that guide.

Away from R, I’ve been:

Sorry about that.

Also, on a final note, this tribute to classic British comedian Eric Morecambe appeared to completely go over the heads of the rstats twitterati. Or maybe it just wasn’t funny? Anyway – here is the link to the inspiration, so if you’ve not seen it, go treat yourself. It’s quite long; the real fun starts at 7 minutes in, and the actual gif from the tweet is at around 10:55.

Until next time.
