Get out of my way! Dunk thru #rstats errors like the Big Shaq-istician
Ahh, leaves falling, parents crying, collegicians biking uphill with a bag of in-n-out in between their teeth. Must be the new academic school year!
I figured it’s a good time to introduce my work-in-progress datzen package of miscellaneous #rstats functions. You can bee-line straight to the github readme with more examples.
Or stick around and I’ll highlight the Shaq example showcasing datzen::itersave()
In #rstats if you want to iterate, you can go about it in many different ways, and they work pretty well for “homogeneous” iterations.
As good as they are, the standard approaches hit snags for “non-homogeneous” iterations, eg data from the web.
Go ahead, try them. I dare you.
You in 5 hours
“Aw shit, my brute force for loop crapped the bed during iteration 69. Now I have to manually restart it. I hope it doesn’t do it again. I’m running out of patience, and linen.”
Let’s take a look. The Big Aristotle, Dr. Shaq, was a notorious brute on the hardwood. Here he is, contemplating how he should score in the paint:
shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  if(!(meatbag %in% c('scrub','sabonis'))){
    stop('shaq is confused')}
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

testthat::expect_failure(lapply(meatbags,FUN=shaq))
#> Error in FUN(X[[i]], ...): shaq is confused
Uh, some error confused Shaq.
enter, stage trap door
“Meet itersave()”
front row faints
“It’s… hideously beautiful”
In a nutshell, itersave works like lapply, but when it meets an ugly, unskilled, unqualified, and ungraceful error it will keep trucking along like Shaquille The Diesel O’Neal hitchhiking a ride on Chris Dudley’s back.
mainDir=paste0(getwd(),'/tests/proto/')
subDir='/temp/'

itersave(func_user=shaq,
         vec_arg_func=meatbags,
         mainDir,subDir)
#> [1] "1 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_1"
#> [1] "2 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_2"
#> [1] "3 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_3"
#> [1] "4 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_4"
The meatbags that Shaq successfully put into bodybags.
print('the successes')
#> [1] "the successes"
list.files(paste0(mainDir,subDir))
#> [1] "arg_1.rds" "arg_2.rds" "arg_3.rds" "failed"
It’ll also book keep any errors along the way via purrr::safely() and R.utils::withTimeout().
print('the failures')
#> [1] "the failures"
list.files(paste0(mainDir,subDir,'/failed/'))
#> [1] "arg_4.rds"
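For intuition, the save-as-you-go idea can be sketched in base R. This is not datzen's implementation: `iter_save_sketch` and its arguments are hypothetical names, `tryCatch()` stands in for `purrr::safely()`, and there is no timeout handling.

```r
# Hypothetical sketch, not datzen::itersave(): apply `f` to each named input
# and write every outcome to disk as soon as it is known.
iter_save_sketch <- function(f, inputs, dir_out) {
  dir_fail <- file.path(dir_out, "failed")
  dir.create(dir_fail, recursive = TRUE, showWarnings = FALSE)

  for (i in seq_along(inputs)) {
    nm <- names(inputs)[i]
    # tryCatch() stands in for purrr::safely(): an error becomes a value
    res <- tryCatch(f(inputs[[i]]), error = function(e) e)

    if (inherits(res, "error")) {
      # book-keep which input failed and why, then keep going
      saveRDS(list(ind_fail = i, input_bad = inputs[[i]], result_bad = res),
              file.path(dir_fail, paste0(nm, ".rds")))
    } else {
      saveRDS(res, file.path(dir_out, paste0(nm, ".rds")))
    }
  }
  invisible(dir_out)
}

# Self-contained demo with a toy function that fails on one input
picky <- function(x) if (x == "kobe") stop("confused") else toupper(x)
inputs <- c(arg_1 = "scrub", arg_2 = "kobe", arg_3 = "sabonis")

out_dir <- file.path(tempdir(), "iter_save_sketch")
unlink(out_dir, recursive = TRUE)  # start from a clean directory
iter_save_sketch(picky, inputs, out_dir)

ok_files   <- list.files(out_dir, pattern = "\\.rds$")
fail_files <- list.files(file.path(out_dir, "failed"), pattern = "\\.rds$")
```

Because each result hits the disk inside the loop, whatever finished before a failure is still there afterwards.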
Along with the out, itersave has an in companion
enter, zipline from balcony
“Meet iterload()”
audience faints
iterload(paste0(mainDir,subDir,'/failed'))
#> $arg_4
#> $arg_4$ind_fail
#> [1] 4
#>
#> $arg_4$input_bad
#> [1] "kobe"
#>
#> $arg_4$result_bad
#> <simpleError in (function (meatbag) { if (meatbag %in% "scrub") { return("dunk on em") } if (meatbag %in% "sabonis") { return("elbow his face") } if (!(meatbag %in% c("scrub", "sabonis"))) { stop("shaq is confused") }})("kobe"): shaq is confused>
Ah, it was the 4th argument, Kobe, that boggled Shaq’s mind.
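The loading half is the simpler of the two. A minimal base-R sketch, assuming one .rds file per argument as in the listings above; `iter_load_sketch` is a hypothetical name, not datzen's `iterload()`.

```r
# Hypothetical sketch, not datzen::iterload(): read every .rds file in a
# directory into a list named after the files.
iter_load_sketch <- function(dir_in) {
  paths <- list.files(dir_in, pattern = "\\.rds$", full.names = TRUE)
  out <- lapply(paths, readRDS)
  names(out) <- tools::file_path_sans_ext(basename(paths))
  out
}

# Self-contained demo: write two results, then read them back
demo_dir <- file.path(tempdir(), "iter_load_sketch")
unlink(demo_dir, recursive = TRUE)  # start from a clean directory
dir.create(demo_dir)
saveRDS("dunk on em", file.path(demo_dir, "arg_1.rds"))
saveRDS("elbow his face", file.path(demo_dir, "arg_2.rds"))

loaded <- iter_load_sketch(demo_dir)
```

Naming the list after the files is what lets you trace a failure back to the exact argument that caused it.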
“Jigga man [was] Diesel, when he [used to] lift the 8 Up” – Jay-Z
*Wiping away my sad Laker tear from my face while I type this*
“What could have been man, what could have been.”
R.I.P Frank Hamblen
Anyways, Shaq wised up in Miami. He also fattened up in Phoenix, Cleveland, Boston, Hawaii, Catalina, etc.
shaq_wiser = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  if(meatbag %in% 'kobe'){return('breakup & makeup')}
  if(!(meatbag %in% c('scrub','sabonis','kobe'))){
    stop('shaq is confused')}
}

itersave(func_user=shaq_wiser,
         vec_arg_func=meatbags,
         mainDir,subDir)
#> [1] "1 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_1"
#> [1] "2 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_2"
#> [1] "3 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_3"
#> [1] "4 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_4"
So, give me the whole shebang. What was the whole story of Shaq’s road trip?
out_il = iterload(paste0(mainDir,subDir))
cbind(meatbags,out_il)
#>       meatbags  out_il
#> arg_1 "scrub"   "dunk on em"
#> arg_2 "sabonis" "elbow his face"
#> arg_3 "scrub"   "dunk on em"
#> arg_4 "kobe"    "breakup & makeup"
So, if you use bare bones for loops or lapply you’ll crap out immediately when you hit an error.
On the other hand, even purrr::map with purrr::safely will, by design, do everything in one shot (eg batch results). This is not ideal when working with stuff online. When you backtrack to resolve unforeseen edge cases, it’ll feel like a Cantor set.
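To make the “one shot” point concrete, here is a base-R stand-in for the map-plus-safely pattern. `safely_sketch` is a hypothetical helper that only mimics the result/error pair a `purrr::safely()` wrapper returns; it is not purrr itself.

```r
# Hypothetical base-R stand-in mimicking the result/error pair
# returned by a function wrapped in purrr::safely()
safely_sketch <- function(f) {
  function(...) {
    tryCatch(list(result = f(...), error = NULL),
             error = function(e) list(result = NULL, error = e))
  }
}

picky <- function(x) if (x == "kobe") stop("confused") else toupper(x)
inputs <- c(arg_1 = "scrub", arg_2 = "kobe", arg_3 = "sabonis")

# Everything comes back together, and only after the last iteration finishes
batch <- lapply(inputs, safely_sketch(picky))

# Which inputs errored?
failed <- names(Filter(function(x) !is.null(x$error), batch))
```

The errors are captured, but nothing exists outside the session until the whole loop returns, so a crashed session or a killed process takes every result with it.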
For web data in the wild, expect the unexpected. That’s why I baked up itersave. You have non-homogeneous edge cases aplenty.
These Chris Dudley looking edge cases are just waiting in the bushes for you.
Dunk thru them.