Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
What follows is like a kata to strengthen your R fundamentals.
The lovely stats in the wild recently posted some hott data analysis of Olympians’ ages and sexes. Because I’m annoyingly picky about graphics, I asked for his code so I could tweak the graphics according to my own perfidious norms. Stats in the wild posted his scraper of sports-reference.com — I’m sure you can find some more interesting uses for it — and asked for (polite) suggestions for improvement.
One potential place for improvement in stats in the wild’s code could answer two questions for R learners more generally, so I’m sharing the code block.
alphabet <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z")
for (i.one in 1:26) {
  for (i.two in 1:26) {
    letters <- paste(alphabet[i.one], alphabet[i.two], sep="")
  }
}
The desired goal is to get pairs of letters like
  [1] aa ba ca da ea fa ga ha ia ja ka la ma na oa
 [16] pa qa ra sa ta ua va wa xa ya za ab bb cb db
 [31] eb fb gb hb ib jb kb lb mb nb ob pb qb rb sb
 [46] tb ub vb wb xb yb zb ac bc cc dc ec fc gc hc
 [61] ic jc kc lc mc nc oc pc qc rc sc tc uc vc wc
 [76] xc yc zc ad bd cd dd ed fd gd hd id jd kd ld
 [91] md nd od pd qd rd sd td ud vd wd xd yd zd ae
[106] be ce de ee fe ge he ie je ke le me ne oe pe
[121] qe re se te ue ve we xe ye ze af bf cf df ef
[136] ff gf hf if jf kf lf mf nf of pf qf rf sf tf
[151] uf vf wf xf yf zf ag bg cg dg eg fg gg hg ig
[166] jg kg lg mg ng og pg qg rg sg tg ug vg wg xg
[181] yg zg ah bh ch dh eh fh gh hh ih jh kh lh mh
[196] nh oh ph qh rh sh th uh vh wh xh yh zh ai bi
[211] ci di ei fi gi hi ii ji ki li mi ni oi pi qi
[226] ri si ti ui vi wi xi yi zi aj bj cj dj ej fj
[241] gj hj ij jj kj lj mj nj oj pj qj rj sj tj uj
[256] vj wj xj yj zj ak bk ck dk ek fk gk hk ik jk
[271] kk lk mk nk ok pk qk rk sk tk uk vk wk xk yk
[286] zk al bl cl dl el fl gl hl il jl kl ll ml nl
[301] ol pl ql rl sl tl ul vl wl xl yl zl am bm cm
[316] dm em fm gm hm im jm km lm mm nm om pm qm rm
[331] sm tm um vm wm xm ym zm an bn cn dn en fn gn
[346] hn in jn kn ln mn nn on pn qn rn sn tn un vn
[361] wn xn yn zn ao bo co do eo fo go ho io jo ko
[376] lo mo no oo po qo ro so to uo vo wo xo yo zo
[391] ap bp cp dp ep fp gp hp ip jp kp lp mp np op
[406] pp qp rp sp tp up vp wp xp yp zp aq bq cq dq
[421] eq fq gq hq iq jq kq lq mq nq oq pq qq rq sq
[436] tq uq vq wq xq yq zq ar br cr dr er fr gr hr
[451] ir jr kr lr mr nr or pr qr rr sr tr ur vr wr
[466] xr yr zr as bs cs ds es fs gs hs is js ks ls
[481] ms ns os ps qs rs ss ts us vs ws xs ys zs at
[496] bt ct dt et ft gt ht it jt kt lt mt nt ot pt
[511] qt rt st tt ut vt wt xt yt zt au bu cu du eu
[526] fu gu hu iu ju ku lu mu nu ou pu qu ru su tu
[541] uu vu wu xu yu zu av bv cv dv ev fv gv hv iv
[556] jv kv lv mv nv ov pv qv rv sv tv uv vv wv xv
[571] yv zv aw bw cw dw ew fw gw hw iw jw kw lw mw
[586] nw ow pw qw rw sw tw uw vw ww xw yw zw ax bx
[601] cx dx ex fx gx hx ix jx kx lx mx nx ox px qx
[616] rx sx tx ux vx wx xx yx zx ay by cy dy ey fy
[631] gy hy iy jy ky ly my ny oy py qy ry sy ty uy
[646] vy wy xy yy zy az bz cz dz ez fz gz hz iz jz
[661] kz lz mz nz oz pz qz rz sz tz uz vz wz xz yz
[676] zz
which seems like a simple request. But how to do this idiomatically in R?
Quite often you want to do something like for (i in 1:222) { for (j in 1:333) { for (k in 1:444) { stuff }}}.
It’s also nice to know that R already provides access to “the 13th letter in the alphabet” with letters[13], so it’s unnecessary to redefine alphabet every time. (yay!)
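A quick check at the prompt, as a small sketch that relies only on constants shipped with base R (the variable names thirteenth, first_three and n_letters are just made up for this example):

```r
# letters (lower case) and LETTERS (upper case) are built into base R
thirteenth  <- letters[13]     # single lookup by position
first_three <- letters[1:3]    # vectorised lookup of several positions
n_letters   <- length(letters) # the whole alphabet is there

print(thirteenth)
print(first_three)
print(n_letters)
```

One thing to watch for: assigning to a variable called letters in your own code masks the built-in constant for the rest of the session, so it is probably safer to pick another name.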
As used in maths, the inner product of two [tensors | matrices | vectors] shrinks the output, and the outer product enlarges it. In this case, “outer product” does the work of for (i in 1:26) { for (j in 1:26) { fill in matrix entry [i,j] } } and does so idiomatically, that is, with one vectorised call instead of explicit loops. (Which is the goal in R, J, and other vectorised languages.)
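To make the loop-versus-outer comparison concrete, here is a minimal sketch that builds the 26-by-26 matrix both ways and checks that they agree. The names loop_pairs and outer_pairs are invented for the illustration.

```r
# Explicit double loop: fill a 26 x 26 character matrix entry by entry
loop_pairs <- matrix("", nrow = 26, ncol = 26)
for (i in 1:26) {
  for (j in 1:26) {
    loop_pairs[i, j] <- paste(letters[i], letters[j], sep = "")
  }
}

# The same matrix from a single vectorised call
outer_pairs <- outer(letters, letters, FUN = paste, sep = "")

# Both are 26 x 26 and agree in every entry
same_dims    <- identical(dim(loop_pairs), dim(outer_pairs))
same_entries <- all(loop_pairs == outer_pairs)
print(c(same_dims, same_entries))
```

Entry [i, j] in both matrices is the i-th letter followed by the j-th letter, which is why flattening the result column by column gives aa, ba, ca and so on.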
Here’s my answer, and I’d like to hear your comments or better/also-good solutions.
c( outer( letters, letters, FUN=paste, sep="" ) )
Broken down:
letters[1:26] = iterate through the alphabet. Plain letters also gives the whole alphabet.
outer = outer product of two arrays. Try outer( 2:7, 3:5 ) at the R prompt and then try outer( 1:26, 1:26, FUN=paste ). (In maths, outer contrasts with the elementwise product, which is 2:7 * 3:8 in R, and with the inner product, also known as the dot product: the sum over i of x[i] * y[i]. That is the same sum of products that matrix multiplication computes for each entry, and in R it is written 2:7 %*% 3:8.)
FUN=paste, sep="" = the grand theory behind this is much more complicated than what it does. paste concatenates two strings, with a default separator of a space, sep=" ".
The gnarly theory reason: FUN is an argument to outer, which defaults to multiplication (you see this in outer( 1:26, 1:26 )) but can be set to concatenation since we’re working with characters rather than numbers. Then how do you pass sep="" to paste? You get a problem calling FUN=paste( sep="" ) because that runs paste on the spot and hands outer its result rather than the function itself. You could do an ugly workaround with FUN=function(x, y) paste(x, y, sep="") … but the makers of R foresaw that you would often want to do things like this, so in addition to FUN they made the dots argument, written ..., come after FUN. Anything you put there, separated by a comma, is passed on to FUN, so you can write sep="" within outer without having to make a function(x, y) specifically to pass to FUN.
Wow, that was not fun.
c = the natural output is 2-dimensional and c streamlines that into one single vector.
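The pieces of the breakdown can be tried one at a time. This is only a sketch, and the variable names (times_table, via_dots, via_wrapper, pairs, elementwise, inner) are invented for the example rather than taken from anyone's code.

```r
# Default FUN is multiplication: a 6 x 3 multiplication table
times_table <- outer(2:7, 3:5)

# Extra arguments written after FUN are forwarded to FUN
via_dots <- outer(letters, letters, FUN = paste, sep = "")

# Equivalent but wordier: wrap paste in a two-argument function
via_wrapper <- outer(letters, letters,
                     FUN = function(x, y) paste(x, y, sep = ""))

# c() drops the dim attribute, leaving a plain character vector
pairs <- c(via_dots)

# Elementwise product keeps the length of the inputs ...
elementwise <- 2:7 * 3:8
# ... while %*% on two equal-length vectors collapses to a 1 x 1 matrix
inner <- 2:7 %*% 3:8

print(dim(times_table))
print(identical(via_dots, via_wrapper))
print(length(pairs))
print(sum(elementwise) == inner[1, 1])
```

The wrapper needs two arguments because outer calls FUN with both of its expanded inputs at once; a one-argument function(x) would fail here.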
Another way to do it is:
sapply( letters, FUN=function(x) paste(x, letters, sep="") )
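which I think is uglier … perhaps because it uses letters twice or perhaps because I think outer-producting is what I’m really doing. (It also returns a 26-by-26 matrix rather than a flat vector, and its columns run aa, ab, ac, … rather than aa, ba, ca, ….)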
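For anyone comparing the two at the prompt, here is a hedged sketch of how the results line up. The names by_outer, by_sapply and flat_sapply are made up for the illustration.

```r
by_outer  <- c(outer(letters, letters, FUN = paste, sep = ""))
by_sapply <- sapply(letters, FUN = function(x) paste(x, letters, sep = ""))

# sapply() hands back a matrix with one column per letter
print(dim(by_sapply))

# as.vector() drops the dims and the column names
flat_sapply <- as.vector(by_sapply)

# Same 676 strings, but in a different order
print(head(by_outer, 3))
print(head(flat_sapply, 3))
print(setequal(by_outer, flat_sapply))

# Transposing first recovers the ordering that outer() gives
print(identical(as.vector(t(by_sapply)), by_outer))
```

Whether the ordering matters depends on what the pairs are used for; for looping over every two-letter combination once, either version should do.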
Thoughts? Can it be done even more idiomatically or naturally?
UPDATE: gappy3000 says expand.grid() scales better than outer().
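For reference, one way the expand.grid() route might look. This is my reading of the suggestion, not gappy3000's actual code, and it makes no claim about timing; the names grid, by_grid, grid3 and triples are invented here.

```r
# Every combination of two letters, one row per pair
grid <- expand.grid(first = letters, second = letters,
                    stringsAsFactors = FALSE)

# paste() is vectorised over the two columns
by_grid <- paste(grid$first, grid$second, sep = "")

# Same vector, in the same order, as the outer() version
by_outer <- c(outer(letters, letters, FUN = paste, sep = ""))
print(identical(by_grid, by_outer))

# The same pattern extends to three letters without nesting calls
grid3   <- expand.grid(letters, letters, letters, stringsAsFactors = FALSE)
triples <- do.call(paste, c(grid3, sep = ""))
print(length(triples))
```

expand.grid() varies its first argument fastest, which is why the two-letter result matches the column-by-column flattening of outer().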