Outer Product of Character Vectors in R

[This article was first published on Isomorphismes, and kindly contributed to R-bloggers.]

What follows is like a kata to strengthen your R fundamentals.

The lovely stats in the wild recently posted some hott data analysis of Olympians’ ages and sexes. Because I’m annoyingly picky about graphics, I asked for his code so I could tweak the graphics according to my own perfidious norms. Stats in the wild posted his scraper of sports-reference.com — I’m sure you can find some more interesting uses for it — and asked for (polite) suggestions for improvement.

One place where stats in the wild’s code could be improved also answers two questions for R learners more generally, so I’m sharing the code block.

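alphabet <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z")

pairs <- character(0)
for (i.one in 1:26) {
  for (i.two in 1:26) {
    # append each pair; assigning to `letters` here would overwrite the result
    # on every pass and mask R's built-in `letters` constant
    pairs <- c(pairs, paste(alphabet[i.one], alphabet[i.two], sep = ""))
  }
}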

The desired goal is to get pairs of letters like

  [1] aa ba ca da ea fa ga ha ia ja ka la ma na oa
 [16] pa qa ra sa ta ua va wa xa ya za ab bb cb db
 [31] eb fb gb hb ib jb kb lb mb nb ob pb qb rb sb
 [46] tb ub vb wb xb yb zb ac bc cc dc ec fc gc hc
 [61] ic jc kc lc mc nc oc pc qc rc sc tc uc vc wc
 [76] xc yc zc ad bd cd dd ed fd gd hd id jd kd ld
 [91] md nd od pd qd rd sd td ud vd wd xd yd zd ae
[106] be ce de ee fe ge he ie je ke le me ne oe pe
[121] qe re se te ue ve we xe ye ze af bf cf df ef
[136] ff gf hf if jf kf lf mf nf of pf qf rf sf tf
[151] uf vf wf xf yf zf ag bg cg dg eg fg gg hg ig
[166] jg kg lg mg ng og pg qg rg sg tg ug vg wg xg
[181] yg zg ah bh ch dh eh fh gh hh ih jh kh lh mh
[196] nh oh ph qh rh sh th uh vh wh xh yh zh ai bi
[211] ci di ei fi gi hi ii ji ki li mi ni oi pi qi
[226] ri si ti ui vi wi xi yi zi aj bj cj dj ej fj
[241] gj hj ij jj kj lj mj nj oj pj qj rj sj tj uj
[256] vj wj xj yj zj ak bk ck dk ek fk gk hk ik jk
[271] kk lk mk nk ok pk qk rk sk tk uk vk wk xk yk
[286] zk al bl cl dl el fl gl hl il jl kl ll ml nl
[301] ol pl ql rl sl tl ul vl wl xl yl zl am bm cm
[316] dm em fm gm hm im jm km lm mm nm om pm qm rm
[331] sm tm um vm wm xm ym zm an bn cn dn en fn gn
[346] hn in jn kn ln mn nn on pn qn rn sn tn un vn
[361] wn xn yn zn ao bo co do eo fo go ho io jo ko
[376] lo mo no oo po qo ro so to uo vo wo xo yo zo
[391] ap bp cp dp ep fp gp hp ip jp kp lp mp np op
[406] pp qp rp sp tp up vp wp xp yp zp aq bq cq dq
[421] eq fq gq hq iq jq kq lq mq nq oq pq qq rq sq
[436] tq uq vq wq xq yq zq ar br cr dr er fr gr hr
[451] ir jr kr lr mr nr or pr qr rr sr tr ur vr wr
[466] xr yr zr as bs cs ds es fs gs hs is js ks ls
[481] ms ns os ps qs rs ss ts us vs ws xs ys zs at
[496] bt ct dt et ft gt ht it jt kt lt mt nt ot pt
[511] qt rt st tt ut vt wt xt yt zt au bu cu du eu
[526] fu gu hu iu ju ku lu mu nu ou pu qu ru su tu
[541] uu vu wu xu yu zu av bv cv dv ev fv gv hv iv
[556] jv kv lv mv nv ov pv qv rv sv tv uv vv wv xv
[571] yv zv aw bw cw dw ew fw gw hw iw jw kw lw mw
[586] nw ow pw qw rw sw tw uw vw ww xw yw zw ax bx
[601] cx dx ex fx gx hx ix jx kx lx mx nx ox px qx
[616] rx sx tx ux vx wx xx yx zx ay by cy dy ey fy
[631] gy hy iy jy ky ly my ny oy py qy ry sy ty uy
[646] vy wy xy yy zy az bz cz dz ez fz gz hz iz jz
[661] kz lz mz nz oz pz qz rz sz tz uz vz wz xz yz

which seems like a simple request. But how to do this idiomatically in R?

Quite often you want to do something like for (i in 1:222) { for (j in 1:333) { for (k in 1:444) { stuff }}}

Also nice to know that R has already provided access to “the 13th letter in the alphabet” with letters[13], so it’s unnecessary to redefine alphabet every time. (yay!)
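A quick check of those built-in constants, as a minimal sketch (`letters` and `LETTERS` are base R constants; the variable names `thirteenth` and `first.three` are just for illustration):

```r
# Built-in constants in base R: no need to define your own alphabet
thirteenth  <- letters[13]     # the 13th lowercase letter
first.three <- LETTERS[1:3]    # the uppercase counterpart

print(thirteenth)              # "m"
print(first.three)             # "A" "B" "C"
print(length(letters))         # 26
```

If an earlier assignment has masked `letters` in your workspace, `rm(letters)` or `base::letters` gets the constant back.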

As used in maths, the inner product of two [tensors | matrices | vectors] shrinks the output, and the outer product enlarges the output: two vectors of lengths 26 and 26 give a 26 × 26 matrix. In this case, “outer product” does the work of for (i in 1:26) { for (j in 1:26) { fill up the matrix with each entry [i,j] } } and does so idiomatically, that is, with vectorised operations rather than explicit loops. (Which is the goal in R, J, and other vectorised languages.)
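To make the shrink-versus-enlarge contrast concrete, here is a small numeric sketch. The vectors `x` and `y` are made up for illustration and are not from the original post:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Inner product: two length-3 vectors collapse to one number
inner <- sum(x * y)          # 1*4 + 2*5 + 3*6 = 32

# Outer product: two length-3 vectors expand to a 3 x 3 matrix,
# where entry [i, j] is x[i] * y[j]; x %o% y is shorthand for the same thing
outer.xy <- outer(x, y)

print(inner)
print(dim(outer.xy))         # 3 3
print(outer.xy[2, 3])        # x[2] * y[3] = 12
```

With the default FUN, outer() multiplies; swapping in another vectorised function such as paste keeps the same [i, j] layout but fills it with something else.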

Here’s my answer, and I’d like to hear your comments or better/also-good solutions.

c(outer(letters, letters, FUN = paste, sep = ""))

Broken down:
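One way to take the one-liner apart, as a sketch (the intermediate names `m` and `pairs` are only for illustration):

```r
# Step 1: outer() expands both vectors to the full 26 x 26 grid and calls
# FUN once on the expanded vectors. Extra arguments (here sep = "") are
# passed through to FUN, so FUN has to be vectorised, which paste is.
m <- outer(letters, letters, FUN = paste, sep = "")

print(dim(m))        # 26 26: a character matrix
print(m[2, 1])       # paste(letters[2], letters[1], sep = "") gives "ba"

# Step 2: c() drops the dim attribute, flattening the matrix column by column
pairs <- c(m)

print(length(pairs))     # 676
print(head(pairs, 3))    # "aa" "ba" "ca"
```

The column-by-column flattening is why the first letter varies fastest in the printed result.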

Another way to do it is:

sapply(letters, FUN = function(x) paste(x, letters, sep = ""))

which I think is uglier … perhaps because it uses letters twice or perhaps because I think outer-producting is what I’m really doing.
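The two versions are not quite interchangeable, which a short comparison shows. This is a sketch; `o`, `s`, `same.as.transpose` and `same.set` are illustrative names:

```r
o <- outer(letters, letters, FUN = paste, sep = "")
s <- sapply(letters, FUN = function(x) paste(x, letters, sep = ""))

print(dim(s))        # also 26 26, but sapply adds column names "a" ... "z"
print(s[2, 1])       # "ab": column 1 holds every pair that starts with "a"
print(o[2, 1])       # "ba": outer() puts the first letter along the rows

# Same entries, transposed layout
same.as.transpose <- all(unname(s) == t(o))
same.set <- setequal(c(s), c(o))
print(same.as.transpose)
print(same.set)
```

So flattening the sapply() result with c() gives aa, ab, ac, ... rather than aa, ba, ca, ...; whether that matters depends on what you do with the pairs next.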

Thoughts? Can it be done even more idiomatically or naturally?

UPDATE: gappy3000 says expand.grid() scales better than outer().
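A minimal sketch of the expand.grid() route, not benchmarked here, so the scaling claim stays gappy3000's. The column names `first`, `second`, `i`, `j`, `k` are hypothetical:

```r
# expand.grid() builds a data frame with one row per combination;
# the first column varies fastest, matching the order of c(outer(...))
g <- expand.grid(first = letters, second = letters, stringsAsFactors = FALSE)
pairs.grid <- paste(g$first, g$second, sep = "")

same <- identical(pairs.grid, c(outer(letters, letters, FUN = paste, sep = "")))
print(same)          # TRUE

# Unlike outer(), which takes exactly two vectors, expand.grid() accepts any
# number of them, so it also covers the three-deep loop case
idx <- expand.grid(i = 1:2, j = 1:3, k = 1:4)
print(nrow(idx))     # 2 * 3 * 4 = 24
```

Each row of `idx` is one (i, j, k) combination, so a single vectorised call over its columns can stand in for the nested for loops.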
