Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values.
Of course, sometimes it takes a while to figure out how to do this. Please read for a great R matrix lookup problem and solution.
In R we can specify operations over vectors. For arithmetic this is easy, but some more complex operations you “need to know the trick.”
Patrick Freeman (@PTFreeman) recently asked: what is the idiomatic way to look up a bunch of values from a matrix by row and column keys? This is actually easy to do if we first expand the data matrix into RDF-triples. If our data were in this format we could merge/join it against our desired column indices.
Let’s start with an example data matrix.
# example matrix data m <- matrix(1:9, nrow = 3) row.names(m) <- c('R1' ,'R2', 'R3') colnames(m) <- c('C1', 'C2', 'C3') knitr::kable(m)
C1 | C2 | C3 | |
---|---|---|---|
R1 | 1 | 4 | 7 |
R2 | 2 | 5 | 8 |
R3 | 3 | 6 | 9 |
And our data-frame containing the indices we want to look-up.
# row/columns we want w <- data.frame( i = c('R1', 'R2', 'R2'), j = c('C2', 'C3', 'C2')) knitr::kable(w)
i | j |
---|---|
R1 | C2 |
R2 | C3 |
R2 | C2 |
That is: we want to know the matrix values from [R1, C2], [R2, C3], and [R2, C2].
The trick is: how do we convert the matrix into triples? digEmAll, has a great solution to that here.
# unpack into 3-column format from: # https://stackoverflow.com/a/9913601 triples <- data.frame( i = rep(row.names(m), ncol(m)), j = rep(colnames(m), each = nrow(m)), v = as.vector(m)) knitr::kable(triples)
i | j | v |
---|---|---|
R1 | C1 | 1 |
R2 | C1 | 2 |
R3 | C1 | 3 |
R1 | C2 | 4 |
R2 | C2 | 5 |
R3 | C2 | 6 |
R1 | C3 | 7 |
R2 | C3 | 8 |
R3 | C3 | 9 |
What the above code has done is: write each entry of the original matrix as a separate row with the original row and column ids landed as new columns. This data format is very useful.
The above code is worth saving as a re-usable snippet, as getting it right is a clever step.
Now we can vectorize our lookup using the merge command, which produces a new joined table where the desired values have been landed as a new column.
res <- merge(w, triples, by = c('i', 'j'), sort = FALSE) knitr::kable(res)
i | j | v |
---|---|---|
R1 | C2 | 4 |
R2 | C3 | 8 |
R2 | C2 | 5 |
And that is it: we have used vectorized and relational concepts to look up many values from a matrix very quickly.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.