Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I like both Python and R, and teach them both, but for data science R is the clear choice. When asked why, I always note (a) written by statisticians for statisticians, (b) built-in matrix type and matrix manipulations, (c) great graphics, both base and CRAN, (d) excellent parallelization facilities, etc. I also like to say that R is “more CS-ish than Python,” just to provoke my fellow computer scientists.
But one aspect that I think is huge but probably gets lost when I cite it is R’s row/column/element-name feature. I’ll give an example here.
Today I was dealing with a problem of ID numbers that are nonconsecutive. My solution was to set up a lookup table. Say we have ID data (5, 12, 13, 9, 12, 5, 6). There are 5 distinct ID values, so we’d like to map these into new IDs 1,2,3,4,5. Here is a simple solution:
> x <- c(5,12,13,9,12,5,6)
> xuc <- as.character(unique(x))
> xuc
[1] “5” “12” “13” “9” “6”
> xLookup <- 1:length(xuc)
> names(xLookup) <- xuc
> xLookup
5 12 13 9 6
1 2 3 4 5
So, from now on, to do the lookup, I just use as subscript the character from of the original ID, e.g.
> xLookup[’12’]
12
2
Of course, I did all this within program code. So to change a column of IDs to the new ones, I wrote
ul[as.character(testSet[,1])])
Lots of other ways to do this, of course, but it shows how handy the names can be.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.