R > Python: a Concrete Example

matloff

3 years ago

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I like both Python and R, and teach them both, but for data science R is the clear choice. When asked why, I always note (a) written by statisticians for statisticians, (b) built-in matrix type and matrix manipulations, (c) great graphics, both base and CRAN, (d) excellent parallelization facilities, etc. I also like to say that R is “more CS-ish than Python,” just to provoke my fellow computer scientists.

But one aspect that I think is huge but probably gets lost when I cite it is R’s row/column/element-name feature. I’ll give an example here.

Today I was dealing with a problem of ID numbers that are nonconsecutive. My solution was to set up a lookup table. Say we have ID data (5, 12, 13, 9, 12, 5, 6). There are 5 distinct ID values, so we’d like to map these into new IDs 1,2,3,4,5. Here is a simple solution:

> x <- c(5,12,13,9,12,5,6)
> xuc <- as.character(unique(x))
> xuc
[1] “5” “12” “13” “9” “6”
> xLookup <- 1:length(xuc)
> names(xLookup) <- xuc
> xLookup
5 12 13 9 6
1 2 3 4 5

So, from now on, to do the lookup, I just use as subscript the character from of the original ID, e.g.

> xLookup[’12’]
12
2

Of course, I did all this within program code. So to change a column of IDs to the new ones, I wrote

ul[as.character(testSet[,1])])

Lots of other ways to do this, of course, but it shows how handy the names can be.

To leave a comment for the author, please follow the link and comment on their blog: Mad (Data) Scientist.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.