Memoization in R : Illustrative example

[This article was first published on We think therefore we R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I came across a nice problem at project euler that gave me sense of satisfaction that was unusual, I think that because I don’t usually get the solutions right the first time as I did in this case. Anyhow, I shall try and decode the R codes that I used in simple English language and Mathematics.

Problem 14:
The following iterative sequence is defined for the set of positive integers:
n n/2 (n is even)
n 3n + 1 (n is odd)
Using the rule above and starting with 13, we generate the following sequence:
13 40 20 10 5 16 8 4 2 1
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts the terms are allowed to go above one million.
view raw Problem_14.txt hosted with ❤ by GitHub


Let me first illustrate the brute force method, that is usually the method used by novice coders like myself. The idea here is to find the largest number below 1 million that gives the maximum number of the above mentioned iterations.

shreyes <- function(temp) ## Cute function that returns the number of iterations that were preformed.
{ c <- 0
while(temp > 1)
{ if(temp%%2==0) temp <- temp/2 else temp <- 3*temp + 1
c <- c+1
}
return(c)
}
largest <- 0
num <- 0
system.time(for(i in c(1:1000000))
{
iter <- shreyes(i) # Here we get the number of iterations for "i" and we do it for each number from 1 to 1 million
if(iter > largest) # If the number of iterations were greater than the previous largest number of iterations
{ # update the largest number of iterations and store the number in "num"
largest <- iter
num <- i
}
})
view raw Problem_14.R hosted with ❤ by GitHub


So what I have done above is simply performed the iteration for each and every integer from 1 to 1 million and using a counter variable kept a track of which number gave me the largest number of iterations and recorded the corresponding number, which is what we needed in the end. The idea was straight forward the only challenge was to come up with that cute function (which we now see is not that challenging after all).

Well, now that the novice part is done lets get to what Utkarsh (my pro bro) had to say about this. My codes took ~ 701.75 seconds to run (on my Sony vaio VPCCW12EN), this was completely fine by me. Utkarsh shrugged in his usual nonchalant manner at my codes and came up with an awesome algorithm to optimize the above calculation and saving some precious time (which I think he referred to as Memoization). The idea that he worked on was that since in many cases we would already have computed the number of iterations there was no need to keep computing then again. Suppose in the example in the question we see that 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1. Now in the computation of 13 if say we already have that 10 will further iterate say 6 times we would not have to go all the way to 1. Similarly even for 10 if we know that 5 further iterates 5 times we don’t need to go all the way back till 1. This would be more clear when we take a look at the codes.

single.call <- function(limit) { # Another cute function that returns the vector that contains the number of iterations for each number.
memo <- rep(-1, limit)
memo[1] <- 0
for(i in c(2:limit)) {
l <- 0
n <- i
while(n >= i) { # Check only so long as "n > i" and not "1" this is basically the optimization we wanted.
l <- l + 1
if(n %% 2 == 0) {
n <- n / 2
} else {
n <- 3 * n + 1
}
} # This while loop makes sure that the number is iterated only till the time it is greater than one of the previously computed number.
# In our case, (where the number is 13) this loop will run till the value after iteration reaches 10 (which has been previously computed and is stored in memo[10] think why?)
memo[i] <- memo[n] + l # This is where the magic takes place. You count the steps that took while iterating 13 -> 40 -> 20 -> 10 (that is 4, this is "l") and then just add it to memo[10] (which contains the number of iteration that are needed to go from 10 to 1)
}
return(memo)
}
system.time(temp <- single.call(1000000)) # Now do this for 1,000,000 (instead of 13)
which(temp == max(temp)) # Which number has the maximum iterations
view raw project_14_UT.R hosted with ❤ by GitHub


The above codes, courtesy Utkarsh, took ~ 50 seconds. As it turns out I was 1,390% inefficient as compared to this optimal algorithm. I would glad to know if there is any other optimization technique (apart from using super computers) that might reduce the computational time, please share if you can find a better way of coding this in R. 

To leave a comment for the author, please follow the link and comment on their blog: We think therefore we R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)