Memoization in R : Illustrative example
[This article was first published on We think therefore we R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I came across a nice problem at project euler that gave me sense of satisfaction that was unusual, I think that because I don’t usually get the solutions right the first time as I did in this case. Anyhow, I shall try and decode the R codes that I used in simple English language and Mathematics.
Let me first illustrate the brute force method, that is usually the method used by novice coders like myself. The idea here is to find the largest number below 1 million that gives the maximum number of the above mentioned iterations.
So what I have done above is simply performed the iteration for each and every integer from 1 to 1 million and using a counter variable kept a track of which number gave me the largest number of iterations and recorded the corresponding number, which is what we needed in the end. The idea was straight forward the only challenge was to come up with that cute function (which we now see is not that challenging after all).
Well, now that the novice part is done lets get to what Utkarsh (my pro bro) had to say about this. My codes took ~ 701.75 seconds to run (on my Sony vaio VPCCW12EN), this was completely fine by me. Utkarsh shrugged in his usual nonchalant manner at my codes and came up with an awesome algorithm to optimize the above calculation and saving some precious time (which I think he referred to as Memoization). The idea that he worked on was that since in many cases we would already have computed the number of iterations there was no need to keep computing then again. Suppose in the example in the question we see that 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1. Now in the computation of 13 if say we already have that 10 will further iterate say 6 times we would not have to go all the way to 1. Similarly even for 10 if we know that 5 further iterates 5 times we don’t need to go all the way back till 1. This would be more clear when we take a look at the codes.
The above codes, courtesy Utkarsh, took ~ 50 seconds. As it turns out I was 1,390% inefficient as compared to this optimal algorithm. I would glad to know if there is any other optimization technique (apart from using super computers) that might reduce the computational time, please share if you can find a better way of coding this in R.
Let me first illustrate the brute force method, that is usually the method used by novice coders like myself. The idea here is to find the largest number below 1 million that gives the maximum number of the above mentioned iterations.
So what I have done above is simply performed the iteration for each and every integer from 1 to 1 million and using a counter variable kept a track of which number gave me the largest number of iterations and recorded the corresponding number, which is what we needed in the end. The idea was straight forward the only challenge was to come up with that cute function (which we now see is not that challenging after all).
Well, now that the novice part is done lets get to what Utkarsh (my pro bro) had to say about this. My codes took ~ 701.75 seconds to run (on my Sony vaio VPCCW12EN), this was completely fine by me. Utkarsh shrugged in his usual nonchalant manner at my codes and came up with an awesome algorithm to optimize the above calculation and saving some precious time (which I think he referred to as Memoization). The idea that he worked on was that since in many cases we would already have computed the number of iterations there was no need to keep computing then again. Suppose in the example in the question we see that 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1. Now in the computation of 13 if say we already have that 10 will further iterate say 6 times we would not have to go all the way to 1. Similarly even for 10 if we know that 5 further iterates 5 times we don’t need to go all the way back till 1. This would be more clear when we take a look at the codes.
The above codes, courtesy Utkarsh, took ~ 50 seconds. As it turns out I was 1,390% inefficient as compared to this optimal algorithm. I would glad to know if there is any other optimization technique (apart from using super computers) that might reduce the computational time, please share if you can find a better way of coding this in R.
To leave a comment for the author, please follow the link and comment on their blog: We think therefore we R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.