Simulated Annealing in Julia


Building Optimization Functions for Julia

In hopes of adding enough statistical functionality to Julia to make it usable for my day-to-day modeling projects, I’ve written a very basic implementation of the simulated annealing (SA) algorithm, which I’ve placed in the same JuliaVsR GitHub repository that holds the code from my previous post about Julia. For easier reading, my current code for SA is shown below:

The Simulated Annealing Algorithm

# simulated_annealing()
# Arguments:
# * cost: Function from states to the real numbers. Often called an energy function, but this algorithm works for both positive and negative costs.
# * s0: The initial state of the system.
# * neighbor: Function from states to states. Produces what the Metropolis algorithm would call a proposal.
# * temperature: Function specifying the temperature at time i.
# * iterations: How many iterations of the algorithm should be run? This is the only termination condition.
# * keep_best: Do we return the best state visited or the last state visited? (Should default to true.)
# * trace: Do we show a trace of the system's evolution?
 
function simulated_annealing(cost,
                             s0,
                             neighbor,
                             temperature,
                             iterations,
                             keep_best,
                             trace)
 
  # Set our current state to the specified initial state.
  s = s0
 
  # Set the best state we've seen to the initial state.
  best_s = s0
 
  # We always perform a fixed number of iterations.
  for i = 1:iterations
    t = temperature(i)
    s_n = neighbor(s)
    if trace println("$i: s = $s") end
    if trace println("$i: s_n = $s_n") end
    y = cost(s)
    y_n = cost(s_n)
    if trace println("$i: y = $y") end
    if trace println("$i: y_n = $y_n") end
    if y_n <= y
      s = s_n
      if trace println("Accepted") end
    else
      p = exp(- ((y_n - y) / t))
      if trace println("$i: p = $p") end
      if rand() <= p
        s = s_n
        if trace println("Accepted") end
      else
        s = s
        if trace println("Rejected") end
      end
    end
    if trace println() end
    if cost(s) < cost(best_s)
      best_s = s
    end
  end
 
  if keep_best
    best_s
  else
    s
  end
end

I’ve tried to implement the algorithm as it was presented by Bertsimas and Tsitsiklis in their 1993 review paper in Statistical Science. The major differences between my implementation and their description of the algorithm are that (1) I’ve made it possible to keep the best solution found during the search rather than always returning the last solution found and (2) I’ve made no effort to select a value for their d parameter carefully: I’ve simply set it to 1 for all of my examples, though my code lets you specify a rule for setting the temperature of the annealer at time t other than the 1 / log(t) cooling scheme I’ve been using. (In fact, the code currently forces you to select a cooling scheme, since there are no default arguments in Julia yet.)
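The examples later in this post pass a function called log_temperature that lives in the repository but isn’t reproduced here. A minimal sketch consistent with the 1 / log(t) schedule described above (that is, with d set to 1) would be:

# Temperature at iteration i under a d / log(i) schedule with d = 1.
# Since log(1) == 0, the temperature at the first iteration is Inf,
# so the very first proposal is always accepted.
function log_temperature(i)
  1 / log(i)
end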

I chose simulated annealing as my first optimization algorithm to implement in Julia because it’s a natural relative of the Metropolis algorithm that I used in the previous post. Indeed, coding up an implementation of SA made me appreciate the fact that the Metropolis algorithm as used in Bayesian statistics can be considered a special case of the SA algorithm in which the temperature is always 1 and in which the cost function for a state with posterior probability p is -log(p).
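To make that connection concrete, here is a small, purely illustrative sketch (none of these helper names come from the repository) that drives simulated_annealing as a Metropolis sampler for an unnormalized standard normal posterior: the temperature is pinned at 1 and the cost is the negative log density.

load("simulated_annealing.jl")

# Pinning the temperature at 1 turns SA into the Metropolis algorithm.
function constant_temperature(i)
  1
end

# Cost = -log(p(s)) for an (unnormalized) standard normal posterior.
function neg_log_posterior(s)
  0.5 * s[1]^2
end

# Symmetric Gaussian random-walk proposal.
function gaussian_neighbor(s)
  [s[1] + randn()]
end

# keep_best = false: a sampler should return the last state, not the best one.
draw = simulated_annealing(neg_log_posterior,
                           [0.0],
                           gaussian_neighbor,
                           constant_temperature,
                           1000,
                           false,
                           false)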

Coding up the SA algorithm for myself also made me realize why it’s important that it uses an additive comparison of costs rather than a ratio (as in the Metropolis algorithm): the ratio goes haywire when the cost function can take on both positive and negative values (which, of course, doesn’t matter for Bayesian methods because probabilities are strictly non-negative). I discovered this when I initially tried to code up SA from my inaccurate memory without first consulting the literature and found that I couldn’t get a ratio-based algorithm to work no matter how many times I changed the cooling schedule.
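A toy illustration of the problem: if the current state has cost -2 and the proposal has cost 1, the Metropolis-style ratio of the two is negative and can’t be used as an acceptance probability, while the additive rule still behaves sensibly.

y = -2.0     # cost of the current state
y_n = 1.0    # cost of the proposed state
t = 1.0      # temperature

ratio = y_n / y            # -0.5: meaningless as a probability
p = exp(-(y_n - y) / t)    # about 0.05: a valid acceptance probability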

To test the SA implementation, I’ve written two test cases that attempt to minimize the Rosenbrock and Himmelbrau functions, which I found listed as examples of hard-to-minimize functions in the Wikipedia description of the Nelder-Mead method. You can see those two examples below. In addition, I’ve used R to generate plots showing how the SA algorithm behaves under repeated application to the same optimization problem; in these plots, I’ve used a heatmap to show the cost function’s value at each (x, y) position, colored crosshairs to indicate the position of a true minimum of the function in question, and red dots to indicate the purported solutions found by my implementation of SA.

Finding the Minimum of the Rosenbrock Function

# Find a solution of the Rosenbrock function using SA.
load("simulated_annealing.jl")
load("../rng.jl")
 
function rosenbrock(x, y)
  (1 - x)^2 + 100(y - x^2)^2
end
 
function neighbors(z)
  [rand_uniform(z[1] - 1, z[1] + 1), rand_uniform(z[2] - 1, z[2] + 1)]
end
 
srand(1)
 
solution = simulated_annealing(z -> rosenbrock(z[1], z[2]),
                               [0, 0],
                               neighbors,
                               log_temperature,
                               10000,
                               true,
                               false)
[Figure: heatmap of the Rosenbrock function with the SA solutions marked (rosenbrock.png)]
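Both neighbor functions rely on a rand_uniform helper that is loaded from ../rng.jl in the repository and not shown in this post; a minimal sketch of what such a helper presumably does:

# Draw a single uniform random number from the interval [a, b].
function rand_uniform(a, b)
  a + (b - a) * rand()
end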

Finding the Minima of the Himmelbrau Function

# Find a solution of the Himmelbrau function using SA.
load("simulated_annealing.jl")
load("../rng.jl")
 
function himmelbrau(x, y)
  (x^2 + y - 11)^2 + (x + y^2 - 7)^2
end
 
function neighbors(z)
  [rand_uniform(z[1] - 1, z[1] + 1), rand_uniform(z[2] - 1, z[2] + 1)]
end
 
srand(1)
 
solution = simulated_annealing(z -> himmelbrau(z[1], z[2]),
                               [0, 0],
                               neighbors,
                               log_temperature,
                               10000,
                               true,
                               false)
[Figure: heatmap of the Himmelbrau function with the SA solutions marked (himmelbrau.png)]
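As a quick sanity check (not part of the original script), the Himmelbrau function is zero at each of its four minima, so the cost of the returned solution should be close to zero:

println(solution)                               # the (x, y) pair SA settled on
println(himmelbrau(solution[1], solution[2]))   # should be near 0 at a true minimum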

Moving Forward

Now that I’ve got a form of SA working, I’m interested in coding up a suite of optimization functions for Julia so that I can start to do maximum likelihood estimation in pure Julia. Once that’s possible, I can use Julia to do real science, e.g. when I need to fit simple models for which finding the MLE is appropriate. (I will leave the development of cleaner statistical functions for special cases of maximum likelihood estimation to more capable people, like Douglas Bates, who has already produced some GLM code.)
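Even with nothing but SA in hand, a crude form of maximum likelihood estimation is already possible by minimizing a negative log-likelihood. The sketch below is purely illustrative (none of these names appear in the repository): it fits the mean and log standard deviation of a small normal sample.

load("simulated_annealing.jl")
load("../rng.jl")

data = [1.2, 0.8, 1.9, 1.4, 0.7]

# Negative log-likelihood of a normal sample, up to an additive constant;
# sigma is parameterized on the log scale so every proposed state is valid.
function negative_log_likelihood(theta)
  mu = theta[1]
  sigma = exp(theta[2])
  nll = 0.0
  for x in data
    nll += log(sigma) + (x - mu)^2 / (2 * sigma^2)
  end
  nll
end

function nll_neighbor(theta)
  [rand_uniform(theta[1] - 0.5, theta[1] + 0.5),
   rand_uniform(theta[2] - 0.5, theta[2] + 0.5)]
end

mle = simulated_annealing(negative_log_likelihood,
                          [0.0, 0.0],
                          nll_neighbor,
                          log_temperature,
                          10000,
                          true,
                          false)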

At present my code is meant simply to demonstrate how one could write an implementation of simulated annealing in Julia. I’m sure that the code can be more efficient and I suspect that I’ve violated some of the idioms of the language. In addition, I’d much prefer that this function use default values for many of the arguments, as there is no reason that an end-user needs to be concerned with finding the best cooling schedule if SA seems to work out of the box on their problem with the cooling schedule I’ve been using.
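Until default arguments exist, one possible stopgap (again, not something in the repository) is an extra method that fills in sensible defaults via multiple dispatch:

# Convenience method: callers who are happy with the 1 / log(t) schedule,
# 10,000 iterations, keep_best = true and no trace supply only three arguments.
function simulated_annealing(cost, s0, neighbor)
  simulated_annealing(cost, s0, neighbor, log_temperature, 10000, true, false)
end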

With those disclaimers about my code in place, I’d like to think that I haven’t made any mathematical errors and that this function can be used by others. So, I’d ask that those interested please tear apart my code so that I can make it usable as a general purpose function for optimization in Julia. Alternatively, please tell me that there’s no need for a pure Julia implementation of SA because, for example, there are nice C libraries that would be much easier to link to than to re-implement.

With an implementation of SA in place, I’ll probably start working on implementing L-BFGS-B soon, which is the other optimization algorithm I use often in R. (To be honest, I use L-BFGS-B almost exclusively, but SA was much easier to implement.)

Incidentally, this code base demonstrates how I view the relationship between R and Julia: Julia is a beautiful new language that is still missing many important pieces. We can all work together to rebuild in Julia the best pieces of R that it is still missing. While we’re working on improving Julia, we’ll need to keep using R to handle things like visualization of our results. For this post, I turned back to ggplot2 for all of the graphics generation.
