Finding a circle in a chart by @ellis2013nz

[This article was first published on free range statistics - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So this tweet came across my feed. To save you going there it is about a selection exercise for a job for (I think) an IT start up, described proudly by its author as “insanely hard”; and the job is to find the radius of the brown circle in this diagram:

At first it just annoyed me, and my reaction was a) I’d never want to work for this guy and b) this is a really dumb way to select your team, unless you’re recruiting for the maths department at a university. I strongly believe that recruitment exercises should be:

  • as similar as possible to the real work required
  • easy enough that any reasonable candidate can do it, but with enough complexity that some of them will do it really well
  • reasonable to do in the time available

…and I don’t think this does any of those things.

But then I got sufficiently interested to think about how I’d do the answer. I’m not good enough to work it out with pen and paper in a reasonable time frame. There is of course an optimal professional way to go about it, which is to throw all the relevant equations including that of the brown circle into a machine that can solve them for you, as posted by one user on Twitter here with WolframAlpha working it out for you. The answer is r = 0.213842. (Am I dreaming or did WolframAlpha used to be called Mathematica? Or is that a different thing? Ah ok Wikipedia shows me that the answer is ‘sort of’ – WolframAlpha is based on the earlier product, Mathematica.)

My own intuition on how to find the answer – because I’m not fluent in WolframAlpha, which I now see is by far the best way in absolute terms – was to find the centre point of the circle first. Basically use computing power to search for a point that has its shortest point equidistant from the three lines shown. This also made me think that the work isn’t that different from what I can imagine being required in jobs I can think of in the real world (less likely to be mathematical, but I can imagine looking for the largest circle that fits between some boundaries), so perhaps I was a bit harsh in dismissing the exercise entirely.

Here’s how I do this brute force method in R:

Post continues after R code

library(tidyverse)

# Define some values of the red line (y = sqrt(x)) and purple line (unit circle)
# for 1000 values of x:
d1 <- tibble(x = seq(from = 0, to = 1, length.out = 1000)) |>
  mutate(line1_y = sqrt(x),
         line2_y  = sqrt(1 -x ^2 )) 

# As above but finer grid, for 5000 values of x
d1_fine <- tibble(x = seq(from = 0, to = 1, length.out = 5000)) |>
  mutate(line1_y = sqrt(x),
         line2_y  = sqrt(1 -x ^2 )) 

# Cut down versions of that for just the visually plausible part on the left of the chart
d2 <- filter(d1, x < 0.6)
d2_fine <- filter(d1_fine, x < 0.6)

# Function to search a rectangle (defined by first four parameters), calculate the
# distance of each point in the rectangle from 3 lines (vertical at 0, y = sqrt(x),
# and unit circle) and return the 100 points with the highest value of the
# lowest of the differences from the three lines
get_dc <- function(x_from, x_to, y_from, y_to, d2, length = 100){
  possibilities <- expand_grid(can_x = seq(from = x_from, to = x_to, length.out = length),
                               can_y = seq(from = y_from, to = y_to, length.out = length))
  
  # Distances from the points on the y = sqrt(x) line
  distances1 <- expand_grid(d2[, 1:2], possibilities) |>
    mutate(distance_1 = sqrt((x - can_x) ^ 2 + (line1_y - can_y) ^ 2)) |>
    group_by(can_x, can_y) |>
    arrange(distance_1) |>
    slice(1) |>
    ungroup() |>
    select(can_x, can_y, distance_1)
  
  # Distances from the points in the unit circle
  distances2 <- expand_grid(d2[, c(1, 3)], possibilities) |>
    mutate(distance_2 = sqrt((x - can_x) ^ 2 + (line2_y - can_y) ^ 2)) |>
    group_by(can_x, can_y) |>
    arrange(distance_2) |>
    slice(1) |>
    ungroup() |>
    select(can_x, can_y, distance_2)
  
  dc <- distances1 |>
    left_join(distances2, by = c("can_x", "can_y")) |>
    # add the distance to the vertical black line
    mutate(distance_3 = can_x,
           # which is the smallest of the three distances:
           smallest = pmin(distance_1, distance_2, distance_3)) |>
    arrange(desc(smallest)) |>
    slice(1:100)
}

# coarse search:
dc1 <- get_dc(0, 0.5, 0.5, 1, d2)
dc1

That gets me these results. The first two columns are the candidate values for the centre of the brown circle. The next three columns are the shortest distance from that candidate center to a set of points on the lines for the red (y=sqrt(x)), purple (unit circle), and vertical black lines. The final column is the smallest of those three values - hence the smallest circle that could be drawn from that candidate point without crossing one of the three lines. The output is ordered so the candidate points with the highest value of smallest come first:

# A tibble: 100 × 6
   can_x can_y distance_1 distance_2 distance_3 smallest
   <dbl> <dbl>      <dbl>      <dbl>      <dbl>    <dbl>
 1 0.212 0.758      0.216      0.213      0.212    0.212
 2 0.217 0.758      0.213      0.212      0.217    0.212
 3 0.212 0.753      0.212      0.218      0.212    0.212
 4 0.222 0.758      0.209      0.211      0.222    0.209
 5 0.217 0.753      0.209      0.217      0.217    0.209
 6 0.212 0.763      0.220      0.208      0.212    0.208
 7 0.212 0.747      0.208      0.223      0.212    0.208
 8 0.207 0.742      0.207      0.229      0.207    0.207
 9 0.207 0.747      0.211      0.224      0.207    0.207
10 0.207 0.753      0.215      0.220      0.207    0.207

I’d deliberately written the work above into a function so I could easily iterate from this rough effort. Taking a much finer grid using the range of 100 candidate values of my first effort, and comparing to a more detailed set of points on the red and blue lines gets me a more precise estimate of the best circle centre, and ultimately the number we were asked to find, the radius of the cirle at that point that touches the three lines.

# more precise search:
dc2 <- get_dc(min(dc1$can_x), max(dc1$can_x), min(dc1$can_y), max(dc1$can_y), 
              d2 = d2_fine, length = 200)

dc2[1, ]$smallest

Which gets me this value:

[1] 0.2138219

This is the same as the answer from WolframAlpha to 4 decimal places. I will note that this took me about 40 lines of code whereas the WolframAlpha solution is done in 1 (or 4, depending how you lay it out). Which shows the value of using the right tool for the job… definitely WolframAlpha when you can express those boundary lines with mathematical formula. My method has advantage that it could work with arbitrary lines (e.g. country borders or something?), not just those expressed by an equation, which is actually a pretty big advantage in what seem to me anything like a realistic application.

Now, to draw the plot. This is simple, the only trick is to note the parse = TRUE argument to geom_text when annoting with equations, and a handy function (thanks Trevor!) for drawing a circle on a ggplot2 plot:

#----------------plot----------------

# See Trevor's answer at
# https://stackoverflow.com/questions/6862742/draw-a-circle-with-ggplot2
# for this convenient function for drawing circles in a ggplot2 chart:
gg_circle <- function(r, xc, yc, color="black", fill=NA, ...) {
  x <- xc + r*cos(seq(0, pi, length.out=100))
  ymax <- yc + r*sin(seq(0, pi, length.out=100))
  ymin <- yc + r*sin(seq(0, -pi, length.out=100))
  annotate("ribbon", x=x, ymin=ymin, ymax=ymax, color=color, fill=fill, ...)
}

lw <- 1 # linewidth

p <- d1 |>
  ggplot(aes(x = x)) +
  geom_line(aes(y = line1_y), colour = "red", linewidth = lw) +
  geom_line(aes(y = line2_y), colour = "purple", linewidth = lw) +
  geom_vline(xintercept = 0, colour = "black", linewidth = lw) +
  gg_circle(xc = dc2[1, ]$can_x, yc = dc2[1, ]$can_y, r = dc2[1, ]$can_x, 
            color = "orange", fill = "orange", alpha = 0.5) +
  coord_equal() +
  labs(x = "", y = "", title = "Find the radius of the brown circle") +
  annotate("text", x = 0.8, y = 0.95, label = "y==sqrt(x)", parse = TRUE, colour = "red") +
  annotate("text", x = 0.8, y = 0.70, label = "x^2 + y^2 == 1", parse = TRUE, colour = "purple")

That’s all for today. Take care out there.

To leave a comment for the author, please follow the link and comment on their blog: free range statistics - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.