Prime numbers as sums of three squares. by @ellis2013nz

free range statistics - R

11 hours ago

[This article was first published on free range statistics - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I was interested by a LinkedIn post about the number 397:

“397 is conjectured to be the largest prime that can be represented uniquely as the sum of three positive squares”

That is, 3^2 + 8^2 + 18^2 = 397

This led to some confusion in the comments as people found other prime numbers that can be created as the sum of three squares. But the wording is sloppy; better wording would be:

“397 is conjectured to be the largest prime that can be represented as the sum of three positive squares of integers in exactly one way”

Let’s confirm that. The only method I know for something like this is brute force. I can make a data frame with three columns, each with all the squares of positive integers up to some maximum point – so the data frame has every combination of those, discarding duplicate combinations by making the order of the columns strictly non-decreasing. Then we sum those three squares, and store the results:

library(primes)
library(tidyverse)
library(glue)

k <- 30
squares <- (1:k) ^ 2
primes <- tibble(p = generate_primes(max = k^2))

# This is the part that gets slower with larger k as you make k^3 combinations
s3s <- expand_grid(s1 = squares, s2 = squares, s3 = squares) |>
  filter(s2 >= s1 & s3 >= s2) |> 
  mutate(sum_3_sq = s1 + s2 + s3)

# example:
filter(s3s, sum_3_sq == 397)

This gets us our one combination that adds up to 397:

     s1    s2    s3 sum_3_sq
  <dbl> <dbl> <dbl>    <dbl>
1     9    64   324      397

Next step is to count the number of times each result appears.

s3s_sum <- s3s |>
  group_by(sum_3_sq) |>
  summarise(number_3_square_sums = n())

# example:
s3s_sum |>
  filter(number_3_square_sums == 3) |>
  slice(1) |>
  left_join(s3s, by = "sum_3_sq")

So for example we see that 54 (which of course is not a prime – we haven’t yet filtered to primes) can be made 3 ways: as the sum of the squares of 1, 2, 7; of 2, 5, 5; and 3, 3, 6:

  sum_3_sq number_3_square_sums    s1    s2    s3
     <dbl>                <int> <dbl> <dbl> <dbl>
1       54                    3     1     4    49
2       54                    3     4    25    25
3       54                    3     9     9    36

Then it’s a simple matter of joining that summary (of counts of the number of ways to get a given total of three squares) to a data frame of the prime numbers, and drawing a plot of the results:

res <- primes |>
  left_join(s3s_sum, by = c("p" = "sum_3_sq")) |>
  mutate(number_3_square_sums = replace_na(number_3_square_sums, 0))

ggplot(res, aes (x = p, y = number_3_square_sums)) +
  geom_point() +
  annotate("point", colour = "red", shape = 1, size = 4, x = 397, y = 1) +
  annotate("text", colour = "red", label = "397", x = 500, y = 1, hjust = 0) +
  scale_x_continuous(label = comma) +
  labs(x = "Prime number",
       y = "Number of ways",
       title = "Number of ways to make a prime number as sum of three positive squares of integers",
       subtitle = glue("397 (circled) is the largest with exactly one way, of primes up to {comma(k^2)}."))

if(max(res$number_3_square_sums) < 11){
  p <- p + 
    scale_y_continuous(breaks = 0:10) +
    theme(panel.grid.minor.y = element_blank())
}

There’s a bit of fiddling there to make the charts look nice for low values of k (where k is the maximum number I square). In my real life code the above is surrounded by a loop of different values of k, with the results saved as SVG and PNG images for use in this blog. All of which gets me this result:

And here we see for some larger results of k:

As k gets bigger of course the program gets slower to run. This method doesn’t scale well; for primes above about 250,000 the step of making a data frame of all the combinations of three squares starts taking too long. If I wanted to extend this further I’d have to find some ways to do this more efficiently, or put that step into a database that can handle bigger-than-memory data objects.

I’m sure there’s some interesting maths behind why this is just a “conjecture” and no-one has been able to prove it!

That’s all for today.

To leave a comment for the author, please follow the link and comment on their blog: free range statistics - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Related