Site icon R-bloggers

A Very Palette-able Post

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Many of my posts seem to begin with a link to a tweet, and this one falls into that pattern:

And @_inundata is already working on a #rstats palette. https://t.co/bNfpL7OmVl

— Timothée Poisot (@tpoi) May 21, 2017

I’d seen the Ars Tech post about the named color palette derived from some training data. I could tell at a glance of the resultant palette:

that it would not be ideal for visualizations (use this site test the final image in this post and verify that on your own) but this was a neat, quick project to take on, especially since it let me dust off an old GH package, adobecolor and it was likely I could beat Karthik to creating a palette 😉

The “B+” goal is to get a color palette that “matches” the one in the Tumlbr post. The “A” goal is to get a named palette.

These are all the packages we end up using:

library(tesseract)
library(magick)
library(stringi)
library(adobecolor) # hrbrmstr/adobecolor - may not be Windows friendly
library(tidyverse)

Attempt #1 (B+!!)

I’m a macOS user, so I’ve got great tools like xScope at my disposal. I’m really handy with that app and the Loupe tool makes it easy to point at a color, save it to a palette board and export an ACO palette file.

That whole process took ~18 seconds (first try). I’m not saying that to brag. But we often get hung up on both speed and programmatic reproducibility. I ultimately — as we’ll see in a bit — really went for speed vs programmatic reproducibility.

It’s dead simple to get the palette into R:

aco_fil <- "ml_cols.aco"
aco_hex <- rev(read_aco(aco_fil))

col2rgb(aco_hex)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## red    112  203   97  191  120  221  169  233  177   216    62   178   199
## green  112  198   92  174  114  196  167  191  138   200    63   184   172
## blue    85  166   73  156  124  199  171  143  109   185    67   196   146
##       [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
## red      48   172   177   203   219   162   152   232   197   191
## green    94   152   100   205   210    98   165   177   161   161
## blue     83   145   107   192   179   106   158   135   171   124

IIRC there may still be a byte-order issue (PRs welcome) I need to deal with on Windows in adobecolor but you likely will never need to use the package again.

A quick eyeball comparison between the Tumblr list and that matrix indicates the colors are off. That could be for many reasons starting from the way they were encoded in the PNG by whatever programming language was used to train the neural net and make the image (likely Python) to Tumblr degrading it to something on my end. You’ll see that the colors are close enough for humans that it’s likely close enough.

There, I’ve got a B+ with about a total of 60s of work! Plenty of time left to try shooting for an A!

Attempt #2 (FAIL)

We’ve got the PNG from the Tumblr post and the tesseract package in R. Perhaps this will be super-quick, too:

pal_img_fil <- "tumblr_inline_opgsh0UI6N1rl9zu7_400.png"

pal_ocr <- ocr(pal_img_fil)
stri_split_lines(pal_ocr)
## [[1]]
##  [1] "-ClaniicFug112113 84"      "-Snowhnn.k 201 199165"    
##  [3] "- Cmbabcl 97 93 68"        "-Bunfluw 190 174 155"      
##  [5] "-an:hing Blue 121 114125"  "Bank Bun 221 196199"      
##  [7] "- Caring Tan 171 166170"   "-Smrguun 233191 141"      
##  [9] "-Sink 176 131; 110"        "Slummy Beige 216 200135"  
## [11] "- Durkwumi 61 63 66"       "Flow/£1178 1114 196"      
## [13] "- Sand Dan 2111 172143"    "- Grade 136: 41; 94 x3"   
## [15] "-Ligh[OfBlasll75150147"    "-Grass 13m 176 99108"     
## [17] "Sindis Poop 204 205 194"   "Dupe 219 2119179"         
## [19] "-'n:sling156101 106"       "-SloncrElu13152165 159"   
## [21] "- Buxblc Simp 226 1x1 132" "-Sl.mky 13m197162171"     
## [23] "-'J\\milyl90164116"        ""                         
## [25] ""

Ugh.

Perhaps if we crop out the colors:

image_read(pal_img_fil) %>%
  image_crop("+57") %>%
  ocr() %>%
  stri_split_lines()
## [[1]]
##  [1] "Clanfic Fug112113 84"       "Snowhunk 201 199 165"     
##  [3] "Cmbabcl 97 93 as"          "Bunfluwl90174155"          
##  [5] "Kunming Blue 121 114 125"  "Bank Bun 221196199"       
##  [7] "Caring Tan 171 ms 170"     "Slarguun 233 191 141"     
##  [9] "Sinkl76135110"             ""                         
## [11] "SIIImmy Beige 216 200 135" "Durkwuud e1 63 66"        
## [13] "Flower 175 154 196"        ""                         
## [15] "Sand Dan 201 172 143"      "Grade 1m AB 94: 53"       
## [17] ""                          "Light 0mm 175 150 147"    
## [19] "Grass Ba! 17a 99 ms"       "sxndis Poop 204 205 194"  
## [21] "Dupe 219 209 179"          ""                         
## [23] "Tesling 156 101 106"       "SloncrEluc 152 165 159"   
## [25] "Buxblc Simp 226 131 132"   "Sumky Bean 197 162 171"   
## [27] "1\\mfly 190 164 11a"        ""                         
## [29] ""

Ugh.

I’m woefully unfamiliar with how to use the plethora of tesseract options to try to get better performance and this is taking too much time for a toy post, so we’ll call this attempt a failure 🙁

Attempt #3 (A-!!)

I’m going to go outside of R again to New OCR and upload the Tumblr palette there and crop out the colors (it lets you do that in-browser). NOTE: Never use any free site for OCR’ing sensitive data as most are run by content thieves.

Now we’re talkin’:

ocr_cols <- "Clardic Fug 112 113 84
Snowbonk 201 199 165
Catbabel 97 93 68
Bunfiow 190 174 155
Ronching Blue 121 114 125
Bank Butt 221 196 199
Caring Tan 171 166 170
Stargoon 233 191 141
Sink 176 138 110
Stummy Beige 216 200 185
Dorkwood 61 63 66
Flower 178 184 196
Sand Dan 201 172 143
Grade Bat 48 94 83
Light Of Blast 175 150 147
Grass Bat 176 99 108
Sindis Poop 204 205 194
Dope 219 209 179
Testing 156 101 106
Stoncr Blue 152 165 159
Burblc Simp 226 181 132
Stanky Bean 197 162 171
Thrdly 190 164 116"

We can get that into a more useful form pretty quickly:

stri_match_all_regex(ocr_cols, "([[:alpha:] ]+) ([[:digit:]]+) ([[:digit:]]+) ([[:digit:]]+)") %>%
  print() %>%
  .[[1]] -> col_mat
## [[1]]
##       [,1]                         [,2]             [,3]  [,4]  [,5] 
##  [1,] "Clardic Fug 112 113 84"     "Clardic Fug"    "112" "113" "84" 
##  [2,] "Snowbonk 201 199 165"       "Snowbonk"       "201" "199" "165"
##  [3,] "Catbabel 97 93 68"          "Catbabel"       "97"  "93"  "68" 
##  [4,] "Bunfiow 190 174 155"        "Bunfiow"        "190" "174" "155"
##  [5,] "Ronching Blue 121 114 125"  "Ronching Blue"  "121" "114" "125"
##  [6,] "Bank Butt 221 196 199"      "Bank Butt"      "221" "196" "199"
##  [7,] "Caring Tan 171 166 170"     "Caring Tan"     "171" "166" "170"
##  [8,] "Stargoon 233 191 141"       "Stargoon"       "233" "191" "141"
##  [9,] "Sink 176 138 110"           "Sink"           "176" "138" "110"
## [10,] "Stummy Beige 216 200 185"   "Stummy Beige"   "216" "200" "185"
## [11,] "Dorkwood 61 63 66"          "Dorkwood"       "61"  "63"  "66" 
## [12,] "Flower 178 184 196"         "Flower"         "178" "184" "196"
## [13,] "Sand Dan 201 172 143"       "Sand Dan"       "201" "172" "143"
## [14,] "Grade Bat 48 94 83"         "Grade Bat"      "48"  "94"  "83" 
## [15,] "Light Of Blast 175 150 147" "Light Of Blast" "175" "150" "147"
## [16,] "Grass Bat 176 99 108"       "Grass Bat"      "176" "99"  "108"
## [17,] "Sindis Poop 204 205 194"    "Sindis Poop"    "204" "205" "194"
## [18,] "Dope 219 209 179"           "Dope"           "219" "209" "179"
## [19,] "Testing 156 101 106"        "Testing"        "156" "101" "106"
## [20,] "Stoncr Blue 152 165 159"    "Stoncr Blue"    "152" "165" "159"
## [21,] "Burblc Simp 226 181 132"    "Burblc Simp"    "226" "181" "132"
## [22,] "Stanky Bean 197 162 171"    "Stanky Bean"    "197" "162" "171"
## [23,] "Thrdly 190 164 116"         "Thrdly"         "190" "164" "116"

The print() is in the pipe as I can never remember where each stringi functions stick lists but usually guess right, plus I wanted to check the output.

Making those into colors is super-simple:

y <- apply(col_mat[,3:5], 2, as.numeric)

ocr_cols <- rgb(y[,1], y[,2], y[,3], names=col_mat[,2], maxColorValue = 255)

If we look at Attempt #1 and Attempt #2 together:

ocr_cols
##    Clardic Fug       Snowbonk       Catbabel        Bunfiow  Ronching Blue 
##      "#707154"      "#C9C7A5"      "#615D44"      "#BEAE9B"      "#79727D" 
##      Bank Butt     Caring Tan       Stargoon           Sink   Stummy Beige 
##      "#DDC4C7"      "#ABA6AA"      "#E9BF8D"      "#B08A6E"      "#D8C8B9" 
##       Dorkwood         Flower       Sand Dan      Grade Bat Light Of Blast 
##      "#3D3F42"      "#B2B8C4"      "#C9AC8F"      "#305E53"      "#AF9693" 
##      Grass Bat    Sindis Poop           Dope        Testing    Stoncr Blue 
##      "#B0636C"      "#CCCDC2"      "#DBD1B3"      "#9C656A"      "#98A59F" 
##    Burblc Simp    Stanky Bean         Thrdly 
##      "#E2B584"      "#C5A2AB"      "#BEA474"

aco_hex
##  [1] "#707055" "#CBC6A6" "#615C49" "#BFAE9C" "#78727C" "#DDC4C7" "#A9A7AB"
##  [8] "#E9BF8F" "#B18A6D" "#D8C8B9" "#3E3F43" "#B2B8C4" "#C7AC92" "#305E53"
## [15] "#AC9891" "#B1646B" "#CBCDC0" "#DBD2B3" "#A2626A" "#98A59E" "#E8B187"
## [22] "#C5A1AB" "#BFA17C"

we can see they’re really close to each other, and I doubt all but the most egregiously picky color snobs can tell the difference visually, too:

par(mfrow=c(1,2))
scales::show_col(ocr_cols)
scales::show_col(aco_hex)
par(mfrow=c(1,1))

(OK, #3D3F43 is definitely hitting my OCD as being annoyingly different than #3D3F42 on my MacBook Pro so count me in as a color snob.)

Here’s the final palette:

structure(c("#707154", "#C9C7A5", "#615D44", "#BEAE9B", "#79727D", 
"#DDC4C7", "#ABA6AA", "#E9BF8D", "#B08A6E", "#D8C8B9", "#3D3F42", 
"#B2B8C4", "#C9AC8F", "#305E53", "#AF9693", "#B0636C", "#CCCDC2", 
"#DBD1B3", "#9C656A", "#98A59F", "#E2B584", "#C5A2AB", "#BEA474"
), .Names = c("Clardic Fug", "Snowbonk", "Catbabel", "Bunfiow", 
"Ronching Blue", "Bank Butt", "Caring Tan", "Stargoon", "Sink", 
"Stummy Beige", "Dorkwood", "Flower", "Sand Dan", "Grade Bat", 
"Light Of Blast", "Grass Bat", "Sindis Poop", "Dope", "Testing", 
"Stoncr Blue", "Burblc Simp", "Stanky Bean", "Thrdly"))

This third attempt took ~5 minutes vs 60s.

FIN

Why “A-“? Well, I didn’t completely verify the colors and values matched 100% in the final submission. They are likely the same, but the best way to get something corrected by others it to put it on the internet, so there it is 🙂

I’d be a better human and coder if I took the time to learn tesseract more, but I don’t have much need for OCR’ing text. It is likely worth the time to brush up on tesseract after you read this post.

Don’t use this palette! I created it mostly to beat Karthik to making the palette (I have no idea if I succeeded), to also show that you should not forego your base R roots (I could have let that be subliminal but I wasn’t trying to socially engineer you in this post) and to bring up the speed/reproducibility topic. I see no issues with manually doing tasks (like uploading an image to a web site) in certain circumstances, but it’d be an interesting topic of debate to see just what “rules” folks use to determine how much effort one should put into 100% programmatic reproducibility.

You can find the ACO file and an earlier, alternate attempt at making the palette in this gist.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.