Many of my posts seem to begin with a link to a tweet, and this one falls into that pattern:
And @_inundata is already working on a #rstats palette. https://t.co/bNfpL7OmVl
— Timothée Poisot (@tpoi) May 21, 2017
I’d seen the Ars Tech post about the named color palette derived from some training data. I could tell from a glance at the resulting palette that it would not be ideal for visualizations (use this site to test the final image in this post and verify that on your own), but this was a neat, quick project to take on, especially since it let me dust off an old GH package, adobecolor, and it was likely I could beat Karthik to creating a palette 😉
The “B+” goal is to get a color palette that “matches” the one in the Tumblr post. The “A” goal is to get a named palette.
These are all the packages we end up using:
library(tesseract)
library(magick)
library(stringi)
library(adobecolor) # hrbrmstr/adobecolor - may not be Windows friendly
library(tidyverse)
Attempt #1 (B+!!)
I’m a macOS user, so I’ve got great tools like xScope at my disposal. I’m really handy with that app and the Loupe tool makes it easy to point at a color, save it to a palette board and export an ACO palette file.
That whole process took ~18 seconds (first try). I’m not saying that to brag. But we often get hung up on both speed and programmatic reproducibility. I ultimately — as we’ll see in a bit — really went for speed vs programmatic reproducibility.
It’s dead simple to get the palette into R:
aco_fil <- "ml_cols.aco"
aco_hex <- rev(read_aco(aco_fil))
col2rgb(aco_hex)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## red    112  203   97  191  120  221  169  233  177   216    62   178   199
## green  112  198   92  174  114  196  167  191  138   200    63   184   172
## blue    85  166   73  156  124  199  171  143  109   185    67   196   146
##       [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
## red      48   172   177   203   219   162   152   232   197   191
## green    94   152   100   205   210    98   165   177   161   161
## blue     83   145   107   192   179   106   158   135   171   124

IIRC there may still be a byte-order issue on Windows in adobecolor that I need to deal with (PRs welcome), but you likely will never need to use the package again.
A quick eyeball comparison between the Tumblr list and that matrix indicates the colors are slightly off. That could be for many reasons, from the way they were encoded in the PNG by whatever programming language was used to train the neural net and make the image (likely Python), to Tumblr degrading the image, to something on my end. You’ll see that the colors are close enough that a human is unlikely to notice the difference.
There, I’ve got a B+ with a total of about 60s of work! Plenty of time left to try shooting for an A!
Attempt #2 (FAIL)
We’ve got the PNG from the Tumblr post and the tesseract package in R. Perhaps this will be super-quick, too:
pal_img_fil <- "tumblr_inline_opgsh0UI6N1rl9zu7_400.png"
pal_ocr <- ocr(pal_img_fil)
stri_split_lines(pal_ocr)
## [[1]]
##  [1] "-ClaniicFug112113 84"      "-Snowhnn.k 201 199165"
##  [3] "- Cmbabcl 97 93 68"        "-Bunfluw 190 174 155"
##  [5] "-an:hing Blue 121 114125"  "Bank Bun 221 196199"
##  [7] "- Caring Tan 171 166170"   "-Smrguun 233191 141"
##  [9] "-Sink 176 131; 110"        "Slummy Beige 216 200135"
## [11] "- Durkwumi 61 63 66"       "Flow/£1178 1114 196"
## [13] "- Sand Dan 2111 172143"    "- Grade 136: 41; 94 x3"
## [15] "-Ligh[OfBlasll75150147"    "-Grass 13m 176 99108"
## [17] "Sindis Poop 204 205 194"   "Dupe 219 2119179"
## [19] "-'n:sling156101 106"       "-SloncrElu13152165 159"
## [21] "- Buxblc Simp 226 1x1 132" "-Sl.mky 13m197162171"
## [23] "-'J\\milyl90164116"        ""
## [25] ""
Ugh.
Perhaps if we crop out the colors:
image_read(pal_img_fil) %>%
  image_crop("+57") %>%
  ocr() %>%
  stri_split_lines()
## [[1]]
##  [1] "Clanfic Fug112113 84"      "Snowhunk 201 199 165"
##  [3] "Cmbabcl 97 93 as"          "Bunfluwl90174155"
##  [5] "Kunming Blue 121 114 125"  "Bank Bun 221196199"
##  [7] "Caring Tan 171 ms 170"     "Slarguun 233 191 141"
##  [9] "Sinkl76135110"             ""
## [11] "SIIImmy Beige 216 200 135" "Durkwuud e1 63 66"
## [13] "Flower 175 154 196"        ""
## [15] "Sand Dan 201 172 143"      "Grade 1m AB 94: 53"
## [17] ""                          "Light 0mm 175 150 147"
## [19] "Grass Ba! 17a 99 ms"       "sxndis Poop 204 205 194"
## [21] "Dupe 219 209 179"          ""
## [23] "Tesling 156 101 106"       "SloncrEluc 152 165 159"
## [25] "Buxblc Simp 226 131 132"   "Sumky Bean 197 162 171"
## [27] "1\\mfly 190 164 11a"       ""
## [29] ""
Ugh.
I’m woefully unfamiliar with how to use the plethora of tesseract options to try to get better performance and this is taking too much time for a toy post, so we’ll call this attempt a failure 🙁
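For anyone who wants to keep poking at this, here is a rough sketch of the usual first moves: upscale and grayscale the image with magick, then hand tesseract an engine restricted to letters and digits. It has not been run against the Tumblr PNG, so treat it as a starting point only. The helper name ocr_palette(), its arguments, and the 300% scale factor are assumptions made for illustration, and some tesseract engine versions may ignore the character whitelist.

```r
# Hypothetical helper (not from the post): preprocess, then OCR with a
# restricted character set. Package functions are called with `::` so the
# definition itself does not require the packages to be attached.
ocr_palette <- function(path, crop = "+57", scale = "300%") {
  # Engine limited to ASCII letters and digits (an assumption about what
  # the palette image contains).
  engine <- tesseract::tesseract(
    language = "eng",
    options = list(
      tessedit_char_whitelist = paste(c(LETTERS, letters, 0:9), collapse = "")
    )
  )

  img <- magick::image_read(path)
  img <- magick::image_crop(img, crop)                   # drop the color swatches
  img <- magick::image_resize(img, scale)                # small text OCRs poorly
  img <- magick::image_convert(img, type = "Grayscale")  # remove color noise

  txt <- tesseract::ocr(img, engine = engine)
  stringi::stri_split_lines(txt)[[1]]
}

# Usage (requires the PNG from the Tumblr post in the working directory):
# ocr_palette("tumblr_inline_opgsh0UI6N1rl9zu7_400.png")
```

Whether this beats the raw attempt above would need checking; the point is only that resizing and a whitelist are cheap things to try before giving up.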
Attempt #3 (A-!!)
I’m going to go outside of R again to New OCR and upload the Tumblr palette there and crop out the colors (it lets you do that in-browser). NOTE: Never use any free site for OCR’ing sensitive data as most are run by content thieves.
Now we’re talkin’:
ocr_cols <- "Clardic Fug 112 113 84
Snowbonk 201 199 165
Catbabel 97 93 68
Bunfiow 190 174 155
Ronching Blue 121 114 125
Bank Butt 221 196 199
Caring Tan 171 166 170
Stargoon 233 191 141
Sink 176 138 110
Stummy Beige 216 200 185
Dorkwood 61 63 66
Flower 178 184 196
Sand Dan 201 172 143
Grade Bat 48 94 83
Light Of Blast 175 150 147
Grass Bat 176 99 108
Sindis Poop 204 205 194
Dope 219 209 179
Testing 156 101 106
Stoncr Blue 152 165 159
Burblc Simp 226 181 132
Stanky Bean 197 162 171
Thrdly 190 164 116"
We can get that into a more useful form pretty quickly:
stri_match_all_regex(ocr_cols, "([[:alpha:] ]+) ([[:digit:]]+) ([[:digit:]]+) ([[:digit:]]+)") %>%
  print() %>%
  .[[1]] -> col_mat
## [[1]]
##       [,1]                         [,2]             [,3]  [,4]  [,5]
##  [1,] "Clardic Fug 112 113 84"     "Clardic Fug"    "112" "113" "84"
##  [2,] "Snowbonk 201 199 165"       "Snowbonk"       "201" "199" "165"
##  [3,] "Catbabel 97 93 68"          "Catbabel"       "97"  "93"  "68"
##  [4,] "Bunfiow 190 174 155"        "Bunfiow"        "190" "174" "155"
##  [5,] "Ronching Blue 121 114 125"  "Ronching Blue"  "121" "114" "125"
##  [6,] "Bank Butt 221 196 199"      "Bank Butt"      "221" "196" "199"
##  [7,] "Caring Tan 171 166 170"     "Caring Tan"     "171" "166" "170"
##  [8,] "Stargoon 233 191 141"       "Stargoon"       "233" "191" "141"
##  [9,] "Sink 176 138 110"           "Sink"           "176" "138" "110"
## [10,] "Stummy Beige 216 200 185"   "Stummy Beige"   "216" "200" "185"
## [11,] "Dorkwood 61 63 66"          "Dorkwood"       "61"  "63"  "66"
## [12,] "Flower 178 184 196"         "Flower"         "178" "184" "196"
## [13,] "Sand Dan 201 172 143"       "Sand Dan"       "201" "172" "143"
## [14,] "Grade Bat 48 94 83"         "Grade Bat"      "48"  "94"  "83"
## [15,] "Light Of Blast 175 150 147" "Light Of Blast" "175" "150" "147"
## [16,] "Grass Bat 176 99 108"       "Grass Bat"      "176" "99"  "108"
## [17,] "Sindis Poop 204 205 194"    "Sindis Poop"    "204" "205" "194"
## [18,] "Dope 219 209 179"           "Dope"           "219" "209" "179"
## [19,] "Testing 156 101 106"        "Testing"        "156" "101" "106"
## [20,] "Stoncr Blue 152 165 159"    "Stoncr Blue"    "152" "165" "159"
## [21,] "Burblc Simp 226 181 132"    "Burblc Simp"    "226" "181" "132"
## [22,] "Stanky Bean 197 162 171"    "Stanky Bean"    "197" "162" "171"
## [23,] "Thrdly 190 164 116"         "Thrdly"         "190" "164" "116"

The print() is in the pipe because I can never remember how each stringi function nests its results in lists (though I usually guess right), plus I wanted to check the output.
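For the list-nesting question specifically, a small sketch may help. stri_match_all_regex() returns a list with one matrix per input string (hence the [[1]] when the input is a single string), while stri_match_first_regex() returns a single matrix with one row per input string. The two-line input below is a cut-down sample for illustration, not the full OCR result.

```r
library(stringi)

# Two sample lines, already split (illustrative subset of the OCR text).
x <- c("Clardic Fug 112 113 84", "Snowbonk 201 199 165")
pat <- "([[:alpha:] ]+) ([[:digit:]]+) ([[:digit:]]+) ([[:digit:]]+)"

# *_all_* : a list, one character matrix per element of `x`.
all_m <- stri_match_all_regex(x, pat)
length(all_m)  # one list entry per input string
dim(all_m[[1]])

# *_first_* : one matrix, one row per element of `x`, no list to unwrap.
first_m <- stri_match_first_regex(x, pat)
dim(first_m)
first_m[, 2]   # the captured names
```

So splitting the text into lines first and using the *_first_* variant is one way to skip the list step entirely.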
Making those into colors is super-simple:
y <- apply(col_mat[,3:5], 2, as.numeric)
ocr_cols <- rgb(y[,1], y[,2], y[,3], names=col_mat[,2], maxColorValue = 255)
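In keeping with the base R theme, the parse-and-convert step can also be sketched without stringi, using regexec() and regmatches(). This is an alternative illustration, not what was run for the post; the three input lines are a sample and the variable names are made up for the example.

```r
# Sample of already-split OCR lines (illustrative subset).
ocr_lines <- c(
  "Clardic Fug 112 113 84",
  "Snowbonk 201 199 165",
  "Catbabel 97 93 68"
)

pat <- "^([[:alpha:] ]+) ([[:digit:]]+) ([[:digit:]]+) ([[:digit:]]+)$"

# regexec() + regmatches() give one character vector per line:
# full match, then the four capture groups. rbind them into a matrix.
m <- do.call(rbind, regmatches(ocr_lines, regexec(pat, ocr_lines)))

# Columns 3:5 hold red, green, blue as strings; convert to numbers.
rgb_vals <- apply(m[, 3:5, drop = FALSE], 2, as.numeric)

# Build the named hex palette.
pal <- rgb(rgb_vals[, 1], rgb_vals[, 2], rgb_vals[, 3],
           names = m[, 2], maxColorValue = 255)
pal
```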
If we look at Attempt #1 and Attempt #3 together:
ocr_cols
##    Clardic Fug       Snowbonk       Catbabel        Bunfiow  Ronching Blue
##      "#707154"      "#C9C7A5"      "#615D44"      "#BEAE9B"      "#79727D"
##      Bank Butt     Caring Tan       Stargoon           Sink   Stummy Beige
##      "#DDC4C7"      "#ABA6AA"      "#E9BF8D"      "#B08A6E"      "#D8C8B9"
##       Dorkwood         Flower       Sand Dan      Grade Bat Light Of Blast
##      "#3D3F42"      "#B2B8C4"      "#C9AC8F"      "#305E53"      "#AF9693"
##      Grass Bat    Sindis Poop           Dope        Testing    Stoncr Blue
##      "#B0636C"      "#CCCDC2"      "#DBD1B3"      "#9C656A"      "#98A59F"
##    Burblc Simp    Stanky Bean         Thrdly
##      "#E2B584"      "#C5A2AB"      "#BEA474"
aco_hex
##  [1] "#707055" "#CBC6A6" "#615C49" "#BFAE9C" "#78727C" "#DDC4C7" "#A9A7AB"
##  [8] "#E9BF8F" "#B18A6D" "#D8C8B9" "#3E3F43" "#B2B8C4" "#C7AC92" "#305E53"
## [15] "#AC9891" "#B1646B" "#CBCDC0" "#DBD2B3" "#A2626A" "#98A59E" "#E8B187"
## [22] "#C5A1AB" "#BFA17C"
we can see they’re really close to each other, and I doubt anyone but the most egregiously picky color snobs can tell the difference visually, either:
par(mfrow=c(1,2))
scales::show_col(ocr_cols)
scales::show_col(aco_hex)
par(mfrow=c(1,1))

(OK, #3E3F43 is definitely hitting my OCD as being annoyingly different from #3D3F42 on my MacBook Pro, so count me in as a color snob.)
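To put a number on "really close", here is a small base R sketch that compares the two palettes channel by channel with col2rgb(). The hex values are copied from the two outputs above; the names delta and worst are just for this example.

```r
# OCR-derived palette (Attempt #3), values copied from the output above.
ocr_cols <- c(
  "Clardic Fug" = "#707154", "Snowbonk" = "#C9C7A5", "Catbabel" = "#615D44",
  "Bunfiow" = "#BEAE9B", "Ronching Blue" = "#79727D", "Bank Butt" = "#DDC4C7",
  "Caring Tan" = "#ABA6AA", "Stargoon" = "#E9BF8D", "Sink" = "#B08A6E",
  "Stummy Beige" = "#D8C8B9", "Dorkwood" = "#3D3F42", "Flower" = "#B2B8C4",
  "Sand Dan" = "#C9AC8F", "Grade Bat" = "#305E53", "Light Of Blast" = "#AF9693",
  "Grass Bat" = "#B0636C", "Sindis Poop" = "#CCCDC2", "Dope" = "#DBD1B3",
  "Testing" = "#9C656A", "Stoncr Blue" = "#98A59F", "Burblc Simp" = "#E2B584",
  "Stanky Bean" = "#C5A2AB", "Thrdly" = "#BEA474"
)

# Eyedropper palette (Attempt #1), same order.
aco_hex <- c(
  "#707055", "#CBC6A6", "#615C49", "#BFAE9C", "#78727C", "#DDC4C7", "#A9A7AB",
  "#E9BF8F", "#B18A6D", "#D8C8B9", "#3E3F43", "#B2B8C4", "#C7AC92", "#305E53",
  "#AC9891", "#B1646B", "#CBCDC0", "#DBD2B3", "#A2626A", "#98A59E", "#E8B187",
  "#C5A1AB", "#BFA17C"
)

# Absolute per-channel difference (rows: red/green/blue, columns: colors).
delta <- abs(col2rgb(ocr_cols) - col2rgb(aco_hex))
colnames(delta) <- names(ocr_cols)

worst <- apply(delta, 2, max)  # largest channel gap for each color
max(delta)                     # largest gap overall (0-255 scale)
names(which.max(worst))        # which color it belongs to
sum(worst == 0)                # how many colors match exactly
```

On these values the largest single-channel gap is 8 (the blue channel of Thrdly), and four of the 23 colors match exactly.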
Here’s the final palette:
structure(c("#707154", "#C9C7A5", "#615D44", "#BEAE9B", "#79727D",
"#DDC4C7", "#ABA6AA", "#E9BF8D", "#B08A6E", "#D8C8B9", "#3D3F42",
"#B2B8C4", "#C9AC8F", "#305E53", "#AF9693", "#B0636C", "#CCCDC2",
"#DBD1B3", "#9C656A", "#98A59F", "#E2B584", "#C5A2AB", "#BEA474"
), .Names = c("Clardic Fug", "Snowbonk", "Catbabel", "Bunfiow",
"Ronching Blue", "Bank Butt", "Caring Tan", "Stargoon", "Sink",
"Stummy Beige", "Dorkwood", "Flower", "Sand Dan", "Grade Bat",
"Light Of Blast", "Grass Bat", "Sindis Poop", "Dope", "Testing",
"Stoncr Blue", "Burblc Simp", "Stanky Bean", "Thrdly"))
This third attempt took ~5 minutes vs 60s.
FIN
Why “A-”? Well, I didn’t completely verify the colors and values matched 100% in the final submission. They are likely the same, but the best way to get something corrected by others is to put it on the internet, so there it is 🙂
I’d be a better human and coder if I took the time to learn tesseract more, but I don’t have much need for OCR’ing text. It is likely worth the time to brush up on tesseract after you read this post.
Don’t use this palette! I created it mostly to beat Karthik to making the palette (I have no idea if I succeeded), to show that you should not forgo your base R roots (I could have let that be subliminal but I wasn’t trying to socially engineer you in this post), and to bring up the speed/reproducibility topic. I see no issues with manually doing tasks (like uploading an image to a web site) in certain circumstances, but it’d be an interesting topic of debate to see just what “rules” folks use to determine how much effort one should put into 100% programmatic reproducibility.
You can find the ACO file and an earlier, alternate attempt at making the palette in this gist.