Kerning and Kerning in a Widening Gyre

R on kieranhealy.org

7 hours ago

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers].


This post summarizes an extended period of deep annoyance. I have tried to solve the problem it describes more than once before and not quite done it. This has, in fact, happened again. I have still not satisfactorily solved the problem. But this time I know why I can’t solve it in a civilized manner.

My goal is simple, and reasonable. I want to produce more or less identical plots in both PNG and PDF formats. PNG is a raster format. PDF is a vector format and also the Devil Incarnate. Sometimes you want one format, sometimes the other. Raster formats color in pixels on a grid of some fixed resolution. They are efficient when you need to plot a lot of elements but you can’t zoom in on them without loss. Vector formats can be easily resized up or down without loss of fidelity, but they get big real fast when you have a lot of objects to show, because each one is drawn separately, and also they are the Devil Incarnate. Especially when it comes to fonts.

Dr Manhattan, the original overfull hbox.

When I make the PDF, I want the fonts in the PDF versions to be embedded in the file. That way, they can be addressed directly and changed later if necessary when it comes to printing or other production. If the fonts used in your file aren’t embedded in your PDF and the file is opened or printed on a system that doesn’t have access to the fonts you used, they will be replaced with one of a small number of default fonts that every system or printer knows. This is bad.

I said earlier (twice) that PDF is the Devil Incarnate. This is not really true. Font rendering in general is the Devil Incarnate. PDF is a Major Demon of the Font World. It is descended from greater demons. It traces its foul lineage through an immense tangle of filthy string, glue, and pins back to the earliest days of high-fidelity computer displays and printers.

I make my plots in R, with ggplot usually. (And sometimes tinyplot. It’s good. You Base R snobs can bite me; I’ve been using R since it was a different letter.) Anyway, by default, R’s PDF graphics device does not embed fonts, presumably on the sensible grounds that the more you reject the Devil and all his Works, the better off you are. However, over the years, many people with fallen natures have devised various ways to truck with Satan and specifically to get fonts properly embedded in PDFs. Think of it as a process of building one’s house on a combination of other people’s houses, piles of sand, a variety of leftover construction paper, and ultimately the giant tangle of string mentioned before.
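If you want to check whether a given PDF actually has its fonts embedded, one option is the pdftools package, whose pdf_fonts() function lists the fonts in a file along with an embedded flag. What follows is only a sketch: it assumes pdftools is installed, fonts_in_pdf() is a hypothetical helper name, and the PDF it inspects is a throwaway made with R's default device.

```r
# Sketch: report which fonts a PDF uses and whether each one is embedded.
# Assumption: the pdftools package is installed. fonts_in_pdf() is a
# hypothetical helper name, not something from the original post.
fonts_in_pdf <- function(path) {
  if (!requireNamespace("pdftools", quietly = TRUE)) {
    stop("This sketch needs the pdftools package.")
  }
  # pdf_fonts() returns one row per font, including an `embedded` column
  pdftools::pdf_fonts(path)
}

# Make a small throwaway PDF with the default pdf() device ...
tmp <- tempfile(fileext = ".pdf")
grDevices::pdf(tmp, width = 4, height = 4)
plot(mtcars$wt, mtcars$mpg, main = "Default pdf() device")
invisible(grDevices::dev.off())

# ... and look at the fonts it refers to, if pdftools is available
if (requireNamespace("pdftools", quietly = TRUE)) {
  print(fonts_in_pdf(tmp))
}
```

The same helper can be pointed at any of the PDFs produced later on to see what did and did not get embedded.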

Here’s a plot made with ggplot in R.

The output we want. Produced as a PNG directly.

This is the output I want. If all you want to do in life is produce PNGs or JPEGs of ggplot graphs then you are in luck. It works perfectly. Any typeface, any specific font on your system can be used. You live in a paradise created by people like Thomas Lin Pedersen. You do not know how good you have it. You write a bit of code like this:

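library(dplyr)    # mutate()
library(tibble)   # as_tibble()
library(ggplot2)
library(ggrepel)  # geom_text_repel()
library(myriad)   # assumed here to provide theme_myriad_semi()

df <- mtcars |>
  mutate(car = rownames(mtcars)) |>
  as_tibble()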

out <- df |>
  ggplot(aes(x = wt, y = mpg, label = car)) +
  geom_point() +
  geom_text_repel(family = "Myriad Pro Condensed") +
  annotate("text", x = 3.5, y = 30, label = "This is some text in Myriad Pro Condensed",
           family = "Myriad Pro Condensed", color = "darkred", size = 8) +
  annotate("text", x = 3.5, y = 29, label = "This is some text in Myriad Pro SemiCondensed",
           family = "Myriad Pro SemiCondensed", color = "cornflowerblue", size = 8) +
  labs(title = "This is the Title, in Myriad Semibold SemiCondensed",
       subtitle = "This is the Subtitle. It is in Myriad SemiCondensed",
       caption = "This is the Caption") +
  theme_myriad_semi()

ggsave("figurespost-01-png-desired.png", out, width = 8, height = 8, dpi = 300)

And you get the plot above. You are done. Please, I beg you, leave now. Go on your way. Walk outside. Read a book. Observe the Fall of the Republic at your leisure. Whatever you wish.

Not a PNG, a PDF

I want a PDF where the specific fonts I use (fonts which very definitely exist on my computer) are embedded in the PDF produced by R. They should appear just like in the PNG above. Let’s give it a shot.

ggsave("figurespost-01-pdf-fail-1.pdf", out, width = 8, height = 8)

This is a PNG representation of the PDF output.

Well, shit. That’s not right. “But Kieran”, you say, “Surely you are aware that ggplot can embed fonts in PDF files in just the way that you want. Have you not read for example this helpful post by Andrew Heiss, a prince amongst men, showing you how to do it with the Cairo graphics device that comes with R and that ggplot can take advantage of?” I am of course well aware of this. All we have to do is tell our ggsave() call to specifically use device = cairo_pdf and our problems are over. Like this:

ggsave("figurespost-01-pdf-fail-cairo.pdf", out, device = cairo_pdf, width = 8, height = 8)

This is what we get:

Again, a PNG conversion of what the PDF file looks like.

Two things are going on here. First, most of the text is clearly not in Myriad Pro. It is in Bitstream Vera Sans, one of the fallback fonts handed down from X11 or somewhere. Second, and this will turn out to be a hint, the colored text (the stuff geom_text_repel() controls) is in Myriad, but it’s just Myriad Pro Regular. Not the SemiCondensed variant we want.

Again, Andrew’s post is essentially correct. The cairo_pdf device argument to ggsave() will embed fonts in the PDF. We can for example make it do this:

out <- df |>
  ggplot(aes(x = wt, y = mpg, label = car)) +
  geom_point() +
  geom_text_repel(family = "Papyrus") +
  annotate("text", x = 3.5, y = 30, label = "This is some text in Myriad Pro Condensed",
           family = "Papyrus", color = "darkred", size = 8) +
  annotate("text", x = 3.5, y = 29, label = "This is some text in Myriad Pro SemiCondensed",
           family = "Papyrus", color = "cornflowerblue", size = 8) +
  labs(title = "This is the Title, in Myriad Semibold SemiCondensed",
       subtitle = "This is the Subtitle. It is in Myriad SemiCondensed",
       caption = "This is the Caption") +
  theme_bw(base_family = "Papyrus")

ggsave("figurespost-01-pdf-papyrus-cairo.pdf",
         out, device = cairo_pdf, width = 8, height = 8)

Oh so you’ll embed Papyrus but not Myriad is that it?

For some reason, though, R cannot see the variants of Myriad I want to embed even though it sees them when making PNG files. This, friends, is where in the past I have halted and turned away to the alternative some of you are about to recommend.

Showtext

The Showtext package solves this problem by routing around it. Instead of embedding the fonts we use, it inserts itself into the rendering process and converts all the glyphs (the letters) to vector outlines. It works! You will get the shapes you want in the PDFs you create, for any font that you can access when making a PNG. You do it like this.

library(showtext)

myriad_font_dir <- system.file("fonts", "myriad-pro", package = "myriad")

sysfonts::font_add("Myriad Pro SemiCondensed",
                   regular = paste0(myriad_font_dir, "/", "MyriadPro-SemiCn.otf"),
                   bold = paste0(myriad_font_dir, "/", "MyriadPro-BoldSemiCn.otf"),
                   italic = paste0(myriad_font_dir, "/", "MyriadPro-SemiboldSemiCnIt.otf"),
                   bolditalic = paste0(myriad_font_dir, "/", "MyriadPro-SemiboldCondIt.otf"))

sysfonts::font_add("Myriad Pro Condensed",
                   regular = paste0(myriad_font_dir, "/", "MyriadPro-Cond.otf"),
                   bold = paste0(myriad_font_dir, "/", "MyriadPro-BoldCond.otf"),
                   italic = paste0(myriad_font_dir, "/", "MyriadPro-CondIt.otf"),
                   bolditalic = paste0(myriad_font_dir, "/", "MyriadPro-BoldCondIt.otf"))

showtext_auto()


out <- df |>
  mutate(car = rownames(mtcars)) |>
  as_tibble() |>
  ggplot(aes(x = wt, y = mpg, label = car)) +
  geom_point() +
  geom_text_repel(family = "Myriad Pro Condensed") +
  annotate("text", x = 3.5, y = 30, label = "This is some text in Myriad Pro Condensed",
           family = "Myriad Pro Condensed", color = "darkred", size = 8) +
  annotate("text", x = 3.5, y = 29, label = "This is some text in Myriad Pro SemiCondensed",
           family = "Myriad Pro SemiCondensed", color = "cornflowerblue", size = 8) +
  labs(title = "This is the Title, in Myriad Semibold SemiCondensed",
       subtitle = "This is the Subtitle. It is in Myriad SemiCondensed",
       caption = "This is the Caption") +
  theme_myriad_semi(title_family = "Myriad Pro SemiCondensed")

ggsave("figurespost-01-pdf-showtext.pdf", out, width = 8, height = 8)

This seems like what we wanted, doesn’t it? Superficially, it is. But, as with so much in life, we have paid a terrible price. First, notice how we explicitly had to add the fonts there using the sysfonts package. Showtext does not see the fonts that Thomas Lin Pedersen’s systemfonts package makes generally available to R. That is annoying and, I believe, fights between them have caused my RStudio session to segfault more than once. Second, and more importantly, while the PDF looks good, there are no longer any fonts in it. There are only outline shapes of every individual glyph. If you want to e.g. edit the PDF later in Illustrator or something, you will not be able to adjust the fonts as fonts. They are just shapes. That’s bad.

Showtext will also make it harder to create, in one go, PDFs and PNGs where text and graphic elements are both the same size. Without further futzing around, you may find yourself getting PNG output like this for the same ggsave() height and width parameters:
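The futzing usually involves showtext's dpi option. As far as I understand it, showtext sizes text for bitmap devices using its own dpi setting (documented default: 96), so a PNG saved at dpi = 300 can end up with text scaled differently from the PDF. A minimal sketch of one workaround, assuming showtext and ggplot2 are installed and showtext_auto() is already on; save_png_showtext() is a hypothetical helper name, not part of either package:

```r
# Sketch: keep showtext's dpi in step with the dpi handed to ggsave().
# Assumptions: showtext and ggplot2 are installed, showtext_auto() has
# already been called, and 96 is showtext's default dpi.
# save_png_showtext() is a hypothetical helper name.
save_png_showtext <- function(plot, file, width, height, dpi = 300) {
  # Tell showtext the resolution of the bitmap we are about to write
  showtext::showtext_opts(dpi = dpi)
  # Put the setting back when we are done, even if ggsave() fails
  on.exit(showtext::showtext_opts(dpi = 96), add = TRUE)
  ggplot2::ggsave(file, plot, width = width, height = height, dpi = dpi)
}

# Usage, given the plot object `out` from earlier:
# save_png_showtext(out, "figurespost-01-png-showtext.png", width = 8, height = 8)
```

Whether this gives you exactly matching output across both formats is something to check by eye; it is a starting point, not a guarantee.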

Back to Cairo

I. Just. Want. To. Embed. The. Fonts. In. The. PDF. File.

Eventually, I figured out what was happening, after many a dead end trying to persuade systemfonts to register the existence of the variants, something it was in fact already doing just fine for PNG files and the display devices on screen. The problem is that while the Cairo PDF device can see and properly embed fonts that are installed on your system, it can only see the Regular, Bold, Italic and Bold Italic variants of named Font Families. On a Mac, for instance, you can look at FontBook and see all your fonts:

Some of these will just be a single font. But others, like Myriad, will be an entire family of fonts, with many individual variants and styles. The version of Myriad I own has forty of them.
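You can get a similar view from inside R. The systemfonts package has a system_fonts() function that returns a table of every font file it can find, with the family and style of each. The sketch below assumes systemfonts is installed; list_family_styles() is a hypothetical helper name, and "Helvetica" is only a stand-in for whatever family you care about.

```r
# Sketch: list the individual styles systemfonts can see within one family.
# Assumption: the systemfonts package is installed. list_family_styles() is
# a hypothetical helper, and "Helvetica" is just a placeholder family name.
list_family_styles <- function(family) {
  if (!requireNamespace("systemfonts", quietly = TRUE)) {
    stop("This sketch needs the systemfonts package.")
  }
  fonts <- systemfonts::system_fonts()
  # Keep only the rows for this family, and only the identifying columns
  fonts[fonts$family == family, c("family", "style", "weight", "italic", "path")]
}

if (requireNamespace("systemfonts", quietly = TRUE)) {
  print(list_family_styles("Helvetica"))
}
```

Bear in mind this shows what systemfonts sees, which is what matters for PNG output. It says nothing about what the Cairo PDF device is able to find.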

The Cairo device is great but it cannot see inside families like this. It can see the main variants, but that’s it. The only way I have found to get the cairo_pdf device to see a font like Myriad Semibold SemiCondensed is to have it installed as a separately named family with appropriate regular, bold, italic, and bold italic faces named as such. Older fonts were installed like this more often, and some contemporary families still are. For example I have loads of variants of Input:

Input Mono, Sans, and Serif, in various Regular, Condensed, and Compressed varieties.

These are all addressable by Cairo and the methods described by Andrew will work just fine for them and similar fonts. But this is not true of superfamilies like Myriad and others.

Unfortunately, right now the only way I know to solve this (beyond just forgetting about it and using Papyrus, I mean) is to rewrite the metadata of individual OTF or TTF files such that they can be installed as a separate font, perhaps with a different name. For many fonts, this will break the terms of the license you bought them under. Applications like TransType and others can do this, though they are careful to tell you, as I am telling you, that this may well be against licensing terms. You could also, possibly, only buy the specific faces you want and install those.

But like, just hypothetically

If you can make the variants available as separate named faces that show up as such in your FontBook or equivalent font manager, then things will work as you expect with PDFs. I mean, they will work as you desire. PDFs working as you expect means they are broken and just make your life miserable. You can write, for example,

out <- df |>
  mutate(car = rownames(mtcars)) |>
  as_tibble() |>
  ggplot(aes(x = wt, y = mpg, label = car)) +
  geom_point() +
  geom_text_repel(family = "Socviz Condensed") +
  annotate("text", x = 3.5, y = 30, label = "This is some text in Myriad Pro Condensed",
           family = "Socviz Condensed", color = "darkred", size = 8) +
  annotate("text", x = 3.5, y = 29, label = "This is some text in Myriad Pro SemiCondensed",
           family = "Socviz SemiCondensed", color = "cornflowerblue", size = 8) +
  labs(title = "This is the Title, in Myriad Semibold SemiCondensed",
       subtitle = "This is the Subtitle. It is in Myriad SemiCondensed",
       caption = "This is the Caption") +
  theme_socviz_semi()

ggsave("figurespost-01-pdf-sepface.pdf", out, device = cairo_pdf, width = 8, height = 8)

And get this:

In the PDF version the fonts will be properly embedded. Just like you wanted. So we’re done, right? That’s it? We finally did it? We’re finished?

No of course we’re not finished

No of course we’re not finished. Did I not say unto you earlier that we have built our HOUSE on a giant TANGLE of STRING stretching back yea even unto the Middle Ages slash 1982? Look at this picture:

The Kerning is Bad, Bob. Look at that gross ‘ur’ separated from its neighbors by some vast gulf, for example.

For reasons above my pay grade, the cairo_pdf device option to ggsave() cannot kern to save its life. Now, maybe you don’t see the problem at all. Like, literally you don’t see it, in the same way that you do not see the toilet paper that has been stuck to your shoe all day, or the piece of food that’s still on your chin from this morning. You are one of those people who is happy to walk around with your shirt on inside out, or with no pants. Or perhaps your objections are more moral in nature. You see it, you insist, but you don’t care about kerning. You shrug. What is kerning, you say, in the grand scheme of things? Who can be concerned with kerning when the world is berning? I mean, burning? Well, I’m afraid I can. Because a man needs a code. Specifically a code governing aesthetically pleasing and properly flexible spacing between letters and other letters, with due regard to capitalization, ligatures, punctuation, text size, and the specific function of the glyphs being typeset.

If we use the Cairo device directly, like an animal, the problem does not arise:

library(Cairo)

# CairoPDF() writes a PDF, so the output file gets a .pdf extension
CairoPDF(file = "figurespost-01-pdf-cairodirect.pdf",
         width = 8, height = 8)

print(out)

dev.off()

We have to turn the device off once we’re done with it, like it’s 1997. If you forget, you won’t notice for a while but eventually it’s like your Dad is gonna yell at you because you forgot to turn the lights off downstairs before you went to bed or you left the fridge door open after you went to get a drink of milk or you opened the window while the air conditioning is running in the house what the hell kind of child have I raised.
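One way to stop forgetting is to wrap the open-draw-close dance in a function and let on.exit() do the closing. This is a minimal sketch using only base R; draw_to_device() is a hypothetical helper name, and the example uses the stock pdf() device so that it runs anywhere, though something like Cairo's CairoPDF should slot in the same way since it also takes a file argument.

```r
# Sketch: open a graphics device, draw, and always close the device.
# draw_to_device() is a hypothetical helper, not part of any package.
draw_to_device <- function(draw, device_fun, file, ...) {
  device_fun(file = file, ...)
  # on.exit() runs even if draw() throws an error, so the device
  # does not stay open behind our backs
  on.exit(grDevices::dev.off(), add = TRUE)
  draw()
  invisible(file)
}

# Usage with the base pdf() device and a throwaway file
tmp <- tempfile(fileext = ".pdf")
draw_to_device(
  draw = function() plot(mtcars$wt, mtcars$mpg),
  device_fun = grDevices::pdf,
  file = tmp,
  width = 8, height = 8
)
```

For a ggplot object the drawing step would be function() print(out), because ggplot objects only render when printed.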

And so now we have at last returned to where we began. A PDF with embedded fonts that is comparable to the PNG we made at the beginning.

If we want to forget about this for a while we could write a convenience function and put it in our utility package so that it makes one of each kind (a PNG, a Showtext PDF, and a Cairo PDF with embedded fonts) every time. Something like this:

#' Use ggsave, showtext, and Cairo to make a PNG, an outline PDF, and an embedded PDF at once
#'
#' @param basename Desired filename, without extension
#' @param plot Same as ggsave
#' @param device Passed to ggsave for the PNG only; the two PDFs always use CairoPDF and pdf
#' @param path Same as ggsave
#' @param scale Same as ggsave
#' @param width Same as ggsave
#' @param height Same as ggsave
#' @param units Same as ggsave
#' @param dpi Same as ggsave
#' @param limitsize Same as ggsave
#' @param bg Same as ggsave
#' @param create.dir Same as ggsave
#' @param ... Other args to ggsave
#'
#' @returns A PNG, a Cairo PDF, and a showtext PDF of the plot
#' @export
#'
#' @examples \dontrun{
#' }
save_figure <- function(basename,
                        plot = last_plot(),
                        device = NULL,
                        path = NULL,
                        scale = 1,
                        width = NA,
                        height = NA,
                        units = "in",
                        dpi = 300,
                        limitsize = TRUE,
                        bg = "white",
                        create.dir = FALSE, ...) {

  require(Cairo)

  png_name <- paste0(basename, ".png")
  pdf_name <- paste0(basename, ".pdf")
  showtext_name <- paste0(basename, "_sho.pdf")


  ggplot2::ggsave(png_name, plot = plot, device = device, width = width, height = height,
                  units = units, dpi = dpi, limitsize = limitsize, bg = bg, create.dir = create.dir, ...)


  CairoPDF(file = pdf_name,
           width = width, height = height,
           title = "", fonts = NULL, ...)

  print(plot)

  invisible(dev.off())

  pdf(file = showtext_name,
      width = width, height = height,
      title = "", fonts = NULL, ...)
  showtext::showtext_auto(enable = TRUE)
  print(plot)
  showtext::showtext_auto(enable = FALSE)
  invisible(dev.off())


}

In Summary

Just don’t go down this path. Learn to love your PNGs.

