Going to Plot Some Proportions? Why not Flog ’em First?
Fractions and proportions can be difficult to plot nicely for a number of reasons:
- If the proportions are based on small counts (e.g., two of his three computing devices were Apple products), then the calculated proportions can take on only a small number of discrete values.
- Depending on what you have measured, there might be many proportions close to the 0 % and 100 % edges of the scale (e.g., five of his five computing devices were non-Apple products).
- A plain proportion makes no distinction between a proportion based on few counts (33 %, one out of three, were Apple products) and one based on more counts (33 %, three out of nine, were Apple products).
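A quick way to see the first and third problems is to enumerate the proportions that small counts can produce. This sketch assumes that nobody owns more than three devices, which is an illustrative choice and not taken from any data:

```r
# Every proportion that is possible when a person owns 1, 2 or 3 devices
n_devices <- 1:3
possible <- unlist(lapply(n_devices, function(n) (0:n) / n))

# Only five distinct values survive: 0, 1/3, 1/2, 2/3 and 1,
# so many people will end up stacked on the same few y-values
distinct <- sort(unique(possible))
print(distinct)

# Identical proportions can hide very different amounts of data
one_of_three <- 1 / 3
three_of_nine <- 3 / 9
print(one_of_three == three_of_nine)
```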
Especially irritating is that the first two of these problems result in overplotting, which makes it hard to see what is going on in the data. So how can we plot proportions without running into these problems? John Tukey to the rescue!
In his famous book Exploratory Data Analysis, Tukey explains a transform of proportions he calls folded logs, or flogs for short, designed to alleviate the problems with plotting proportions described above. And since I found the flog transform really neat (but didn’t find any good description of it online), I thought I would describe it here!
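As a rough sketch of the idea, a folded log compares the log of a fraction with the log of its complement, log(p) - log(1 - p), so that 50 % folds to 0 and the two edges are stretched out symmetrically. The sketch also adds a small "start" to the count and to its complement so that 0 % and 100 % map to finite values instead of minus and plus infinity. The helper name `flog` and the start value of 1/6 are assumptions of this sketch, and Tukey's own presentation may use a different log base or scaling constant, which would only rescale the axis.

```r
# Hypothetical helper: folded log of a counted fraction count/total.
# 'start' is added to the count and to its complement (total - count),
# which keeps fractions of 0 % and 100 % finite; 1/6 is an assumed default.
flog <- function(count, total, start = 1/6) {
  p <- (count + start) / (total + 2 * start)
  log(p) - log(1 - p)  # the same quantity as qlogis(p), i.e. the logit
}

# 50 % is folded to 0; fractions below 50 % are negative, above are positive
flog(1, 2)

# 0 of 5 and 5 of 5 become finite and mirror each other around 0
flog(c(0, 5), 5)

# 1 of 3 and 3 of 9 no longer coincide: the smaller count is pulled
# a little further towards 50 %
flog(c(1, 3), c(3, 9))
```

Because the start matters more for small totals, fractions based on few counts are shrunk towards 0 on the flog scale, which is one way of letting the amount of data show up in the plot.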
First, let’s look at some (unfortunately made-up) data that exhibits all the problems outlined above. Say that you are hired by Apple’s marketing department to investigate whether a person’s income influences the proportion of Apple-produced computing devices (phones, computers, etc.) that person owns. So you ask a number of people how many computing devices they have, how many of them are Apple devices, and how much they earn. The resulting data could look something like this:
head(d)
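If you want to play along without the original data, a data set with the same structure can be simulated. Everything in this sketch is hypothetical: the data frame name `d_sim`, its column names, the income distribution, the number of devices and the assumed link between income and owning Apple products are invented for illustration and are not the data shown above.

```r
set.seed(42)  # make the simulation reproducible
n_people <- 200

d_sim <- data.frame(
  # Yearly income, skewed to the right like most income data
  income = round(rlnorm(n_people, meanlog = log(40000), sdlog = 0.5)),
  # Everyone owns at least one computing device, most own only a few
  n_devices = rpois(n_people, lambda = 2) + 1
)

# Assumed relation: the chance that a device is an Apple product grows with income
p_apple <- plogis(-2 + 0.00004 * d_sim$income)
d_sim$n_apple <- rbinom(n_people, size = d_sim$n_devices, prob = p_apple)
d_sim$prop_apple <- d_sim$n_apple / d_sim$n_devices

head(d_sim)

# Share of people sitting exactly on the 0 % or 100 % edge
mean(d_sim$prop_apple %in% c(0, 1))
```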