Joyplot Logo

This article was first published on R on datistics, and kindly contributed to R-bloggers.

Welcome to my data science blog datistics, where I will gradually post all the vignettes and programming POCs that I have written over the past two years. Most of them can already be found in my GitHub repository.

I am using blogdown to create this blog, together with R and RStudio. However, I have recently taken up Python programming for work again, so my first challenge will be to also add posts in the form of Jupyter notebooks.

For my first post I will add the code that I use to generate my page logo in R.

Tweedie distributions

We often encounter distributions that are not normal. I often encounter Poisson and gamma distributions, as well as distributions with an inflated zero value, all of which belong to the family of Tweedie distributions. The family is indexed by a power parameter p: p == 0 gives the Gaussian, p == 1 the Poisson and p == 2 the gamma distribution, while values of 1 < p < 2 give distributions with a point mass at zero. By changing p we can sample the different Tweedie distributions.
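What ties these cases together is the variance function: a Tweedie distribution with mean mu and dispersion phi has variance phi * mu^p. The following is a minimal sketch in base R that checks this for the three named special cases. The helper `tweedie_var` and the chosen values of `mu`, `phi` and `n` are my own assumptions for illustration, not part of the tweedie package.

```r
set.seed(42)   # arbitrary seed, only for reproducibility of the sketch

mu  <- 2       # common mean (illustrative value)
phi <- 1       # dispersion (illustrative value)
n   <- 1e5     # sample size

# Hypothetical helper: theoretical Tweedie variance phi * mu^p
tweedie_var <- function(mu, phi, p) phi * mu^p

# Base R samplers for the three special cases, parameterised by mu and phi
samples <- list(
  gaussian = rnorm(n, mean = mu, sd = sqrt(phi)),        # p == 0
  poisson  = rpois(n, lambda = mu),                      # p == 1 (phi == 1)
  gamma    = rgamma(n, shape = 1 / phi, scale = mu * phi) # p == 2
)

comparison <- data.frame(
  p           = c(0, 1, 2),
  theoretical = tweedie_var(mu, phi, c(0, 1, 2)),
  empirical   = sapply(samples, var)
)

print(comparison)
```

The empirical variances should land close to 1, 2 and 4, which is phi * mu^p for p equal to 0, 1 and 2.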

The tweedie package does not support values of p below 1, so we sample over 1 <= p <= 2.

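# library() fails loudly if a package is missing, whereas require() inside
# suppressWarnings() would silently return FALSE
suppressPackageStartupMessages({
  library(tidyverse)
  library(tweedie)
  library(ggridges)
})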
df = tibble( p = seq(1,2,0.1) ) %>%
  mutate( data = map(p, function(p) rtweedie(n = 500
                                             , mu = 1
                                             , phi = 1
                                             , power = p )  ) ) %>%
  unnest(data)

df %>%
  ggplot( aes(x = data) )+
    geom_histogram(bins = 100, fill = '#77773c') +
    facet_wrap(~p, scales = 'free_y')
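The spike at zero in the panels for 1 < p < 2 can be understood through the compound Poisson-gamma representation: a Poisson number of gamma-distributed amounts is summed, and a count of zero yields an exact zero. Below is a minimal base R sketch of that construction. The function name `r_cpg` is hypothetical, and this is an illustration of the representation under its standard parameterisation, not a claim about how `rtweedie` is implemented internally.

```r
set.seed(1)   # arbitrary seed, only for reproducibility of the sketch

# Hypothetical helper: sample a Tweedie variable with 1 < p < 2 as a
# Poisson sum of gamma variables
r_cpg <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2)
  lambda <- mu^(2 - p) / (phi * (2 - p))   # Poisson rate for the count
  shape  <- (2 - p) / (p - 1)              # gamma shape of a single amount
  scale  <- phi * (p - 1) * mu^(p - 1)     # gamma scale of a single amount

  counts <- rpois(n, lambda)
  y      <- numeric(n)                     # zero counts stay exactly zero
  pos    <- counts > 0
  # the sum of k gamma(shape, scale) variables is gamma(k * shape, scale)
  y[pos] <- rgamma(sum(pos), shape = counts[pos] * shape, scale = scale)
  y
}

y <- r_cpg(1e5, mu = 1, phi = 1, p = 1.5)

mean(y)        # should be close to mu
mean(y == 0)   # should be close to exp(-lambda)
exp(-1^(2 - 1.5) / (1 * (2 - 1.5)))
```

With mu = 1, phi = 1 and p = 1.5 the Poisson rate is 2, so roughly exp(-2) of the samples are exact zeros, which is the inflated zero value visible in the histograms.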

Joyplot

We will now transform these distributions into a joyplot in the style of the cover art of Joy Division's album Unknown Pleasures.

We will use ggridges, formerly known as ggjoy.

joyplot = function(df){

  p = df %>%
    # after_stat(x) replaces the deprecated ..x.. notation (ggplot2 >= 3.3.0)
    ggplot(aes(x = data, y = as.factor(p), fill = after_stat(x) ) ) +
      geom_density_ridges_gradient( color = 'white'
                                   , size = 0.5
                                   , scale = 3) +
      theme( panel.background = element_rect(fill = 'white')
             , panel.grid = element_blank()
             , aspect.ratio = 1
             , axis.title = element_blank()
             , axis.text = element_blank()
             , axis.ticks = element_blank()
             , legend.position = 'none') +
     xlim(-1,5) +
     scale_fill_viridis_c(option = "inferno") 
  
  return(p)

}

joyplot(df)
## Picking joint bandwidth of 0.24

In order to distribute them a bit better over the x-axis, we will transform them using a sine wave pattern.

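# one row per power value; rwn indexes the rows and drives the sine offset
df = tibble( p = seq(1,2,0.05)
             , rwn = row_number(p)
             , sin = sin(rwn) ) %>%
  mutate( data = map(p, function(p) rtweedie(500
                                             , mu = 1
                                             , phi = 1
                                             , power = p)  ) ) %>%
  unnest(data) %>%
  # drop the long right tail, then mirror each sample and shift it by 4 * |sin(rwn)|
  filter( data <= 4) %>%
  mutate( data = ( 4 * abs( sin(rwn) ) ) - data )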


joyplot(df)
## Picking joint bandwidth of 0.206
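To see what the sine transformation does to each ridge, it helps to look at the offsets on their own. The sketch below uses only base R and recomputes the per-row offset 4 * |sin(rwn)| for the same sequence of p values. The names `p_values`, `offset` and `shift_table` are mine and do not appear in the code above.

```r
p_values <- seq(1, 2, 0.05)      # same power values as in the data frame above
rwn      <- seq_along(p_values)  # row index 1, 2, ..., 21
offset   <- 4 * abs(sin(rwn))    # per-ridge shift, always between 0 and 4

# After filter(data <= 4) each sample lies in [0, 4], so offset - data
# lies in [offset - 4, offset]: the ridge is mirrored and ends at its offset
shift_table <- data.frame(
  p     = p_values,
  rwn   = rwn,
  lower = offset - 4,
  upper = offset
)

print(head(shift_table))
```

Because the index is an integer fed into sin() in radians, consecutive offsets jump around rather than forming a smooth wave, which scatters the peaks across the x-axis. Note also that the joyplot function sets xlim(-1, 5), so transformed values below -1 fall outside the scale limits and are dropped from the density estimate.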
