
Joyplot Logo

[This article was first published on R on datistics, and kindly contributed to R-bloggers.]

Welcome to my data science blog, datistics, where I will gradually post the vignettes and programming POCs (proofs of concept) that I have written over the past two years. Most of them can already be found in my GitHub repository.

I am creating this blog with blogdown, R and RStudio. However, I have recently taken up Python programming again for work, so my first challenge will be to also add posts in the form of Jupyter notebooks.

For my first post, I will share the R code I use to generate my page logo.

Tweedie distributions

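We often encounter distributions that are not normal. I regularly come across Poisson and gamma distributions, as well as distributions with an excess of exact zeros. All of these belong to the family of Tweedie distributions.

By changing the power parameter \(p\) we can sample the different Tweedie distributions:

- \(p = 0\) gives the normal (Gaussian) distribution.
- \(p = 1\) gives the Poisson.
- \(1 < p < 2\) gives the compound Poisson-gamma distribution, which is continuous but has a point mass at zero.
- \(p = 2\) gives the gamma distribution.

No Tweedie distributions exist for \(0 < p < 1\). Values above 2 are also possible (\(p = 3\) gives the inverse Gaussian), but here we stay within \(0 \le p \le 2\).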
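For reference, \(p\) is the power in the Tweedie mean-variance relationship. Here \(\mu\) is the mean and \(\phi\) the dispersion, which correspond to the `mu` and `phi` arguments of `rtweedie()` used below:

```latex
\operatorname{Var}(Y) = \phi \, \mu^{p}, \qquad
\begin{cases}
p = 0: & \text{normal, } \operatorname{Var}(Y) = \phi \\
p = 1: & \text{Poisson (if } \phi = 1\text{), } \operatorname{Var}(Y) = \phi \, \mu \\
1 < p < 2: & \text{compound Poisson-gamma, } P(Y = 0) > 0 \\
p = 2: & \text{gamma, } \operatorname{Var}(Y) = \phi \, \mu^{2}
\end{cases}
```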

The tweedie package does not cover \(p < 1\), which rules out the Gaussian case \(p = 0\). We therefore only sample values with \(1 \le p \le 2\).
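To see where the zeros come from when \(1 < p < 2\), here is a minimal base-R sketch. It draws such a Tweedie variable as a Poisson number of gamma summands, using the standard parameterization:

- Poisson rate \(\lambda = \mu^{2-p} / (\phi(2-p))\)
- gamma shape \(\alpha = (2-p)/(p-1)\)
- gamma scale \(\gamma = \phi(p-1)\mu^{p-1}\)

The helper `rtweedie_cpg()` is hypothetical. It is not part of the tweedie package, and `rtweedie()` may generate its samples differently.

```r
# Hypothetical helper (not from the tweedie package): draw n samples from a
# Tweedie distribution with 1 < p < 2 as a compound Poisson-gamma variable.
rtweedie_cpg <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2, mu > 0, phi > 0)
  lambda <- mu^(2 - p) / (phi * (2 - p))  # Poisson rate of the number of summands
  alpha  <- (2 - p) / (p - 1)             # shape of each gamma summand
  gam    <- phi * (p - 1) * mu^(p - 1)    # scale of each gamma summand
  n_events <- rpois(n, lambda)
  y <- numeric(n)                         # draws with zero events stay exactly 0
  pos <- n_events > 0
  # the sum of k iid Gamma(alpha, gam) variables is Gamma(k * alpha, gam)
  y[pos] <- rgamma(sum(pos), shape = n_events[pos] * alpha, scale = gam)
  y
}

set.seed(42)
y <- rtweedie_cpg(2e5, mu = 1, phi = 1, p = 1.5)
c(mean = mean(y), var = var(y), share_zero = mean(y == 0))
```

With \(\mu = \phi = 1\) and \(p = 1.5\), the theory gives a mean and variance of 1 and a zero share of \(e^{-2} \approx 0.135\). The sample statistics should land close to these values.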

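# load packages quietly; library() stops with an error if one is missing,
# whereas require() would only warn (and that warning was being suppressed)
suppressWarnings({
  suppressPackageStartupMessages({
    library(tidyverse)  # tibble, dplyr, purrr, tidyr, ggplot2
    library(tweedie)    # rtweedie()
    library(ggridges)   # geom_density_ridges_gradient()
  })
})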
# draw 500 samples for each value of p from 1 to 2 in steps of 0.1
df = tibble( p = seq(1, 2, 0.1) ) %>%
  mutate( data = map(p, function(p) rtweedie(n = 500
                                             , mu = 1
                                             , phi = 1
                                             , power = p ) ) ) %>%
  unnest(data)

# one histogram per value of p
df %>%
  ggplot( aes(x = data) ) +
    geom_histogram(bins = 100, fill = '#77773c') +
    facet_wrap(~p, scales = 'free_y')
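The share of exact zeros in these samples should shrink as \(p\) approaches 2. Under the compound Poisson parameterization, the probability of an exact zero is \(P(Y = 0) = \exp(-\mu^{2-p} / (\phi(2-p)))\). For \(p = 1\) this reduces to the Poisson value \(\exp(-\mu/\phi)\).

The small base-R sketch below tabulates this probability for the values of \(p\) sampled above, with `mu = 1` and `phi = 1`. The helper name `tweedie_p0()` is hypothetical.

```r
# Hypothetical helper: probability of an exact zero for a Tweedie variable
# with 1 <= p <= 2 (Poisson at p = 1, compound Poisson-gamma for 1 < p < 2,
# gamma at p = 2, which has no point mass at zero).
tweedie_p0 <- function(p, mu = 1, phi = 1) {
  stopifnot(all(p >= 1), all(p <= 2))
  lambda <- mu^(2 - p) / (phi * (2 - p))  # becomes Inf at p = 2, so exp(-Inf) = 0
  exp(-lambda)
}

p_grid <- seq(1, 2, 0.1)
round(data.frame(p = p_grid, share_zero = tweedie_p0(p_grid)), 4)
```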

Joyplot

We will now turn these distributions into a joyplot in the style of the cover art of Joy Division's album Unknown Pleasures.

We will use the ggridges package, formerly known as ggjoy.
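Conceptually, a ridgeline plot does three things:

- it draws one density curve per group;
- it lifts each curve onto its own baseline;
- it scales all curves by one common factor.

The ggridges documentation describes `scale = 1` as the setting where the highest point of any ridge just touches the baseline above it. With `scale = 3`, the tallest ridge therefore reaches three rows up, and neighbouring ridges overlap.

The base-R sketch below computes and draws that layout. It assumes from that description that all heights are normalized by the tallest density across groups. The helper `ridge_layout()` is hypothetical and not part of ggridges.

```r
# Hypothetical helper (not part of ggridges): compute ridgeline coordinates
# for a list of numeric samples, one ridge per list element.
ridge_layout <- function(samples, scale = 3, n = 256) {
  rng  <- range(unlist(samples))
  # evaluate every density on the same x grid
  dens <- lapply(samples, density, from = rng[1], to = rng[2], n = n)
  tallest <- max(vapply(dens, function(d) max(d$y), numeric(1)))
  # baselines are 1, 2, 3, ... (one unit apart); the tallest curve in the
  # whole plot peaks exactly `scale` units above its baseline
  do.call(rbind, lapply(seq_along(dens), function(i) {
    data.frame(group = i,
               x = dens[[i]]$x,
               baseline = i,
               top = i + scale * dens[[i]]$y / tallest)
  }))
}

set.seed(1)
samples <- list(rexp(500), rgamma(500, shape = 2), rnorm(500, mean = 2))
ridges <- ridge_layout(samples, scale = 3)

# draw the top ridge first, so each lower ridge is painted over the ones above it
plot(NULL, xlim = range(ridges$x), ylim = c(1, max(ridges$top)),
     axes = FALSE, xlab = "", ylab = "")
for (g in rev(unique(ridges$group))) {
  r <- ridges[ridges$group == g, ]
  polygon(c(r$x, rev(r$x)), c(r$top, rev(r$baseline)),
          col = "black", border = "white")
}
```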

joyplot = function(df){

  plt = df %>%
    # one ridge per value of p, filled with a gradient along the x-axis
    # (after_stat(x) replaces the deprecated ..x.. notation)
    ggplot(aes(x = data, y = as.factor(p), fill = after_stat(x) ) ) +
      geom_density_ridges_gradient( color = 'white'
                                   , size = 0.5
                                   , scale = 3) +  # tallest ridge spans 3 rows
      # strip axes, grid lines and legend
      theme( panel.background = element_rect(fill = 'white')
             , panel.grid = element_blank()
             , aspect.ratio = 1
             , axis.title = element_blank()
             , axis.text = element_blank()
             , axis.ticks = element_blank()
             , legend.position = 'none') +
      xlim(-1, 5) +
      scale_fill_viridis_c(option = "inferno")

  return(plt)

}

joyplot(df)
## Picking joint bandwidth of 0.24
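The message above means that ggridges chose a single kernel bandwidth shared by all ridges. Our understanding, which is an assumption worth checking against the ggridges source for your version, is that it averages the per-group `bw.nrd0()` rule-of-thumb bandwidths.

The base-R sketch below reproduces that computation; the helper name `joint_bandwidth()` is hypothetical. Passing `bandwidth = ...` to `geom_density_ridges_gradient()` should fix the value explicitly. Note also that `xlim()` drops observations outside its limits before the densities, and hence the bandwidth, are computed.

```r
# Hypothetical helper mirroring (as we understand it) how ggridges picks a
# joint bandwidth: the mean of bw.nrd0() over all groups with 2+ points.
joint_bandwidth <- function(x, group) {
  xs <- split(x, group)
  xs <- xs[lengths(xs) > 1]  # bw.nrd0() needs at least two observations
  mean(vapply(xs, bw.nrd0, numeric(1)))
}

set.seed(7)
x <- c(rnorm(300), rnorm(300, sd = 2))
g <- rep(c("a", "b"), each = 300)
signif(joint_bandwidth(x, g), 3)
```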

To spread the ridges out a bit more along the x-axis, we will shift them using a sine wave pattern.

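df = tibble( p = seq(1, 2, 0.05)
             , rwn = row_number(p)                # ridge index 1, 2, 3, ...
             , offset = 4 * abs(sin(rwn)) ) %>%   # per-ridge shift between 0 and 4
  mutate( data = map(p, function(p) rtweedie(500
                                             , mu = 1
                                             , phi = 1
                                             , power = p) ) ) %>%
  unnest(data) %>%
  filter( data <= 4) %>%                 # keep values in [0, 4]
  mutate( data = offset - data )         # mirror and shift into [offset - 4, offset]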


joyplot(df)
## Picking joint bandwidth of 0.206
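To see what the shift does, here is a base-R sketch of the same transformation on a hand-made vector. Ridge \(k\) gets an offset of \(4\,|\sin(k)|\). Subtracting a value in \([0, 4]\) from that offset mirrors the distribution, so its long right tail now points left, and places it in \([\text{offset} - 4, \text{offset}]\).

Because \(|\sin|\) has period \(\pi \approx 3.14\) and is sampled at integer steps, the offsets oscillate roughly every three ridges.

```r
# offsets for the first few ridges (row numbers 1, 2, 3, ...)
rwn <- 1:8
offset <- 4 * abs(sin(rwn))
round(offset, 2)

# apply the shift to a toy sample for ridge 3: values in [0, 4] are
# mirrored and moved into [offset[3] - 4, offset[3]]
x <- c(0, 0.5, 1, 4)
offset[3] - x
```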
