RObservations # 36: Opinions on RStudio’s name change. A Bayesian approach with Stan

[This article was first published on r – bensstats, and kindly contributed to R-bloggers.]

Introduction

Recently, RStudio announced its name change to Posit. Many accepted this name change with open arms, but some did not. Being the statistician that I am, I decided to post a poll on LinkedIn to gauge the sentiment of my network. After running the poll for a week, the results were in:

The largest share of respondents voted that they hated the name change (31), followed by those who were “somewhere in between” (24). Only 11 individuals voted that they loved the name change.

From the poll it looks like most people either hate or are ambivalent about RStudio’s name change to Posit. How credible is this?

In this short blog post I’m going to explore the credibility of my results using Bayesian analysis with R and Stan. If you want to see my previous blog post using Bayesian analysis with Stan to determine online ad efficacy, see here. This analysis follows a similar approach but is adapted to this context.

Setting up the Simulation

In this simulation, each proportion gets a uniform Beta(1, 1) prior and each individual vote is modeled with a Bernoulli likelihood. The Stan code is thus:

data{
  // Total number of votes
  int<lower=0> n;
  // Individual votes, one 0/1 indicator per voter
  // Love it
  int<lower=0, upper=1> y1[n];
  // Hate it
  int<lower=0, upper=1> y2[n];
  // Somewhere in between
  int<lower=0, upper=1> y3[n];

}

parameters{
  // Proportion of voters choosing each option
  real<lower=0, upper=1> theta1;
  real<lower=0, upper=1> theta2;
  real<lower=0, upper=1> theta3;
}

// Priors and likelihood
model{
  // All have uniform priors
  theta1 ~ beta(1,1);
  theta2 ~ beta(1,1);
  theta3 ~ beta(1,1);

  // Likelihood functions
  y1 ~ bernoulli(theta1);
  y2 ~ bernoulli(theta2);
  y3 ~ bernoulli(theta3);
}
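Because a Beta(1, 1) prior is conjugate to the Bernoulli likelihood, the posterior for each proportion in this model can also be written down in closed form. The following is a sketch of that derivation; the symbols k_i (the number of 1s in vector y_i) and the index j over voters are notation introduced here for illustration, not taken from the Stan code.

```latex
\begin{aligned}
\theta_i &\sim \mathrm{Beta}(1, 1), \qquad
  y_{ij} \mid \theta_i \sim \mathrm{Bernoulli}(\theta_i), \quad j = 1, \dots, n \\
p(\theta_i \mid y_i) &\propto \theta_i^{k_i} (1 - \theta_i)^{n - k_i}, \qquad
  k_i = \sum_{j=1}^{n} y_{ij} \\
\theta_i \mid y_i &\sim \mathrm{Beta}(1 + k_i,\; 1 + n - k_i), \qquad
  \mathbb{E}[\theta_i \mid y_i] = \frac{k_i + 1}{n + 2}
\end{aligned}
```

With n = 66 and k = (11, 31, 24), the posterior means work out to 12/68 ≈ 0.176, 32/68 ≈ 0.471 and 25/68 ≈ 0.368, which is what the sampler should recover up to Monte Carlo error.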

To make the data suitable for Stan, a vector representing the individual voters is created for each choice: a binary indicator in which 1s are randomly assigned to as many positions as that choice received votes. With this, the data are ready for sampling from the posterior distributions.

library(rstan)

# Setting seed for reproducibility
set.seed(1234)

# Make a vector of 0's representing the individual voters
y_love_it <- rep(0, 66)
# Randomly select voters and mark them as votes, matching the count in the poll
y_love_it[sample(1:66, 11)] <- 1
# Now let's do it for the rest of the data
y_hate_it <- rep(0, 66)
y_hate_it[sample(1:66, 31)] <- 1

y_in_between <- rep(0, 66)
y_in_between[sample(1:66, 24)] <- 1

Running the Simulation and Sharing Results

After the code and the data are set up, running the simulation is very straightforward.

# Set up the data in list form
data <- list(n=66,
             y1=y_love_it,
             y2=y_hate_it,
             y3=y_in_between)

fit <- stan(file ="./RPositStan.stan", 
            data=data) 


## 
## SAMPLING FOR MODEL 'RPositStan' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.027 seconds (Warm-up)
## Chain 1:                0.026 seconds (Sampling)
## Chain 1:                0.053 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'RPositStan' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.026 seconds (Warm-up)
## Chain 2:                0.023 seconds (Sampling)
## Chain 2:                0.049 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'RPositStan' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.027 seconds (Warm-up)
## Chain 3:                0.027 seconds (Sampling)
## Chain 3:                0.054 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'RPositStan' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.027 seconds (Warm-up)
## Chain 4:                0.025 seconds (Sampling)
## Chain 4:                0.052 seconds (Total)
## Chain 4:

The tidyMCMC() function from the broom.mixed package tidies the output into a neat table.

library(broom.mixed)
tidyMCMC(fit)


## # A tibble: 3 x 3
##   term   estimate std.error
##   <chr>     <dbl>     <dbl>
## 1 theta1    0.173    0.0458
## 2 theta2    0.471    0.0584
## 3 theta3    0.366    0.0579
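As a rough cross-check on the table above, the same summaries can be computed without MCMC, since a uniform Beta(1, 1) prior combined with a Bernoulli likelihood yields a Beta(1 + k, 1 + n - k) posterior for k votes out of n. The sketch below uses only base R; the object name `analytic` and the 95% interval columns are my own additions for illustration and are not part of the original analysis.

```r
# Poll counts from the post: love it, hate it, somewhere in between
votes <- c(theta1 = 11, theta2 = 31, theta3 = 24)
n <- 66

# Beta(1, 1) prior + Bernoulli likelihood gives a Beta(a, b) posterior
a <- 1 + votes
b <- 1 + n - votes

analytic <- data.frame(
  term      = names(votes),
  estimate  = a / (a + b),                               # posterior mean
  std.error = sqrt(a * b / ((a + b)^2 * (a + b + 1))),   # posterior sd
  lower95   = qbeta(0.025, a, b),                        # equal-tailed 95% interval
  upper95   = qbeta(0.975, a, b),
  row.names = NULL
)

print(analytic, digits = 3)
```

The `estimate` and `std.error` columns should agree with the tidyMCMC() output up to Monte Carlo error; small differences in the third decimal place are expected with 4,000 post-warmup draws.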

The distributions can be visualized with the code below:

library(tidyverse) 
library(reshape2)
library(ggdark)
library(ggpubr)

# Interval plot of the posterior for each parameter
rstan::stan_plot(fit)+
  ggtitle("Density Of Posterior Distribution By Voter Choice")

# Collect the posterior draws into long format for the density plot
params <- rstan::extract(fit) %>% 
  as.list() %>% 
  as_tibble() %>% 
  select(!lp__) %>% 
  transmute("Love it (theta1)" = theta1,
            "Hate it (theta2)" = theta2,
            "Somewhere in between (theta3)" = theta3) %>% 
  melt() %>% 
  mutate(value = c(value))

ggplot(data=params, 
       mapping=aes(x=value,fill=variable,color=variable))+
  geom_density(alpha=0.8)+
  dark_theme_gray(base_size = 14) + 
  theme(plot.background = element_rect(fill = "grey10"),
        panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey30", size = 0.2),
        panel.grid.minor = element_line(color = "grey30", size = 0.2),
        legend.background = element_blank(),
        axis.ticks = element_blank(),
        legend.key = element_blank(),
        legend.position = "bottom",
        legend.title=element_blank())+
  scale_fill_manual(values = c("#FDE725",
                               "#22908C",
                               "#450D54"))+
   scale_color_manual(values = c("#FDE725",
                               "#22908C",
                               "#450D54"))+
  ggtitle("Thoughts On RStudio's Name Change To Posit \n(Posterior Distributions)")

Looking at the densities, the posterior distributions for the individuals in my network who hate RStudio’s name change and for those who are somewhere in between overlap considerably. It could be that there are more people who are ambivalent than who hate it; however, according to the calculations here, it seems that the people in my network who love the name change are fewer in number. I wonder if that speaks for the larger community of R users as well?
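One way to put a number on how much the posteriors overlap is to estimate the probability that one proportion exceeds another. The sketch below is an illustration rather than part of the original analysis: it draws directly from the closed-form Beta posteriors implied by a uniform prior and Bernoulli likelihood instead of using the Stan fit, and, like the model, it treats the three proportions as independent. The variable names and the number of draws are assumptions chosen for the example.

```r
set.seed(1234)

n <- 66            # total votes in the poll
n_draws <- 100000  # number of Monte Carlo draws (arbitrary choice)

# Draws from Beta(1 + k, 1 + n - k) for each choice
love    <- rbeta(n_draws, 1 + 11, 1 + n - 11)
hate    <- rbeta(n_draws, 1 + 31, 1 + n - 31)
between <- rbeta(n_draws, 1 + 24, 1 + n - 24)

# Share of draws in which "hate it" outranks "somewhere in between"
p_hate_gt_between <- mean(hate > between)

# Share of draws in which "love it" is the least popular choice
p_love_lowest <- mean(love < hate & love < between)

cat("P(hate > in between):", round(p_hate_gt_between, 3), "\n")
cat("P(love is lowest):   ", round(p_love_lowest, 3), "\n")
```

The same comparison could be made with the draws returned by rstan::extract(fit), for example mean(draws$theta2 > draws$theta3) where draws is the extracted list.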

What do you think? Do you agree with this analysis? What are your thoughts on RStudio’s name change? Let me know in the comments!

Thank you for reading!
