Introduction
Recently, RStudio announced its name change to Posit. Many welcomed the change with open arms, but some did not. Being the statistician that I am, I decided to post a poll on LinkedIn to gauge the sentiment of my network. After running the poll for a week, the results were in:
Most of the respondents to the poll voted that they hated the name change (31), followed by individuals who were “somewhere in between” (24). Only 11 individuals voted that they loved the name change.
From the poll it looks like most people either hate or are ambivalent about RStudio’s name change to Posit. How credible is this result?
In this short blog post I’m going to explore the credibility of my results using Bayesian analysis with R and Stan. If you want to see my previous post using Bayesian analysis with Stan to determine online ad efficacy, see here. This analysis follows a similar approach, adapted to this context.
Setting up the Simulation
In this model, the prior on each vote-share parameter is uniform (Beta(1,1)) and the likelihood for each response option is Bernoulli. The Stan code is thus:
data{
  // Total number of votes
  int n;
  // Individual votes for each option
  // Love it
  int y1[n];
  // Hate it
  int y2[n];
  // Somewhere in between
  int y3[n];
}
parameters{
  // Vote-share parameters, each between 0 and 1
  real<lower=0, upper=1> theta1;
  real<lower=0, upper=1> theta2;
  real<lower=0, upper=1> theta3;
}
model{
  // All parameters have uniform priors
  theta1 ~ beta(1,1);
  theta2 ~ beta(1,1);
  theta3 ~ beta(1,1);
  // Likelihood functions
  y1 ~ bernoulli(theta1);
  y2 ~ bernoulli(theta2);
  y3 ~ bernoulli(theta3);
}
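Because the Beta prior is conjugate to the Bernoulli likelihood, the posteriors implied by this model are also available in closed form: with a Beta(1,1) prior and k votes for an option out of n = 66, the posterior is Beta(1 + k, 1 + n - k). As a quick sanity check on the MCMC results later on, here is a minimal base-R sketch of the analytic posterior means and 95% credible intervals (the object names k and n are just for illustration):

# Observed counts from the poll (love, hate, in between), out of 66 total votes
k <- c(love = 11, hate = 31, in_between = 24)
n <- 66

# Beta(1,1) prior + Bernoulli likelihood => Beta(1 + k, 1 + n - k) posterior
post_mean  <- (1 + k) / (2 + n)
post_lower <- qbeta(0.025, 1 + k, 1 + n - k)
post_upper <- qbeta(0.975, 1 + k, 1 + n - k)

round(cbind(post_mean, post_lower, post_upper), 3)

If the sampler is working correctly, its estimates should land very close to these analytic values.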
To make the data suitable for Stan, a binary vector of length 66 (one element per voter) is created for each response option, and the known number of votes for that option is randomly assigned across the vector. With this, the posterior distributions are ready to be simulated.
library(rstan)

# Setting seed for reproducibility
set.seed(1234)

# Make a vector of 0's representing the individual voters
y_love_it <- rep(0, 66)

# Randomly select positions and assign them as votes - the same number as the known data
y_love_it[sample(c(1:66), 11)] <- 1

# Now let's do it for the rest of the data
y_hate_it <- rep(0, 66)
y_hate_it[sample(c(1:66), 31)] <- 1

y_in_between <- rep(0, 66)
y_in_between[sample(c(1:66), 24)] <- 1
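The randomization only decides which positions in each vector are 1s, not how many, but it is cheap to confirm that each vector carries exactly the vote counts from the poll. A quick check using the vectors defined above:

# Each sum should match the poll: 11 love it, 31 hate it, 24 somewhere in between
c(love = sum(y_love_it),
  hate = sum(y_hate_it),
  in_between = sum(y_in_between))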
Running the Simulation and Sharing Results
After the code and the data are set up, running the simulation is very straightforward.
# Set up the data in list form
data <- list(n = 66,
             y1 = y_love_it,
             y2 = y_hate_it,
             y3 = y_in_between)

fit <- stan(file = "./RPositStan.stan", data = data)

## 
## SAMPLING FOR MODEL 'RPositStan' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.027 seconds (Warm-up)
## Chain 1:                0.026 seconds (Sampling)
## Chain 1:                0.053 seconds (Total)
## Chain 1: 
## 
## (Chains 2-4 produce the same output, each finishing in about 0.05 seconds.)
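Before reading too much into the estimates, it is worth a quick look at the standard MCMC diagnostics. This is an optional sketch rather than part of the original workflow: summary() on the fit object reports effective sample sizes and Rhat, and rstan::traceplot() shows whether the four chains mix.

# Rhat close to 1 and large n_eff suggest the chains have converged
round(summary(fit)$summary[c("theta1", "theta2", "theta3"),
                           c("mean", "n_eff", "Rhat")], 3)

# Visual check that the four chains mix well
rstan::traceplot(fit, pars = c("theta1", "theta2", "theta3"))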
With the broom.mixed package we can tidy the output into a neat table with the tidyMCMC() function.
library(broom.mixed)

tidyMCMC(fit)

## # A tibble: 3 x 3
##   term   estimate std.error
##   <chr>     <dbl>     <dbl>
## 1 theta1    0.173    0.0458
## 2 theta2    0.471    0.0584
## 3 theta3    0.366    0.0579
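The point estimates alone do not show how wide the posteriors are. tidyMCMC() can also return credible intervals through its conf.int and conf.level arguments; a short sketch (the exact bounds will vary slightly between runs):

# Posterior estimates with 95% credible intervals
tidyMCMC(fit, conf.int = TRUE, conf.level = 0.95)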
The distributions can be visualized with the code below:
library(tidyverse)
library(reshape2)
library(ggdark)
library(ggpubr)

rstan::stan_plot(fit) +
  ggtitle("Density Of Posterior Distribution By Voter Choice")
params <- rstan::extract(fit) %>%
  as.list() %>%
  as_tibble() %>%
  select(!lp__) %>%
  transmute("Love it (theta1)" = theta1,
            "Hate it (theta2)" = theta2,
            "Somewhere in between (theta3)" = theta3) %>%
  melt() %>%
  mutate(value = c(value))

ggplot(data = params, mapping = aes(x = value, fill = variable, color = variable)) +
  geom_density(alpha = 0.8) +
  dark_theme_gray(base_size = 14) +
  theme(plot.background = element_rect(fill = "grey10"),
        panel.background = element_blank(),
        panel.grid.major = element_line(color = "grey30", size = 0.2),
        panel.grid.minor = element_line(color = "grey30", size = 0.2),
        legend.background = element_blank(),
        axis.ticks = element_blank(),
        legend.key = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank()) +
  scale_fill_manual(values = c("#FDE725", "#22908C", "#450D54")) +
  scale_color_manual(values = c("#FDE725", "#22908C", "#450D54")) +
  ggtitle("Thoughts On RStudio's Name Change To Posit \n(Posterior Distributions)")
Looking at the densities, the overlap between the posterior distributions for those in my network who hate RStudio’s name change and those who are somewhere in between is quite large. It could be that the ambivalent group is actually larger than the group that hates the change; however, according to the calculations here, the number of people in my network who love the name change is clearly the smallest. I wonder if that speaks for the larger community of R users as well?
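Rather than eyeballing the overlap, the same question can be put to the posterior draws directly: what is the probability that one vote share exceeds another? This is a small sketch built on the rstan::extract() call already used for the plot above; the exact probabilities depend on the simulated data and the seed.

draws <- rstan::extract(fit)

# Posterior probability that "hate it" has a larger share than "somewhere in between"
mean(draws$theta2 > draws$theta3)

# Posterior probability that "love it" has the smallest share of the three
mean(draws$theta1 < draws$theta2 & draws$theta1 < draws$theta3)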
What do you think? Do you agree with this analysis? What are your thoughts on RStudio’s name change? Let me know in the comments!
Thank you for reading!