S-shaped data: Smoothing with quasibinomial distribution

Posted on January 16, 2016 by msuzen in R bloggers | 0 Comments

[This article was first published on Memo's Island, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Figure 1: Synthetic data and fitted curves.

S-shaped distributed data can be found in many applications. Such data can be approximated with logistic distribution function [1]. Cumulative distribution function of logistic distribution function is a logistic function, i.e., logit.

To demonstrate this, in this short example, after generating a synthetic data, we will fit quasibinomial regression model to different observations.

ggplot [2], an implementation of grammar of graphics [3], provides capability to apply regression or customised smoothing onto a raw data during plotting.

Generating Synthetic Data

Let generate set of $n$ observation over time $t$, denoted, $X_{1}, X_{2}, …, X_{n}$ for $k$ observation $X=(x_{1}, x_{2}, …, x_{k})$. We will use cumulative function for logistic distribution [4],
$$F(x;mu,s) = frac{1}{2} + frac{1}{2} tanh((x-mu)/2s)$$, adding some random noise to make
it realistic.

Let’s say there are $k=6$ observations with the following parameter sets, $mu = {9,2,3,5,7,5}$ and $s={2,2,4,3,4,2}$, we will utilise `mapply` [5] in generating a syntetic data frame.

generate_logit_cdf <- function(mu, s, 
                               sigma_y=0.1, 
                               x=seq(-5,20,0.1)) {
  x_ms <- (x-mu)/s
  y    <- 0.5 + 0.5 * tanh(x_ms)  
  y    <- abs(y + rnorm(length(x), 0, sigma_y))
  ix   <- which(y>=1.0)
  if(length(ix)>=1) { 
    y[ix] <- 1.0
  }
  return(y)
}
set.seed(424242)
x      <- seq(-5,20,0.025) # 1001 observation
mu_vec <- c(1,2,3,5,7,8)   # 6 variables
s_vec  <- c(2,2,4,3,4,2)
# Syntetic variables
observations_df<- mapply(generate_logit_cdf, 
                              mu_vec, 
                              s_vec, 
                              MoreArgs = list(x=x))
# Give them names
colnames(observations_df) <- c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6")
head(observations_df)

Smoothing of observations

Using the syntetic data we have generated, `observations_df`,
we can noq use `ggplot` and `quasibinomial` `glm` to visualise
and smooth the variables.

library(ggplot2)
 library(reshape2)
 df_all <- reshape2:::melt(observations_df)
 colnames(df_all) <- c("x", "observation", "y")
 df_all$observation <- as.factor(df_all$observation)
 p1<-ggplot(df_all, aes(x=x, y=y, colour=observation)) + geom_point() + 
        scale_color_brewer(palette = "Reds") +
        theme(
              panel.background = element_blank(), 
              axis.text.x      = element_text(face="bold", color="#000000", size=11),
              axis.text.y      = element_text(face="bold", color="#000000", size=11),
              axis.title.x     = element_text(face="bold", color="#000000", size=11),
              axis.title.y     = element_text(face="bold", color="#000000", size=11)
#              legend.position = "none"
              )
 l1<-ggplot(df_all, aes(x=x, y=y, colour=observation)) +
        geom_point(size=3) + scale_color_brewer(palette = "Reds") + 
        scale_color_brewer(palette = "Reds") + 
        #geom_smooth(method="loess", se = FALSE, size=1.5) +
        geom_smooth(aes(group=observation),method="glm", family=quasibinomial(), formula="y~x",
                       se = FALSE, size=1.5) +
        xlab("x") + 
        ylab("y") +
        #scale_y_continuous(breaks=seq(0.0,1,0.1)) +
        #scale_x_continuous(breaks=seq(0.0,230,20)) +
        #ggtitle("")  + 
        theme(
              panel.background = element_blank(), 
              axis.text.x      = element_text(face="bold", color="#000000", size=11),
              axis.text.y      = element_text(face="bold", color="#000000", size=11),
              axis.title.x     = element_text(face="bold", color="#000000", size=11),
              axis.title.y     = element_text(face="bold", color="#000000", size=11)
              )
 library(gridExtra)
 gridExtra:::grid.arrange(p1,l1)

References
[1] https://en.wikipedia.org/wiki/Logistic_distribution#Applications
[2] http://www.ggplot.org
[3] The Grammar of Graphics, L. Wilkinson, http://www.amzn.com/038724544
[4] http://en.wikipedia.org/wiki/Logistic_distribution#Cumulative_distribution_function.
[5] https://stat.ethz.ch/R-manual/R-devel/library/base/html/mapply.html

S-shaped data: Smoothing with quasibinomial distribution

Figure 1: Synthetic data and fitted curves. S-shaped distributed data can be found in many applications. Such data can be approximated with logistic distribution function [1]. Cumulative distribution function of logistic distribution function is a logistic function, i.e., logit.To demonstrate this, in this short example, after generating a synthetic data,…

January 16, 2016

In "R bloggers"

Predicting Dichotomous Outcomes I

We are trying to predict a dependent dichotomous variable (male/female, yes/no, like/dislike,etc) with independent “predictor” variables. Let’s say we want to determine whether or not an employee will quit based on the percentage of their tenure spent traveling. We assemble the data from HR and erroneously employ simple linear regression…

April 14, 2013

In "R bloggers"

How logistic regression work ?

Discussing with a non statistician colleague, it seems that the logistic regression is not intuitive; Some basics questions like : - Why don't use the linear model? - What's logistic function? - How can we compute by hand, step by step t...

May 31, 2013

In "R bloggers"

To leave a comment for the author, please follow the link and comment on their blog: Memo's Island.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

S-shaped data: Smoothing with quasibinomial distribution

Related

S-shaped data: Smoothing with quasibinomial distribution

Predicting Dichotomous Outcomes I

How logistic regression work ?

Related

S-shaped data: Smoothing with quasibinomial distribution

Predicting Dichotomous Outcomes I

How logistic regression work ?

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)