Interactive association rules exploration app

[This article was first published on Andrew Brooks - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a previous post, I wrote about what I use association rules for and mentioned a Shiny application I developed to explore and visualize rules. This post is about that app. The app is mainly a wrapper around the arules and arulesViz packages developed by Michael Hahsler.

Features

  • train association rules
    • interactively adjust confidence and support parameters
    • sort rules
    • sample just top rules to prevent crashes
    • post process rules by subsetting LHS or RHS to just variables/items of interest
    • suite of interest measures
  • visualize association rules
    • grouped plot, matrix plot, graph, scatterplot, parallel coordinates, item frequency
  • export association rules to CSV

How to get

Option 1: Copy the code below from the arules_app.R gist

Option2: Source gist directly.

library('devtools')
library('shiny')
library('arules')
library('arulesViz')
source_gist(id='706a28f832a33e90283b')

Option 3: Download the Rsenal package (my personal R package with a hodgepodge of data science tools) and use the arulesApp function:

library('devtools')
install_github('brooksandrew/Rsenal')
library('Rsenal')
?Rsenal::arulesApp

How to use

arulesApp is intended to be called from the R console for interactive and exploratory use. It calls shinyApp which spins up a Shiny app without the overhead of having to worry about placing server.R and ui.R. Calling a Shiny app with a function also has the benefit of smooth passing of parameters and data objects as arguments. More on shinyApp here.

arulesApp is currently highly exploratory (and highly unoptimized). Therefore it works best for quickly iterating on rule training and visualization with low-medium sized datasets. Check out Michael Hahsler’s arulesViz paper for a thorough description of how to interpret the visualizations. There is a particularly useful table on page 24 which compares and summarizes the visualization techniques.

Simply call arulesApp from the console with a data.frame or transaction set for which rules will be mined from:

library('arules') contains Adult and AdultUCI datasets

data('Adult') # transaction set
arulesApp(Adult, vars=40)

data('AdultUCI') # data.frame
arulesApp(AdultUCI)

Here are the arguments:

  • dataset data.frame, this is the dataset that association rules will be mined from. Each row is treated as a transaction. Seems to work OK when a the S4 transactions class from arules is used, however this is not thoroughly tested.
  • bin logical, TRUE will automatically discretize/bin numerical data into categorical features that can be used for association analysis.
  • vars integer, how many variables to include in initial rule mining
  • supp numeric, the support parameter for initializing visualization. Useful when it is known that a high support is needed to not crash computationally.
  • conf numeric, the confidence parameter for initializing visualization. Similarly useful when it is known that a high confidence is needed to not crash computationally.

Screenshots

Association rules list view

Screen Shot 2015-11-29 at 11.36.54 AM

Screen Shot 2015-11-29 at 11.34.51 AM

Scatterplot

Screen Shot 2015-11-29 at 11.40.47 AM

Graph

Screen Shot 2015-11-29 at 11.39.53 AM

Grouped Plot

Screen Shot 2015-11-29 at 11.40.58 AM

Parallel Coordinates

Screen Shot 2015-11-29 at 11.37.30 AM

Matrix

Screen Shot 2015-11-29 at 11.40.39 AM

Item frequency

Screen Shot 2015-11-29 at 11.36.19 AM

Code

#' @title Assocation Rules Visualization Shiny App
#' @description Launches a Shiny App that provides an interactive interface to the visualizations of the \code{arulesViz} package.
#' The app allows users to mine rules based on all or just subsets of features, sort by criteria (lift, support, confidence) and visualize
#' using network graph, grouped bubble and scatter plots. \cr
#' Users filter rules to target only those with a certain variable on the RHS or LHS of the rule.
#' Rule mining is computed using the \link{apriori} algorithm from \code{arules}.
#'
#' @param dataset data.frame, this is the dataset that association rules will be mined from. Each row is treated as a transaction. Seems to work
#' OK when a the S4 transactions class from \code{arules} is used, however this is not thoroughly tested.
#' @param bin logical, \code{TRUE} will automatically discretize/bin numerical data into categorical features that can be used for association analysis.
#' @param vars integer, how many variables to include in initial rule mining
#' @param supp numeric, the support parameter for initializing visualization. Useful when it is known that a high support is needed to not crash computationally.
#' @param conf numeric, the confidence parameter for initializing visualization. Similarly useful when it is known that a high confidence is needed to not crash computationally.
#' @seealso \code{arulesViz}, \code{arules}
#' @return Shiny App
#' @import shiny arulesViz arules
#' @export
#'
#' @examples
#' ## creating some data
#' n <- 10000 # of obs
#' d <- data.frame(
#' eye = sample(c('brown', 'green', 'blue', 'hazel'), n, replace=T),
#' gender = sample(c('male', 'female'), n, replace=T),
#' height = sort(sample(c('dwarf', 'short', 'average', 'above average', 'giant'), n, replace=T)),
#' wealth = sort(sample(c('poor', 'struggling', 'middle', 'uppermiddle', 'comfortable', 'rich', '1%', 'millionaire', 'billionaire'), n, replace=T)),
#' favoriteAnimal = sample(c('dog', 'cat', 'bat', 'frog', 'lion', 'cheetah', 'lion', 'walrus', 'squirrel'), n, replace=T),
#' numkids = abs(round(rnorm(n, 2, 1)))
#' )
#'
#' ## adding some pattern
#' d$numkids[d$gender=='male'] <- d$numkids[d$gender=='male'] + sample(0:3, sum(d$gender=='male'), replace=T)
#' d$numkids <- factor(d$numkids)
#'
#' ## calling Shiny App to visualize association rules
#' arulesApp(d)
# dependencies:
devtools::source_url('https://raw.githubusercontent.com/brooksandrew/Rsenal/master/R/rules2df.R')
devtools::source_url('https://raw.githubusercontent.com/brooksandrew/Rsenal/master/R/bin.R')
arulesApp <- function (dataset, bin=T, vars=5, supp=0.1, conf=0.5) {
## binning numeric data
for(i in 1:ncol(dataset)) {
if(class(dataset[,i]) %in% c('numeric', 'integer')) dataset[,i] <- Rsenal::depthbin(dataset[,i], nbins=10)
}
## calling Shiny App
shinyApp(ui = shinyUI(pageWithSidebar(
headerPanel("Association Rules"),
sidebarPanel(
conditionalPanel(
condition = "input.samp=='Sample'",
numericInput("nrule", 'Number of Rules', 5), br()
),
conditionalPanel(
condition = "input.mytab=='graph'",
radioButtons('graphType', label='Graph Type', choices=c('itemsets','items'), inline=T), br()
),
conditionalPanel(
condition = "input.lhsv=='Subset'",
uiOutput("choose_lhs"), br()
),
conditionalPanel(
condition = "input.rhsv=='Subset'",
uiOutput("choose_rhs"), br()
),
conditionalPanel(
condition = "input.mytab=='grouped'",
sliderInput('k', label='Choose # of rule clusters', min=1, max=150, step=1, value=15), br()
),
conditionalPanel(
condition = "input.mytab %in%' c('grouped', 'graph', 'table', 'datatable', 'scatter', 'paracoord', 'matrix', 'itemFreq')",
radioButtons('samp', label='Sample', choices=c('All Rules', 'Sample'), inline=T), br(),
uiOutput("choose_columns"), br(),
sliderInput("supp", "Support:", min = 0, max = 1, value = supp , step = 1/10000), br(),
sliderInput("conf", "Confidence:", min = 0, max = 1, value = conf , step = 1/10000), br(),
selectInput('sort', label='Sorting Criteria:', choices = c('lift', 'confidence', 'support')), br(), br(),
numericInput("minL", "Min. items per set:", 2), br(),
numericInput("maxL", "Max. items per set::", 3), br(),
radioButtons('lhsv', label='LHS variables', choices=c('All', 'Subset')), br(),
radioButtons('rhsv', label='RHS variables', choices=c('All', 'Subset')), br(),
downloadButton('downloadData', 'Download Rules as CSV')
)
),
mainPanel(
tabsetPanel(id='mytab',
tabPanel('Grouped', value='grouped', plotOutput("groupedPlot", width='100%', height='100%')),
tabPanel('Graph', value='graph', plotOutput("graphPlot", width='100%', height='100%')),
tabPanel('Scatter', value='scatter', plotOutput("scatterPlot", width='100%', height='100%')),
tabPanel('Parallel Coordinates', value='paracoord', plotOutput("paracoordPlot", width='100%', height='100%')),
tabPanel('Matrix', value='matrix', plotOutput("matrixPlot", width='100%', height='100%')),
tabPanel('ItemFreq', value='itemFreq', plotOutput("itemFreqPlot", width='100%', height='100%')),
tabPanel('Table', value='table', verbatimTextOutput("rulesTable")),
tabPanel('Data Table', value='datatable', dataTableOutput("rulesDataTable"))
)
)
)),
server = function(input, output) {
output$choose_columns <- renderUI({
checkboxGroupInput("cols", "Choose variables:",
choices = colnames(dataset),
selected = colnames(dataset)[1:vars])
})
output$choose_lhs <- renderUI({
checkboxGroupInput("colsLHS", "Choose LHS variables:",
choices = input$cols,
selected = input$cols[1])
})
output$choose_rhs <- renderUI({
checkboxGroupInput("colsRHS", "Choose RHS variables:",
choices = input$cols,
selected = input$cols[1])
})
## Extracting and Defining arules
rules <- reactive({
tr <- as(dataset[,input$cols], 'transactions')
arAll <- apriori(tr, parameter=list(support=input$supp, confidence=input$conf, minlen=input$minL, maxlen=input$maxL))
if(input$rhsv=='Subset' & input$lhsv!='Subset'){
varsR <- character()
for(i in 1:length(input$colsRHS)){
tmp <- with(dataset, paste(input$colsRHS[i], '=', levels(as.factor(get(input$colsRHS[i]))), sep=''))
varsR <- c(varsR, tmp)
}
ar <- subset(arAll, subset=rhs %in% varsR)
} else if(input$lhsv=='Subset' & input$rhsv!='Subset') {
varsL <- character()
for(i in 1:length(input$colsLHS)){
tmp <- with(dataset, paste(input$colsLHS[i], '=', levels(as.factor(get(input$colsLHS[i]))), sep=''))
varsL <- c(varsL, tmp)
}
ar <- subset(arAll, subset=lhs %in% varsL)
} else if(input$lhsv=='Subset' & input$rhsv=='Subset') {
varsL <- character()
for(i in 1:length(input$colsLHS)){
tmp <- with(dataset, paste(input$colsLHS[i], '=', levels(as.factor(get(input$colsLHS[i]))), sep=''))
varsL <- c(varsL, tmp)
}
varsR <- character()
for(i in 1:length(input$colsRHS)){
tmp <- with(dataset, paste(input$colsRHS[i], '=', levels(as.factor(get(input$colsRHS[i]))), sep=''))
varsR <- c(varsR, tmp)
}
ar <- subset(arAll, subset=lhs %in% varsL & rhs %in% varsR)
} else {
ar <- arAll
}
quality(ar)$conviction <- interestMeasure(ar, method='conviction', transactions=tr)
quality(ar)$hyperConfidence <- interestMeasure(ar, method='hyperConfidence', transactions=tr)
quality(ar)$cosine <- interestMeasure(ar, method='cosine', transactions=tr)
quality(ar)$chiSquare <- interestMeasure(ar, method='chiSquare', transactions=tr)
quality(ar)$coverage <- interestMeasure(ar, method='coverage', transactions=tr)
quality(ar)$doc <- interestMeasure(ar, method='doc', transactions=tr)
quality(ar)$gini <- interestMeasure(ar, method='gini', transactions=tr)
quality(ar)$hyperLift <- interestMeasure(ar, method='hyperLift', transactions=tr)
ar
})
# Rule length
nR <- reactive({
nRule <- ifelse(input$samp == 'All Rules', length(rules()), input$nrule)
})
## Grouped Plot #########################
output$groupedPlot <- renderPlot({
ar <- rules()
plot(sort(ar, by=input$sort)[1:nR()], method='grouped', control=list(k=input$k))
}, height=800, width=800)
## Graph Plot ##########################
output$graphPlot <- renderPlot({
ar <- rules()
plot(sort(ar, by=input$sort)[1:nR()], method='graph', control=list(type=input$graphType))
}, height=800, width=800)
## Scatter Plot ##########################
output$scatterPlot <- renderPlot({
ar <- rules()
plot(sort(ar, by=input$sort)[1:nR()], method='scatterplot')
}, height=800, width=800)
## Parallel Coordinates Plot ###################
output$paracoordPlot <- renderPlot({
ar <- rules()
plot(sort(ar, by=input$sort)[1:nR()], method='paracoord')
}, height=800, width=800)
## Matrix Plot ###################
output$matrixPlot <- renderPlot({
ar <- rules()
plot(sort(ar, by=input$sort)[1:nR()], method='matrix', control=list(reorder=T))
}, height=800, width=800)
## Item Frequency Plot ##########################
output$itemFreqPlot <- renderPlot({
trans <- as(dataset[,input$cols], 'transactions')
itemFrequencyPlot(trans)
}, height=800, width=800)
## Rules Data Table ##########################
output$rulesDataTable <- renderDataTable({
ar <- rules()
rulesdt <- rules2df(ar)
rulesdt
})
## Rules Printed ########################
output$rulesTable <- renderPrint({
#hack to disply results... make sure this match line above!!
#ar <- apriori(dataset[,input$cols], parameter=list(support=input$supp, confidence=input$conf, minlen=input$minL, maxlen=input$maxL))
ar <- rules()
inspect(sort(ar, by=input$sort))
})
## Download data to csv ########################
output$downloadData <- downloadHandler(
filename = 'arules_data.csv',
content = function(file) {
write.csv(rules2df(rules()), file)
}
)
}
)
}
view raw shiny_arules.R hosted with ❤ by GitHub

To leave a comment for the author, please follow the link and comment on their blog: Andrew Brooks - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)