bupaR: Business Process Analysis with R

This article was first published on R – Research group Business Informatics, and kindly contributed to R-bloggers.

Organizations nowadays store huge amounts of data about their business processes. Process mining provides methods and techniques to analyze and improve these processes, which can give companies a competitive advantage. The field began with the discovery of workflow models from event data, but over the past 20 years it has evolved into a broad and diverse research discipline.

bupaR is an open-source suite for handling and analyzing business process data in R, developed by the Business Informatics research group at Hasselt University, Belgium. The central package, also called bupaR, provides the basic functionality for creating event log objects in R. It contains several functions to retrieve information about an event log, as well as event-log-specific versions of generic R functions. Together with the related packages, each of which has its own specific purpose, bupaR aims to support every step in the analysis of event data with R, from data import to online process monitoring.
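As a rough illustration of these information functions, the sketch below queries the `patients` example log that ships with the eventdataR package. It assumes a recent bupaR release in which eventdataR has to be attached explicitly; exact counts depend on the package version, so no outputs are shown.

```r
library(bupaR)
library(eventdataR)  # assumption: provides the example event log `patients`

# basic counts for the event log
n_cases(patients)
n_activities(patients)
n_events(patients)

# the distinct activity labels in the log
activity_labels(patients)

# event-log-specific method for the generic summary()
summary(patients)
```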

The table below shows an example event log. Each row is an event that belongs to a case (here, a patient). Several events together can form an activity instance, or execution (e.g. events 2-4 belong to surgery 2), and each event within such an execution has a different transactional lifecycle status. Note that a specific activity can have several instances (e.g. there are two surgeries in the example). Furthermore, each event has a timestamp, indicating when it happened, and a resource, indicating who performed it.

If the data shown above is stored in a data.frame, it can be turned into an event log object by mapping each relevant column to its role:

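library(bupaR)

# `data` is the data.frame holding the example event log, one row per event
log_patients <- data %>%
  eventlog(
    case_id = "patient",                         # the case (patient)
    activity_id = "activity",                    # activity label
    activity_instance_id = "activity_instance",  # groups the events of one execution
    lifecycle_id = "status",                     # transactional lifecycle status
    timestamp = "timestamp",                     # when the event happened
    resource_id = "resource"                     # who performed it
  )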
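Because the example table is not reproduced here, the following sketch builds a small hypothetical data.frame with the same column names, so that the `eventlog()` call can be run end to end. The patient, activities, times and resources are invented for illustration and are not the data from this post: one patient has one examination and two surgeries, each recorded as a start and a complete event. We assume timestamps should be stored as POSIXct.

```r
library(bupaR)

# Hypothetical toy data (NOT the table from the post): one patient,
# one examination and two surgeries, each with a start and a complete event
toy <- data.frame(
  patient = "p1",
  activity = c("examination", "examination",
               "surgery", "surgery", "surgery", "surgery"),
  activity_instance = c("1", "1", "2", "2", "3", "3"),
  status = c("start", "complete", "start", "complete", "start", "complete"),
  timestamp = as.POSIXct(c("2017-01-02 09:00:00", "2017-01-02 09:30:00",
                           "2017-01-03 08:00:00", "2017-01-03 10:00:00",
                           "2017-01-10 08:00:00", "2017-01-10 11:00:00"),
                         tz = "UTC"),
  resource = c("Dr. A", "Dr. A", "Dr. B", "Dr. B", "Dr. B", "Dr. B")
)

# same column mapping as in the example above
toy_log <- eventlog(
  toy,
  case_id = "patient",
  activity_id = "activity",
  activity_instance_id = "activity_instance",
  lifecycle_id = "status",
  timestamp = "timestamp",
  resource_id = "resource"
)

n_activity_instances(toy_log)  # two surgeries plus one examination
```

The two surgeries share the activity label but get different activity instance identifiers, which is how the event log keeps separate executions of the same activity apart.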

Alternatively, event data can be read from XES files. XES (eXtensible Event Stream) is the IEEE standard for storing and exchanging event data. The xesreadR package, which is part of bupaR, provides the functions read_xes and write_xes as an interface between R and XES files. The following statement reads an event log from an XES file, in this case containing data on an order-to-cash (OTC) process.

library(xesreadR)

# read the order-to-cash event log from an XES file
log_otc <- read_xes("otc.xes")

Event log objects can be visualized with processmapR. It can create a customizable dotted chart, which shows all events by time and case identifier in a single graph. Precedence relations between activities can be shown in a process map, and frequent traces, i.e. activity sequences, can be explored with trace_explorer.

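library(processmapR)
library(edeaR)  # provides filter_trace_frequency()

# dotted chart: every event plotted by time and case
log_otc %>%
  dotted_chart()

# process map of the most frequent traces, together covering 90% of the cases
log_otc %>%
  filter_trace_frequency(percentage = 0.9) %>%
  process_map()

# explore the most frequent traces, with a coverage of 90%
log_otc %>%
  trace_explorer(coverage = 0.9)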
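These visualizations can be customized further. The sketch below assumes the `patients` log from eventdataR as stand-in data and processmapR's `frequency()` and `performance()` map profiles and the `x` argument of `dotted_chart()` as found in recent releases; it annotates the process map with relative frequencies or median durations and draws the dotted chart on a time axis relative to each case's start.

```r
library(bupaR)
library(processmapR)
library(eventdataR)  # assumption: provides the example event log `patients`

# process map annotated with relative frequencies instead of absolute counts
map_freq <- process_map(patients, type = frequency("relative"))

# process map annotated with median durations, expressed in hours
map_perf <- process_map(patients, type = performance(median, "hours"))

# dotted chart with time measured relative to the start of each case
chart <- dotted_chart(patients, x = "relative")
```

Printing `map_freq`, `map_perf` or `chart` renders the corresponding graph.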

edeaR stands for Exploratory and Descriptive Event-Data Analysis. This package provides several metric functions for in-depth analysis of event logs, as well as a diverse set of subsetting methods. The metrics can be computed at different levels of granularity, which makes it possible to drill down into the data and focus on a specific part. Furthermore, all metrics are compatible with dplyr::group_by. Generic plot functions create predefined graphs, which can be further customized with ggplot2.
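To make the granularity levels concrete, here is a minimal sketch assuming recent edeaR versions, in which metrics take a `level` argument (and time-based metrics a `units` argument); the `patients` log from eventdataR again serves as stand-in data.

```r
library(bupaR)
library(edeaR)
library(eventdataR)  # assumption: provides the example event log `patients`

# activity level: how often each activity occurs, one row per activity
per_activity <- activity_frequency(patients, level = "activity")

# case level: throughput time of every individual case, in days
per_case <- throughput_time(patients, level = "case", units = "days")

print(per_activity)
print(per_case)
```

Switching `level` changes only the unit of analysis; the metric itself stays the same, which is what allows drilling down from the whole log to individual cases or activities.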

The example below computes, for each activity, the number of cases in which it is present. In the given event log, this reveals a set of very common activities alongside a set of very rare ones.

# number of cases in which each activity is present, plotted
log_otc %>% activity_presence() %>% plot()

In addition to the metrics, edeaR provides a varied set of subsetting methods specific to event data. All of these functions are designed to be chained with the pipe operator (%>%).
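A small sketch of such a pipeline follows, assuming edeaR's `percentage` argument of the frequency filters; the `patients` log and the 0.8 cut-offs are arbitrary illustrative choices, not recommendations.

```r
library(bupaR)
library(edeaR)
library(eventdataR)  # assumption: provides the example event log `patients`
library(dplyr)       # for the %>% pipe

# chain two filters: first keep only the most frequent activities,
# then keep only the most frequent traces of what remains
subset_log <- patients %>%
  filter_activity_frequency(percentage = 0.8) %>%
  filter_trace_frequency(percentage = 0.8)

# the result is again an event log, so any metric can be applied to it
n_cases(subset_log)
```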

Besides the packages discussed above, the suite also includes eventdataR, which contains example event datasets, and processmonitR, which provides predefined dashboards for online process monitoring. More information about bupaR, including a cheat sheet, is available on the bupaR website.
