Chapter 1 Introduction

Process mining (sometimes called business process analytics) is a quickly growing field that is at least adjacent to data science.

In this section I aim to cover the following questions:

  • So what is process mining?
  • What can we do with it?
  • Why do we care about it?

1.1 What does the current space look like?

Unsurprisingly, there are both open source and commercial solutions available to assist in the investigation and analysis of processes.

Examples of process mining in use:

  • analyzing treatment processes in hospitals
  • improving customer service processes in a multinational corporation
  • understanding the browsing behavior of customers on an online booking site
  • analyzing failures of a baggage handling system

1.1.1 Process Sciences // Process Mining // Process Models

Process mining research started at TU/e (Eindhoven University of Technology) in 1999.

In general, the field's growth is tied to the growth of big data, and in particular to the automated creation of event logs.

Process mining helps answer questions like:

  • What are the most frequent paths in my process? Do they change over time?
  • What do the cases that take longer than 3 months have in common? Where are the bottlenecks causing these delays?
  • Which cases deviate from the reference process? Do these deviations also cause delays?

  • Generally, these are either performance- or conformance-related questions
    • performance: response times, service levels
    • conformance: deviations from the reference process
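A performance question such as "which cases take too long?" can be sketched on a toy log in base R. Everything in this example is made up for illustration: the data frame `log`, its column names (`case`, `activity`, `time`), and the 48-hour threshold are assumptions, not taken from any real data set.

```r
# Toy event log: one row per event (hypothetical data, base R only)
log <- data.frame(
  case     = c("A", "A", "A", "B", "B", "C", "C", "C"),
  activity = c("Register", "Triage", "Check-out",
               "Register", "Check-out",
               "Register", "Triage", "Check-out"),
  time     = as.POSIXct(c(
    "2017-01-02 09:00:00", "2017-01-02 10:00:00", "2017-01-02 12:00:00",
    "2017-01-03 09:00:00", "2017-01-03 09:30:00",
    "2017-01-04 09:00:00", "2017-01-05 15:00:00", "2017-01-07 09:00:00"
  ), tz = "UTC")
)

# Throughput time per case: last event minus first event, in hours
first_event <- tapply(as.numeric(log$time), log$case, min)
last_event  <- tapply(as.numeric(log$time), log$case, max)
throughput_hours <- (last_event - first_event) / 3600

# Flag cases that exceed an (arbitrary) threshold of 48 hours
slow_cases <- names(throughput_hours)[throughput_hours > 48]

print(throughput_hours)
print(slow_cases)
```

Once the slow cases are identified, the next step would be to compare their activities with those of the faster cases to look for bottlenecks.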

1.2 What is process mining?

1.3 What is an event log?

Event data contains a few key elements:

  • activity: a well-defined step in a process (often an event in an event log)
  • case: a process instance
  • trace: the sequence of activities that make up a case (multiple traces make up the process flow)
  • resource: the person or device executing the activity
  • timestamp: a crucial element of event logs
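To make these elements concrete, here is a minimal sketch of an event log as a plain data frame in base R, with traces derived from it. The data frame `events`, its column names, and all of its values are invented for illustration.

```r
# Toy event log with the key elements: case, activity, resource, timestamp
# (all values are invented for illustration)
events <- data.frame(
  case      = c("p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3"),
  activity  = c("Registration", "Triage", "Check-out",
                "Registration", "Triage", "Check-out",
                "Registration", "Check-out"),
  resource  = c("r1", "r2", "r7", "r1", "r2", "r7", "r1", "r7"),
  timestamp = as.POSIXct(c(
    "2017-01-02 11:00:00", "2017-01-02 11:30:00", "2017-01-02 13:00:00",
    "2017-01-03 08:00:00", "2017-01-03 08:20:00", "2017-01-03 10:00:00",
    "2017-01-04 09:00:00", "2017-01-04 09:45:00"
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order events by case and time so activities appear in execution order
events <- events[order(events$case, events$timestamp), ]

# A trace is the ordered sequence of activities within one case
traces <- tapply(events$activity, events$case, paste, collapse = " -> ")

# Count how many cases follow each distinct trace
trace_counts <- sort(table(traces), decreasing = TRUE)
print(trace_counts)
```

Here three cases collapse into two distinct traces, which is the same idea behind the "Number of traces" figure reported when summarising an event log.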

There are also a couple of useful, but not required, elements.

1.4 What can we do with it?

Process mining helps us understand complicated processes by visualizing flow, rather than just focusing on the static intersections.

library(processanimateR)
library(eventdataR)
library(bupaR)

# Summarise the patients event log that ships with eventdataR
summary(patients)
## Number of events:  5442
## Number of cases:  500
## Number of traces:  7
## Number of distinct activities:  7
## Average trace length:  10.884
## 
## Start eventlog:  2017-01-02 11:41:53
## End eventlog:  2018-05-05 07:16:02
##                   handling      patient          employee  handling_id       
##  Blood test           : 474   Length:5442        r1:1000   Length:5442       
##  Check-out            : 984   Class :character   r2:1000   Class :character  
##  Discuss Results      : 990   Mode  :character   r3: 474   Mode  :character  
##  MRI SCAN             : 472                      r4: 472                     
##  Registration         :1000                      r5: 522                     
##  Triage and Assessment:1000                      r6: 990                     
##  X-Ray                : 522                      r7: 984                     
##  registration_type      time                         .order    
##  complete:2721     Min.   :2017-01-02 11:41:53   Min.   :   1  
##  start   :2721     1st Qu.:2017-05-06 17:15:18   1st Qu.:1361  
##                    Median :2017-09-08 04:16:50   Median :2722  
##                    Mean   :2017-09-02 20:52:34   Mean   :2722  
##                    3rd Qu.:2017-12-22 15:44:11   3rd Qu.:4082  
##                    Max.   :2018-05-05 07:16:02   Max.   :5442  
## 
# Animate the patients log, colouring each token by employee (commented out here)
# animate_process(patients, mode = "relative", jitter = 10, legend = "color",
#   mapping = token_aes(color = token_scale("employee", 
#     scale = "ordinal", 
#     range = RColorBrewer::brewer.pal(7, "Paired"))))

library(processanimateR)
library(eventdataR)  # provides the sepsis event log
library(tidyverse)
library(bupaR)

# Extract only the lacticacid measurements
lactic <- sepsis %>%
    mutate(lacticacid = as.numeric(lacticacid)) %>%
    filter_activity(c("LacticAcid")) %>%
    as.data.frame() %>%
    select("case" = case_id, 
            "time" =  timestamp, 
            value = lacticacid) # format needs to be 'case,time,value'

# Remove the measurement events from the sepsis log
sepsisBase <- sepsis %>%
    filter_activity(c("LacticAcid", "CRP", "Leucocytes", "Return ER",
                      "IV Liquid", "IV Antibiotics"), reverse = TRUE) %>%
    filter_trace_frequency(percentage = 0.95)

# Animate with the secondary data frame `lactic`
animate_process(sepsisBase, 
                mode = "relative", 
                duration = 300,
                legend = "color", 
                mapping = token_aes(color = token_scale(lactic, 
                                                        scale = "linear", 
                                                        range = c("#fff5eb","#7f2704"))))

1.5 Why should we care?