Chapter 1 Introduction
Process mining (sometimes called business process analytics) is a quickly growing field that is at least adjacent to data science.
In this section I aim to cover the following questions:
- So what is process mining?
- What can we do with it?
- Why do we care about it?
1.1 What does the current space look like?
Unsurprisingly, there are both open-source and commercial solutions available to assist in the investigation and analysis of processes.
Examples of process mining in use:
- analyzing treatment processes in hospitals
- improving the customer service process in a multinational corporation
- understanding the browsing behavior of customers on an online booking site
- analyzing failures of a baggage handling system
1.1.1 Process Sciences // Process Mining // Process Models
Process mining research started at TU/e (Eindhoven University of Technology) in 1999.
In general, the field is tied to the growth in big data, which has led to the automated creation of event logs.
It helps answer questions like:
- What are the most frequent paths in my process? Do they change over time?
- What do the cases that take longer than 3 months have in common? Where are the bottlenecks causing these delays?
- Which cases deviate from the reference process? Do these deviations also cause delays?
These are generally either performance- or conformance-related questions:
- performance: response times, service levels
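As a rough illustration of the two kinds of question, the sketch below uses base R only on a made-up event log. The column names, the reference path `Receive > Check > Approve`, and the 7-day threshold are all assumptions chosen for the example, not part of any real dataset or library.

```r
# Toy event log (made-up data): one row per event
events <- data.frame(
  case      = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity  = c("Receive", "Check", "Approve",
                "Receive", "Check", "Approve",
                "Receive", "Approve"),
  timestamp = as.POSIXct(c(
    "2024-01-01 09:00:00", "2024-01-02 09:00:00", "2024-01-03 09:00:00",
    "2024-01-01 09:00:00", "2024-01-05 09:00:00", "2024-01-11 09:00:00",
    "2024-01-02 09:00:00", "2024-01-03 09:00:00"
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Put the events of each case in time order
events <- events[order(events$case, events$timestamp), ]

# Performance: time between the first and last event of each case, in days
first_event     <- tapply(as.numeric(events$timestamp), events$case, min)
last_event      <- tapply(as.numeric(events$timestamp), events$case, max)
throughput_days <- (last_event - first_event) / 86400

# Conformance: compare the path of each case with an assumed reference path
reference   <- "Receive > Check > Approve"
case_traces <- tapply(events$activity, events$case, paste, collapse = " > ")
deviates    <- case_traces != reference

report <- data.frame(
  case            = names(case_traces),
  trace           = as.vector(case_traces),
  throughput_days = as.vector(throughput_days),
  deviates        = as.vector(deviates),
  stringsAsFactors = FALSE
)

# Cases slower than an arbitrary 7-day threshold, and cases off the reference path
slow_cases      <- report$case[report$throughput_days > 7]
deviating_cases <- report$case[report$deviates]
print(report)
```

Putting both answers in one table makes it easy to check whether the deviating cases are also the slow ones.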
1.2 What is process mining?
1.3 What is an event log?
Event data contains a couple of key elements:
- activity: a well-defined step in a process (often an event in an event log)
- case: a process instance
- trace: the sequence of activities that makes up a case (multiple traces make up the process flow)
- resource: the person or device executing the activity
- timestamp: a crucial element of event logs
There are also a couple of useful, but not strictly necessary, elements.
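To make the key elements concrete, here is a minimal sketch in base R. The data is made up, and the column names (`case`, `activity`, `resource`, `timestamp`) are simply labels chosen for the example. Each row is one event, and a trace is derived by ordering the activities of a case in time.

```r
# Toy event log (made-up data): one row per event
events <- data.frame(
  case      = c("p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"),
  activity  = c("Registration", "Triage", "Check-out",
                "Registration", "Check-out",
                "Registration", "Triage", "Check-out"),
  resource  = c("r1", "r2", "r7", "r1", "r7", "r1", "r2", "r7"),
  timestamp = as.POSIXct(c(
    "2017-01-02 09:00:00", "2017-01-02 09:30:00", "2017-01-02 11:00:00",
    "2017-01-03 10:00:00", "2017-01-03 10:20:00",
    "2017-01-04 08:00:00", "2017-01-04 08:45:00", "2017-01-04 12:00:00"
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order the events of each case in time; the timestamp is what makes this possible
events <- events[order(events$case, events$timestamp), ]

# A trace is the ordered sequence of activities within one case
case_traces <- tapply(events$activity, events$case, paste, collapse = " > ")

# Several cases can share the same trace, so count cases per distinct trace
trace_counts <- sort(table(case_traces), decreasing = TRUE)
print(trace_counts)
```

In this toy log three cases collapse into two distinct traces, which is the same relationship the `summary()` of a real event log reports as "Number of cases" versus "Number of traces".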
1.4 What can we do with it?
We can use it to make sense of complicated views and to visualize flow, rather than focusing only on the static intersections.
library(processanimateR)
library(eventdataR)
library(bupaR)
summary(patients)
## Number of events: 5442
## Number of cases: 500
## Number of traces: 7
## Number of distinct activities: 7
## Average trace length: 10.884
##
## Start eventlog: 2017-01-02 11:41:53
## End eventlog: 2018-05-05 07:16:02
##                   handling      patient           employee   handling_id
##  Blood test           : 474   Length:5442        r1:1000    Length:5442
##  Check-out            : 984   Class :character   r2:1000    Class :character
##  Discuss Results      : 990   Mode  :character   r3: 474    Mode  :character
##  MRI SCAN             : 472                      r4: 472
##  Registration         :1000                      r5: 522
##  Triage and Assessment:1000                      r6: 990
##  X-Ray                : 522                      r7: 984
##  registration_type      time                          .order
##  complete:2721      Min.   :2017-01-02 11:41:53   Min.   :   1
##  start   :2721      1st Qu.:2017-05-06 17:15:18   1st Qu.:1361
##                     Median :2017-09-08 04:16:50   Median :2722
##                     Mean   :2017-09-02 20:52:34   Mean   :2722
##                     3rd Qu.:2017-12-22 15:44:11   3rd Qu.:4082
##                     Max.   :2018-05-05 07:16:02   Max.   :5442
##
# Animate the patients log, coloring each token by employee
# animate_process(patients, mode = "relative", jitter = 10, legend = "color",
#   mapping = token_aes(color = token_scale("employee",
#     scale = "ordinal",
#     range = RColorBrewer::brewer.pal(7, "Paired"))))
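The flow that such an animation draws can also be looked at as plain counts. The sketch below, in base R on made-up data, counts how often one activity is directly followed by another within the same case. This is an illustrative calculation under my own assumptions, not the way processanimateR builds its maps internally.

```r
# Toy event log (made-up data): one row per event
events <- data.frame(
  case      = c("p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"),
  activity  = c("Registration", "Triage", "Check-out",
                "Registration", "Check-out",
                "Registration", "Triage", "Check-out"),
  timestamp = as.POSIXct(c(
    "2017-01-02 09:00:00", "2017-01-02 09:30:00", "2017-01-02 11:00:00",
    "2017-01-03 10:00:00", "2017-01-03 10:20:00",
    "2017-01-04 08:00:00", "2017-01-04 08:45:00", "2017-01-04 12:00:00"
  ), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order the events of each case in time
events <- events[order(events$case, events$timestamp), ]

# Pair every event with the next row, keeping only pairs inside the same case
n         <- nrow(events)
same_case <- events$case[-1] == events$case[-n]
from      <- events$activity[-n][same_case]
to        <- events$activity[-1][same_case]

# Rows are the preceding activity, columns the activity that follows it
follows <- table(from = from, to = to)
print(follows)
```

Each non-zero cell corresponds to an arrow in a process map, and the count is how many times a case travelled along it.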
library(processanimateR)
library(tidyverse)
library(bupaR)
library(edeaR)      # provides filter_trace_frequency()
library(eventdataR) # provides the sepsis event log
# Extract only the lacticacid measurements
lactic <- sepsis %>%
  mutate(lacticacid = as.numeric(lacticacid)) %>%
  filter_activity(c("LacticAcid")) %>%
  as.data.frame() %>%
  select("case" = case_id,
         "time" = timestamp,
         value = lacticacid) # format needs to be 'case,time,value'
# Remove the measurement events from the sepsis log
sepsisBase <- sepsis %>%
  filter_activity(c("LacticAcid", "CRP", "Leucocytes", "Return ER",
                    "IV Liquid", "IV Antibiotics"), reverse = TRUE) %>%
  filter_trace_frequency(percentage = 0.95)
# Animate with the secondary data frame `lactic`
animate_process(sepsisBase,
                mode = "relative",
                duration = 300,
                legend = "color",
                mapping = token_aes(color = token_scale(lactic,
                                                        scale = "linear",
                                                        range = c("#fff5eb", "#7f2704"))))