Interview mit Prof. Dr. Wil van der Aalst, Eindhoven University of Technology
Prof. Dr. Wil van der Aalst, it is a great pleasure and honour that you will speak at the Predictive Analytics World Manufacturing in February in Düsseldorf. The title of your keynote will be “Process Mining based on the Internet of Events”. All data scientists know data mining and most know text mining, but what is process mining?
Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. Process mining seeks the confrontation between event data (i.e., observed behavior) and process models (hand-made or discovered automatically). This technology has become available only recently, but it can be applied to any type of operational process in organizations and systems. Like process mining, data mining is data-driven. However, unlike process mining, mainstream data mining techniques are typically not process-centric. Process models expressed in terms of Petri nets or BPMN diagrams cannot be discovered or analyzed in any way by the mainstream data mining tools.
There are three main types of process mining:
The first type of process mining is discovery. A discovery technique takes an event log and produces a process model without using any a-priori information. An example is the Alpha-algorithm that takes an event log and produces a process model (a Petri net) explaining the behavior recorded in the log.
The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa.
The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model. An example is the extension of a process model with performance information, e.g., showing bottlenecks.
None of these types of analysis can be found in conventional data mining or BI tools.
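The discovery idea can be made concrete with a small sketch. The following is a toy illustration only, using an invented event log; it computes a directly-follows graph (how often one activity directly follows another across all cases), which is the kind of relation discovery algorithms such as the Alpha-algorithm start from, not the Alpha-algorithm itself:

```python
from collections import Counter

# Invented event log: each trace is the sequence of activities
# recorded for one case (e.g., one order or one insurance claim).
event_log = [
    ["register", "check", "decide", "pay"],
    ["register", "check", "decide", "reject"],
    ["register", "decide", "pay"],
]

def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

dfg = directly_follows(event_log)
print(dfg[("register", "check")])  # 2: observed in the first two traces
```

Real process mining tools go much further (deriving Petri nets or BPMN models, handling noise and infrequent behavior), but the principle is the same: the model is distilled from recorded behavior, not drawn by hand.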
Can you give examples of how process mining is already used in business and industry? And what are the benefits for companies?
Since the first book on process mining appeared in 2011, we have witnessed a rapidly growing interest in the field. Today, process mining is the primary approach to making BPM truly data-driven. The attention paid to Big Data and the uptake of data science strengthen this development. Process mining is where data science and process science meet! The growth of process mining accelerated during 2015 and 2016. Currently, there are about 25 software vendors offering process mining tools. Tools like Celonis Process Mining, Disco (Fluxicon), ProcessGold Enterprise Platform, Minit, myInvenio, Signavio Process Intelligence, QPR ProcessAnalyzer, LANA Process Mining, Rialto Process, Icris Process Mining Factory, Worksoft Analyze & Process Mining for SAP, SNP Business Process Analysis, webMethods Process Performance Manager, and Perceptive Process Mining are now available. The availability and application of these tools illustrate the uptake of process mining.
The two primary reasons for applying process mining are (1) performance and (2) compliance. Process mining is used to discover the real process, thereby uncovering bottlenecks, delays, waste, errors, and inefficiencies. These performance problems are uncovered in an evidence-based manner, taking only a fraction of the time this would normally take. Whenever there is a normative process model (guidelines, regulations, best practices, etc.), process mining can be used to diagnose deviations. What is causing these deviations and how harmful are they? For example, within Siemens over 2,000 people are regularly using Celonis process mining.
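The compliance use case described above can be sketched in miniature. This is an invented example, not a real conformance-checking algorithm (industrial tools use techniques such as alignment-based replay): here the normative model is simply the set of allowed directly-follows steps, and any step a trace takes outside that set is flagged as a deviation:

```python
# Invented normative model: the only allowed directly-follows steps.
allowed = {
    ("receive", "approve"),
    ("approve", "pay"),
}

# Invented event log: the second trace skips the approval step.
event_log = [
    ["receive", "approve", "pay"],
    ["receive", "pay"],
]

def deviations(trace, allowed):
    """Return the directly-follows steps in a trace that the model forbids."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]

for trace in event_log:
    print(trace, "deviations:", deviations(trace, allowed))
```

On the second trace this flags the step ("receive", "pay"), i.e., a payment without approval: exactly the kind of evidence-based deviation diagnosis the interview refers to, albeit vastly simplified.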
Your keynote will focus on the application of process mining for the Internet of Events. We all know the Internet of Pages, the Web 1.0, then there was the Internet of People, such as Facebook, and at the PAW Manufacturing conference in February we will focus on the Internet of Things, e.g. sensors, wearables, smart devices etc. What is the Internet of Events and why does process mining play such an important role in the IoE?
I coined the term the “Internet of Events (IoE)” a few years ago. It refers to all event data available and is composed of:
- The Internet of Content (IoC): all information created by humans to increase knowledge on particular subjects. The IoC includes traditional web pages, articles, encyclopedias like Wikipedia, YouTube, e-books, newsfeeds, etc.
- The Internet of People (IoP): all data related to social interaction. The IoP includes e-mail, Facebook, Twitter, forums, LinkedIn, etc.
- The Internet of Things (IoT): all physical objects connected to the network. The IoT includes all things that have a unique id and a presence in an Internet-like structure.
- The Internet of Locations (IoL): refers to all data that have a geographical or geospatial dimension. With the uptake of mobile devices (e.g., smartphones) more and more events have location or movement attributes.
I would say that the Internet of Things (IoT) is indeed the most relevant development for the Predictive Analytics World Manufacturing audience. Process mining is about analyzing behavior, and physical objects connected to the network provide the event data needed to do this.
So it seems that process mining is becoming as popular and common as data mining: corporations are discovering new use cases for process mining and exploring new fields of application every day. What was the initial problem you tried to solve with process mining, i.e., how, why and when did you invent process mining?
I started to work on process mining in the late nineties. It started with the research project named “Process Design by Discovery: Harvesting Workflow Knowledge from Ad-hoc Executions”. The motivation for the project was that over time I became more and more sceptical about process modeling and workflow automation. Process models often had very little to do with reality. I conducted dozens of simulation projects, and each time I noted how difficult it was to get a model that behaves like the real process. At the same time I anticipated the growth of event data long before we had access to it. This is why I tried to leverage my experience in process modeling and workflow technology and combine it with event data. Soon I realized that this would change the whole way we think about process models.
As far as I understand, process mining isn’t a brand-new technique but a proven method that delivers valuable results. What do you think will be necessary so that more corporations and data scientists apply process mining in the future?
As Bill Gates once stated: “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.” For me, process mining is already a remarkable success. I have never seen such a direct link between academic research and the creation of a new market for analysis tools. Currently, there are over 25 process mining tools. I expect that in a few years all the major BPM and BI vendors will support process mining. The combination of event data and process models is so natural that it is surprising it took such a long time for people to see the added value.
And what further innovations do you expect in the field of process mining during the upcoming years – and especially can you reveal what are you working on at the moment?
We are working in all areas of the process mining spectrum: discovery, conformance checking, performance analysis, prediction, etc. For example, we are looking at the scalability of these techniques while improving the quality of the results. A trend in the current market is that commercial vendors are embracing conformance checking and more formal models. This is a sign of maturity. Personally, I’m very interested in ways to automatically improve process models based on event data, and in a topic called “responsible data science” (http://www.responsibledatascience.org/). We need to improve processes, but at the same time we need to protect privacy, ensure fairness, and provide results in a transparent manner. For example, can we construct process models and warn about bottlenecks without actually storing the events themselves?