Last week, Dirk Fahland posted an interesting article on his blog about the generalization quality dimension in process mining (or, more precisely, process discovery). Since this is one of the topics I touched on during my PhD research, I just have to reply: I have a slightly different view, and I have the feeling that two concepts are mixed up in the discussion. Unfortunately, this could not be done in a comment on the original post…
Dirk discusses the problem of the confidence one can have in a discovered process model, given an event log. A closely related question is “have we seen enough traces?” These are all valid questions that we currently cannot answer with confidence (i.e. this is ongoing research).
Before I can explain why my view slightly differs, let me first explain our view on the quality dimensions in process discovery.
In his blog post Dirk refers to an article by Wil van der Aalst, Boudewijn van Dongen and myself that shows that there is indeed a need for (at least) the four quality dimensions that we currently use in process discovery: replay fitness, precision, generalization and simplicity.
Since that article we have prepared an extension for a journal, which unfortunately is taking a while to appear. However, I think this is a good moment to share our view on the quality dimensions.
In that publication we look at the relation between the unknown system (which allows certain behaviour), the event log (containing observed behaviour), and the discovered process model (a description of the behaviour). When the behaviour allowed by these three entities is viewed as a Venn diagram, you get the following picture:
This Venn diagram contains 7 interesting areas (ignoring the 8th white area):
- Behaviour that is allowed by the system, observed in the event log and described by the process model;
- Behaviour that is observed but allowed by neither the system nor the process model;
- Behaviour that is observed and modelled although not allowed by the system;
- Behaviour that is modelled but not observed or allowed;
- Behaviour that is modelled and allowed but not observed;
- Behaviour that is allowed by the system but neither observed nor modelled;
- Behaviour that is allowed and observed but not modelled.
Using this view on the event log, the system and the process model, we can also specify recall and precision (see Wikipedia).
Replay fitness is actually the recall between the model and the event log.
Precision is the precision between the model and the event log.
Noise (or exceptions) is (the inverse of) the precision between the event log and the system (how much additional behaviour the event log contains that is not in the system’s allowed behaviour).
Completeness of an event log is the recall between the event log and the system (i.e. how much of the system behaviour is captured in the event log).
Generalization is then related to precision and recall between the process model and the system behaviour. In our paper we argue that it actually is the recall aspect between the process model and the system.
I’m skipping over a lot of details, but I hope the general idea is clear: there are (up to) six fractions/metrics that express the relations between the behaviour of the event log, the system and the process model.
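To make these fractions concrete, here is a minimal sketch in Python. It assumes that the behaviour of the system, the event log and the process model can each be written down as a finite set of traces, which is a strong simplification: the system is unknown in practice and models with loops allow infinitely many traces. The function names and the example traces are hypothetical, and reading "inverse" as "one minus" for noise is my assumption.

```python
def fraction_covered(behaviour, by):
    """Share of `behaviour` that also occurs in `by` (1.0 for an empty set)."""
    if not behaviour:
        return 1.0
    return len(behaviour & by) / len(behaviour)


def quality_fractions(system, log, model):
    """Six set-based fractions between system, event log and process model.

    Recall of X with respect to a reference R is |R and X| / |R|;
    precision of X with respect to R is |R and X| / |X|.
    """
    return {
        # Recall between model and log: observed behaviour the model describes.
        "replay_fitness": fraction_covered(log, model),
        # Precision between model and log: modelled behaviour that was observed.
        "precision": fraction_covered(model, log),
        # Assumption: "inverse" is taken as one minus the log/system precision.
        "noise": 1.0 - fraction_covered(log, system),
        # Recall between log and system: system behaviour that was observed.
        "completeness": fraction_covered(system, log),
        # Recall between model and system: system behaviour the model describes.
        "generalization": fraction_covered(system, model),
        # Precision between model and system: modelled behaviour that is allowed.
        "model_system_precision": fraction_covered(model, system),
    }


# Hypothetical toy data: every character is one activity, every string one trace.
system = {"abc", "acb", "abd", "ad"}
log = {"abc", "acb", "axc"}           # "axc" is noise
model = {"abc", "acb", "abd", "abe"}  # "abd" generalizes beyond the log

fractions = quality_fractions(system, log, model)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the model scores higher on generalization (3 of 4 system traces) than the log does on completeness (2 of 4), which illustrates that the two fractions measure different things.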
So, where do I disagree?
- Dirk explains that the measure of generalization expresses how certain one is that the process model discovered from the event log actually describes the system behaviour. I do not agree, however, with the statement that generalization contradicts the quality dimensions of replay fitness, precision and simplicity. In some cases one can achieve (almost) perfect scores for all four quality dimensions. I do agree that there is often a trade-off, but certainly not always a contradiction.
- Later on Dirk mentions that “There is currently no generic mathematical definition to compute for a given log K that it was general enough (contains enough information to infer entire L from K).“.
Here two terms are starting to get mixed up. Generalization is a metric between the process model and the system. However, Dirk says an event log can be general (i.e. it contains enough information to infer a generic process model that describes the system). The proper term for this is completeness: is enough system behaviour captured in the event log?
Next Dirk writes “This usually depends on the algorithm, the kind of original system S/the language L, and the kind of model one would like to discover.”. Dirk is correct in the sense that different algorithms make different assumptions regarding the event log. Most algorithms assume the event log is complete (and sometimes use this as an argument to ignore generalization). Of course it also depends on the (type of) system. However, I’m not sure what Dirk means by the type of model one would like to discover. I’m assuming he means which trade-offs between the different quality dimensions the user is willing to make.
- Next Dirk writes “The most general result that I am aware of is that the log K has to be directly follows-complete.“. I currently disagree because the directly-follows relation is an abstraction of the behaviour as recorded in the event log. And again, Dirk assumes that the event log K is complete, hence generalization can be ‘ignored’ according to his argumentation.
Although the algorithm Dirk developed together with Sander Leemans, the Inductive Tree Miner, has so far provided promising results, I strongly believe this is because the assumption is made that the event log is reasonably directly-follows complete.
- At the end of his blog post Dirk concludes by stating that determining whether an event log is (directly-follows) complete is so far an unsolved problem. With this I agree, but I disagree that this is required for generalization.
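As an illustration of why the directly-follows relation is an abstraction, the following Python sketch computes it for a set of traces. The traces are hypothetical, and the comparison with the system behaviour assumes that the system is known, which is exactly what is not the case in practice.

```python
def directly_follows(traces):
    """Return all pairs (a, b) where b immediately follows a in some trace."""
    pairs = set()
    for trace in traces:
        # Pair every activity with its direct successor in the same trace.
        pairs.update(zip(trace, trace[1:]))
    return pairs


# Hypothetical toy data: every character is one activity, every string one trace.
system = {"abcd", "acbd", "abd", "acd"}
log = {"abcd", "acbd"}

df_system = directly_follows(system)
df_log = directly_follows(log)

# The log contains only half of the system's traces ...
print(len(log & system) / len(system))
# ... yet its directly-follows relation equals that of the system.
print(df_log == df_system)
```

Here the log is directly-follows complete with respect to the (assumed) system although it is far from trace-complete, so the two notions of completeness should be kept apart.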
In our work we propose a metric for generalization that takes into account the number of times a node in the discovered process model is used to explain the observed behaviour (i.e. used in an alignment). This metric captures two things. First, if the event log contains more observations, generalization increases. Second, generalization is reduced if activities are included multiple times in a process model: an indication that overly specific behaviour is being modelled.
We have also created an evolutionary process discovery algorithm (the ETM algorithm) that is able to seamlessly balance the four quality dimensions of replay fitness, precision, generalization and simplicity. An extension of the ETM algorithm is able to produce, given an event log, a collection of process models that have different trade-offs between these four quality dimensions, to let the user decide which model is ‘best’.
To conclude: I partly disagree with Dirk and strongly argue that generalization should not be confused and/or mixed with event log completeness. I do, however, agree with Dirk that no established metrics exist for either completeness or generalization.
I’m open to discussion!
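A frequency-based metric of this kind can be sketched as follows. The exact formula below (one minus the average of 1/sqrt(executions) over all nodes) is an assumption chosen to match the description above, not a quotation of the published definition, and the execution counts are hypothetical.

```python
import math


def generalization(node_executions):
    """Frequency-based generalization sketch.

    `node_executions` holds, per node of the process model, how often that node
    was used in the alignments. Assumed formula:
    1 - (sum over nodes of 1 / sqrt(executions)) / number of nodes.
    """
    if not node_executions:
        raise ValueError("the model must contain at least one node")
    # Assumption: a node that is never used gets the maximum penalty of 1.
    penalty = sum(1.0 / math.sqrt(n) if n > 0 else 1.0 for n in node_executions)
    return 1.0 - penalty / len(node_executions)


# More observations per node give a higher score.
few_observations = generalization([4, 4, 4, 4])
many_observations = generalization([100, 100, 100, 100])

# Duplicating an activity splits its executions over two nodes and lowers the score.
single_activity = generalization([100, 100, 100])
duplicated_activity = generalization([100, 100, 50, 50])

print(few_observations, many_observations)
print(single_activity, duplicated_activity)
```

Note that this score needs only the model and the alignments on the event log; it does not require knowing whether the event log is complete.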
 W.M.P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, 2011.
 J.C.A.M. Buijs, B.F. van Dongen and W.M.P. van der Aalst. Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity. International Journal of Cooperative Information Systems (to appear).
 J.C.A.M. Buijs, B.F. van Dongen and W.M.P. van der Aalst. Discovering and Navigating a Collection of Process Models using Multiple Quality Dimensions. (to appear in Workshop proceedings of BPM 2013)
This article was first published on Blog of Joos Buijs: A reply on “Some thoughts on Generalization”.