Log-Model Evaluation

In PM4Py, it is possible to compare the behavior contained in the log with the behavior in the model to see if and how they match. Four different dimensions exist in process mining: replay fitness, precision, generalization, and simplicity.

Replay Fitness

The calculation of replay fitness aims to determine how much of the behavior in the log is admitted by the process model. We propose two methods to calculate replay fitness: token-based replay and alignments.

For token-based replay, the percentage of completely fitting traces is returned, along with a fitness value calculated as described in the following scientific contribution: Berti, Alessandro, and Wil MP van der Aalst. "Reviving Token-based Replay: Increasing Speed While Improving Diagnostics." ATAED@Petri Nets/ACSD. 2019.

For alignments, the percentage of completely fitting traces is returned, along with a fitness value calculated as the average of the fitness values of the individual traces. The two variants of replay fitness are implemented as Variants.TOKEN_BASED and Variants.ALIGNMENT_BASED respectively.

To calculate the replay fitness between an event log and a Petri net model using the token-based replay method, use the code on the right. The resulting value will be a number between 0 and 1.

                
            

To calculate the replay fitness between an event log and a Petri net model using the alignments method, use the code on the right. The resulting value will be a number between 0 and 1.

                
            

Precision

We propose two approaches for measuring precision in PM4Py:

  • ETConformance (using token-based replay): The reference paper is
    Muñoz-Gama, Jorge, and Josep Carmona. "A Fresh Look at Precision in Process Conformance." International Conference on Business Process Management. Springer, Berlin, Heidelberg, 2010.
  • Align-ETConformance (using alignments): The reference paper is
    Adriansyah, Arya, et al. "Measuring Precision of Modeled Behavior." Information Systems and e-Business Management 13.1 (2015): 37-67.

The underlying concept of both approaches is the same: different prefixes of the log are replayed (when possible) on the model. At the reached marking, the set of transitions enabled in the process model is compared with the set of activities that follow the prefix. The more the sets differ, the lower the precision value; the more the sets align, the higher the precision value. This process only works if the prefix replay on the process model succeeds. If the replay does not produce a result, the prefix is not considered for precision computation. Therefore, precision calculated on unfit processes is not meaningful.

The main difference between the approaches is the replay method. Token-based replay is faster but relies on heuristics (so the replay result might not be exact). Alignments are precise, work on any kind of relaxed sound nets, but can be slow if the state space is large.

The two variants, ETConformance and Align-ETConformance, are available as Variants.ETCONFORMANCE_TOKEN and Variants.ALIGN_ETCONFORMANCE in the implementation, respectively.

To calculate the precision between an event log and a Petri net model using the ETConformance method, use the code on the right. The resulting value will be a number between 0 and 1.

                
            

To calculate the precision between an event log and a Petri net model using the Align-ETConformance method, use the code on the right. The resulting value will be a number between 0 and 1.

                
            

Generalization

Generalization is the third dimension to analyze how the log and the process model match. In particular, we propose the generalization measure described in the following research paper: Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst. "Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization, and Simplicity." International Journal of Cooperative Information Systems 23.01 (2014): 1440001.

In general, a model is considered general if its elements are visited frequently enough during a replay operation (of the log on the model). A model may perfectly fit the log and be perfectly precise (for example, by reporting the traces of the log as sequential models going from the initial marking to the final marking; a choice is made at the initial marking). Therefore, to measure generalization, a token-based replay operation is performed, and generalization is calculated as

where \(avgt\) is the average of the inner value over all the transitions, and freq(t) is the frequency of transition \(t\) after the replay.

To calculate the generalization between an event log and a Petri net model using the method described in this section, use the code on the right. The resulting value will be a number between 0 and 1.

                
            

Simplicity

Simplicity is the fourth dimension to analyze in a process model. In this case, we define simplicity by considering only the Petri net model. The criterion we use for simplicity is the inverse arc degree, as described in the following research paper: Blum, Fabian Rojas. "Metrics in Process Discovery." Technical Report TR/DCC-2015-6, Computer Science Department, University of Chile, 2015.

First, we consider the average degree for a place/transition of the Petri net, defined as the sum of the number of input arcs and output arcs. If all places have at least one input arc and one output arc, the minimum degree is 2. Choosing a number \(k\) between 0 and infinity, simplicity based on the inverse arc degree is defined as

To calculate the simplicity of a Petri net model using the inverse arc degree, use the following code. The resulting value will be a number between 0 and 1.

                
            

Earth Mover Distance

The Earth Mover Distance, introduced by Leemans, Sander JJ, Anja F. Syring, and Wil MP van der Aalst. “Earth Movers’ Stochastic Conformance Checking.” International Conference on Business Process Management. Springer, Cham, 2019, provides a way to calculate the distance between two different stochastic languages. Generally, one language is extracted from the event log, and one language is extracted from the process model. By language, we mean a set of traces weighted according to their probability.

For the event log, the set of variants is divided by the total number of traces to provide the language of the log. We can also discover the language of the model by importing an event log and calculating its language:

                
            

The same procedure does not naturally apply to the process model. To calculate a language for the process model, a scalable approach (but non-deterministic) is to perform a playout of the model to obtain an event log. Let’s first apply the Alpha Miner, and then perform a playout of the Petri net using the STOCHASTIC_PLAYOUT variant.

                
            

We can then calculate the language of the event log:

                
            

Next, we obtain the language of the model and calculate the Earth Mover Distance:

  • Ensure that both languages contain the same traces. If a language does not contain a trace, it is assigned a value of 0.
  • Decide on a common ordering (e.g., alphabetical) for the keys of the languages.
  • Calculate the distance between different keys using a string distance function such as the Levenshtein function.

This results in a number greater than or equal to 0 that expresses the distance between the language of the log and the language of the model. This serves as an alternative measure for precision. To calculate the Earth Mover Distance, the Python package pyemd should be installed (pip install pyemd).

The code to apply the Earth Mover Distance is as follows:

                
            

When using the running-example log and the Alpha Miner model, a value similar to or equal to 0.1733 will be obtained.

WOFLAN

WOFLAN is a widely used approach for soundness checking on workflow nets. It provides meaningful statistics to the end user. WOFLAN is described in detail in the following paper: Woflan 2.0: A Petri-Net-Based Workflow Diagnosis Tool. Additionally, definitions of workflow nets and soundness can be found at: Petri Net Wikipedia.

WOFLAN is applied to an accepting Petri net (a Petri net with initial and final markings) and follows these steps (the detailed meanings of each step are explained in the thesis):

  • Check if the Petri net and the markings are valid.
  • Verify if the Petri net is a workflow net.
  • Check if all places are covered by S-components.
  • Ensure that all pairs are well-handled.
  • Verify that places are covered in uniform invariants.
  • Check if places are covered in weighted invariants.
  • Ensure that the WPD is proper.
  • Check for substates in the MCG.
  • Look for unbounded sequences.
  • Check for dead tasks.
  • Verify if tasks are live.
  • Check for non-live tasks.
  • Examine sequences that may lead to deadlocks.

The order of these checks is illustrated in the diagram available at this link: WOFLAN Steps. If a step passes, "Yes" will appear on the corresponding edge; if a step fails, "No" will appear instead.

Let's see how WOFLAN can be applied. First, we need to load an XES log.

                
            

Next, we will discover a model using the Heuristics Miner.

                
            

Afterward, soundness can be checked by executing the following:

                
            

In this case, is_sound will contain a boolean value: True if the Petri net is sound, and False otherwise. The list of parameters includes:

ParameterDescription
PRINT_DIAGNOSTICSEnables printing of diagnostics when WOFLAN is executed.
RETURN_DIAGNOSTICSReturns a dictionary containing the diagnostics.
RETURN_ASAP_WHEN_NOT_SOUNDStops the execution of WOFLAN when a condition is found indicating the Petri net is not sound.

For the provided Petri net (which is not sound), the output of the technique is False. To determine the reason for the net's lack of soundness, rerun the script with PRINT_DIAGNOSTICS set to True and RETURN_ASAP_WHEN_NOT_SOUND set to False for more detailed diagnostics. Here’s the output you would see during execution:

                
            

Based on this output, we can conclude that:

  • There are places not covered by an S-component.
  • No well-handled pairs exist.
  • Some places are uncovered in uniform and weighted invariants.
  • The WPD is improper.
  • There are unbounded sequences present.

To obtain the diagnostics in a dictionary, execute the script with the following code:

                
            

The dictionary dictio_diagnostics may contain the following keys (if the corresponding step is reached during computation):

KeyDescription
S_C_NETDetails about S-components.
PLACE_INVARIANTSInformation about place invariants.
UNIFORM_PLACE_INVARIANTSDetails about uniform place invariants.
S_COMPONENTSInformation about S-components.
UNCOVERED_PLACES_S_COMPONENTDetails about uncovered places in S-components.
NOT_WELL_HANDLED_PAIRSInformation about not well-handled pairs.
LEFTDetails regarding remaining tasks.
UNCOVERED_PLACES_UNIFORMDetails about uncovered places in uniform invariants.
WEIGHTED_PLACE_INVARIANTSDetails about weighted place invariants.
UNCOVERED_PLACES_WEIGHTEDDetails about uncovered places in weighted invariants.
MCGInformation about the MCG (Marking Condition Graph).
DEAD_TASKSInformation about dead tasks.
R_G_S_CInformation about restricted guard scenario conditions.
R_GDetails regarding restricted guards.
LOCKING_SCENARIOSDetails about locking scenarios.
RESTRICTED_COVERABILITY_TREEDetails about the restricted coverability tree.