Conformance checking is a technique used to compare a process model with an event log of the same process. The goal is to determine whether the event log conforms to the model, and vice versa. In PM4Py, two fundamental techniques are implemented: token-based replay and alignments.
Token-based replay matches a trace with a Petri net model, starting from the initial place. It discovers which transitions are executed and identifies places with remaining or missing tokens for the given process instance. Token-based replay is particularly useful for conformance checking. A trace is considered to fit the model if, during its execution, transitions can be fired without the need to insert any missing tokens. If reaching the final marking is required, the trace fits if it reaches the final marking without any missing or remaining tokens. For further explanation, refer to the related section.
For each trace, four values must be determined: produced tokens (p), remaining tokens (r), missing tokens (m), and consumed tokens (c). Based on these values, the following formula can be derived, where a Petri net (n) and a trace (t) are provided as input:
fitness(n, t) = ½ (1 − r/p) + ½ (1 − m/c)
To apply this formula to the entire event log, p, r, m, and c are calculated for each trace, summed up, and then inserted into the formula above.
In PM4Py, a token replayer is implemented that can traverse hidden transitions (calculating the shortest paths between places) and works with any Petri net model whose visible transitions are uniquely labeled, possibly in combination with hidden transitions. When a visible transition needs to be fired and not all places in its preset hold the required number of tokens, the system checks whether, starting from the current marking, a sequence of hidden transitions can be fired to enable the visible transition. If so, the hidden transitions are fired, and a marking that enables the visible transition is reached.
The following example demonstrates how to apply token-based replay on a log and a Petri net. First, the log is loaded. Then, the Alpha Miner is applied to discover a Petri net. Finally, the token-based replay is applied. The output of the token-based replay, stored in the variable replayed_traces, contains for each trace in the log information such as whether the trace fits the model (trace_is_fit), the fitness value (trace_fitness), the transitions activated during the replay, and the numbers of missing, consumed, remaining, and produced tokens.
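A minimal sketch of these steps, assuming the simplified pm4py interface and an example log path:

```python
import pm4py

# load the event log (the path is an assumption)
log = pm4py.read_xes("running-example.xes")

# discover a Petri net with the Alpha Miner
net, initial_marking, final_marking = pm4py.discover_petri_net_alpha(log)

# apply token-based replay; one diagnostics dictionary is returned per trace
replayed_traces = pm4py.conformance_diagnostics_token_based_replay(
    log, net, initial_marking, final_marking)
```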
Executing token-based replay in PM4Py provides detailed information about transitions that did not execute correctly and about activities that are present in the log but not in the model. In particular, executions that do not match the model are expected to have longer throughput times. The diagnostics provided by PM4Py include the following:
- throughput analysis on the transitions that are executed in an unfit way;
- throughput analysis on the activities that are not contained in the model;
- root cause analysis on the transitions that are executed in an unfit way;
- root cause analysis on the activities that are not contained in the model.
To provide execution contexts for the examples, the log must be loaded, and a model that is not perfectly fitting is required. The following instructions can be used to load the log:
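For example, assuming the receipt.xes file is in the working directory:

```python
import pm4py

log = pm4py.read_xes("receipt.xes")
```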
To create an unfit model, a filtering operation can be executed that produces a log with only part of the behavior:
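One possible filtering, keeping only the most frequent variants (the specific filter and the value of k are assumptions):

```python
# keep only the k most frequent variants, so the resulting model
# covers only part of the behavior of the log
filtered_log = pm4py.filter_variants_top_k(log, 3)
```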
Then, apply the Inductive Miner algorithm:
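For example, continuing from the previous snippet:

```python
# discover a model from the filtered (partial-behavior) log
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(filtered_log)
```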
Next, apply the token-based replay with special settings. Specifically, set disable_variants to True to replay every case rather than only one case per variant, and set enable_pltr_fitness to True to instruct the algorithm to return localized conformance information (fitness at the level of places and transitions).
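A sketch of this call; the module path and parameter names follow recent pm4py versions:

```python
from pm4py.algo.conformance.tokenreplay import algorithm as token_based_replay

parameters_tbr = {
    token_based_replay.Variants.TOKEN_REPLAY.value.Parameters.DISABLE_VARIANTS: True,
    token_based_replay.Variants.TOKEN_REPLAY.value.Parameters.ENABLE_PLTR_FITNESS: True,
}
# with enable_pltr_fitness, the replay also returns localized diagnostics
replayed_traces, place_fitness, trans_fitness, unwanted_activities = token_based_replay.apply(
    log, net, initial_marking, final_marking, parameters=parameters_tbr)
```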
Afterward, the diagnostic information (in particular, trans_fitness and unwanted_activities) can be passed to the diagnostics methods.
To perform throughput analysis on transitions that were executed incorrectly and print the result to the console, use the following code:
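A sketch, assuming trans_fitness from the replay above:

```python
from pm4py.algo.conformance.tokenreplay.diagnostics import duration_diagnostics

# compare throughput times of fit and unfit executions per transition
trans_diagnostics = duration_diagnostics.diagnose_from_trans_fitness(log, trans_fitness)
for trans in trans_diagnostics:
    print(trans, trans_diagnostics[trans])
```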
This will show whether unfit executions result in significantly higher throughput times (e.g., from 126 to 146 times higher).
To perform throughput analysis on activities not contained in the model and print the result to the console, use the following code:
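A sketch, assuming unwanted_activities from the replay above:

```python
# throughput analysis for activities of the log that are not in the model
act_diagnostics = duration_diagnostics.diagnose_from_notexisting_activities(log, unwanted_activities)
for act in act_diagnostics:
    print(act, act_diagnostics[act])
```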
The output of root cause analysis in the diagnostic context is a decision tree that helps identify the causes of deviations. For each deviation, a distinct decision tree is built and visualized. In the following examples, the decision trees are constructed using the org:group attribute, based on the Receipt log.
To perform root cause analysis on transitions that were executed incorrectly, use the following code:
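A sketch, using org:group as the (string) attribute on which the decision trees are built:

```python
from pm4py.algo.conformance.tokenreplay.diagnostics import root_cause_analysis

parameters_rca = {root_cause_analysis.Parameters.STRING_ATTRIBUTES: ["org:group"]}
trans_root_cause = root_cause_analysis.diagnose_from_trans_fitness(
    log, trans_fitness, parameters=parameters_rca)
```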
To visualize the decision trees obtained from root cause analysis, use the following code:
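A sketch; the keys clf, feature_names, and classes are assumed to be part of the root cause analysis output:

```python
from pm4py.visualization.decisiontree import visualizer as dt_vis

for trans in trans_root_cause:
    clf = trans_root_cause[trans]["clf"]
    feature_names = trans_root_cause[trans]["feature_names"]
    classes = trans_root_cause[trans]["classes"]
    # one decision tree per problematic transition
    gviz = dt_vis.apply(clf, feature_names, classes)
    dt_vis.view(gviz)
```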
To perform root cause analysis on activities that were executed but are not part of the process model, use the following code:
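A sketch, reusing the parameters from the previous step:

```python
act_root_cause = root_cause_analysis.diagnose_from_notexisting_activities(
    log, unwanted_activities, parameters=parameters_rca)
```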
To visualize the decision trees obtained from root cause analysis, use the following code:
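The visualization follows the same pattern as before:

```python
for act in act_root_cause:
    clf = act_root_cause[act]["clf"]
    feature_names = act_root_cause[act]["feature_names"]
    classes = act_root_cause[act]["classes"]
    gviz = dt_vis.apply(clf, feature_names, classes)
    dt_vis.view(gviz)
```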
PM4Py includes several linear solvers: Scipy (available for all platforms), CVXOPT (available for most widely used platforms, including Windows and Linux), and ORTools, which can also be installed via PIP.
Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment consists of a list of pairs, where the first element is an event (from the trace) or » and the second element is a transition (from the model) or ». For each pair, the following classifications can be made:
- sync move: both elements are defined, and the event corresponds to the transition label; the trace and the model advance together;
- move on log: the second element is », meaning the event in the trace is not mimicked by the model (a deviation on the log side);
- move on model: the first element is », meaning the transition is executed without a corresponding event in the trace (a deviation with respect to the log).
First, import the log. Then, apply the Inductive Miner on the imported log and compute the alignments.
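A minimal sketch using the simplified pm4py interface (the log path is an assumption):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
aligned_traces = pm4py.conformance_diagnostics_alignments(
    log, net, initial_marking, final_marking)
```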
To inspect the alignments, use the following code. The output (a list) reports, for each trace, the corresponding alignment along with its statistics. Each trace is associated with a dictionary containing, among other information, the alignment itself (alignment), its cost (cost), and its fitness (fitness, equal to 1 when the trace fits the model perfectly).
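For example:

```python
for idx, trace_result in enumerate(aligned_traces):
    # print the fitness and the list of moves for each trace
    print(idx, trace_result["fitness"], trace_result["alignment"])
```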
To use a different classifier, refer to the Classifier section. The following code defines a custom classifier for each event of each trace in the log.
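A sketch; the attribute name customClassifier and the way it is composed are illustrative choices:

```python
# build a custom classifier for each event, e.g. by combining the
# activity name with the lifecycle transition
for trace in log:
    for event in trace:
        event["customClassifier"] = event["concept:name"] + "+" + event.get("lifecycle:transition", "complete")
```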
A parameters dictionary containing the activity key can be formed.
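For example (the module path follows recent pm4py versions):

```python
from pm4py.algo.conformance.alignments.petri_net import algorithm as alignments

# instruct discovery and alignments to use the custom classifier as activity key
parameters = {
    alignments.Variants.VERSION_STATE_EQUATION_A_STAR.value.Parameters.ACTIVITY_KEY: "customClassifier"
}
```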
Then, a process model is computed, and alignments are applied with the custom classifier.
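Continuing the sketch, assuming a pm4py version in which the inductive miner returns a Petri net with its markings:

```python
from pm4py.algo.discovery.inductive import algorithm as inductive_miner

net, initial_marking, final_marking = inductive_miner.apply(log, parameters=parameters)
aligned_traces = alignments.apply_log(log, net, initial_marking, final_marking, parameters=parameters)
```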
It is also possible to select other parameters for the alignments, such as:
- the model cost function, mapping each transition of the model to the cost of a move on model;
- the sync cost function, mapping each visible transition to the cost of a synchronous move.
The following code defines custom model and sync cost functions. The model and sync cost functions must be included in the parameters. After that, the replay is performed.
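A sketch, continuing from the previous snippets; the concrete cost values are illustrative:

```python
# visible transitions: expensive move-on-model, free synchronous move;
# hidden transitions: small cost
model_cost_function = dict()
sync_cost_function = dict()
for t in net.transitions:
    if t.label is not None:
        model_cost_function[t] = 1000
        sync_cost_function[t] = 0
    else:
        model_cost_function[t] = 1

parameters = {
    alignments.Variants.VERSION_STATE_EQUATION_A_STAR.value.Parameters.PARAM_MODEL_COST_FUNCTION: model_cost_function,
    alignments.Variants.VERSION_STATE_EQUATION_A_STAR.value.Parameters.PARAM_SYNC_COST_FUNCTION: sync_cost_function,
}
aligned_traces = alignments.apply_log(log, net, initial_marking, final_marking, parameters=parameters)
```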
Alignments can be computationally expensive, especially for models with significant concurrency. However, they are the conformance checking technique that provides the best results when it comes to finding a match between process executions and the model. To overcome the challenges associated with the large state space, various methods have been proposed to decompose the model into smaller pieces. This decomposition makes the alignment process easier and still enables the diagnosis of problems.
We've already discussed how to achieve a maximal decomposition of the Petri net model. Now, let's look at how to decompose alignments based on the maximal decomposition of the Petri net model. This approach was published in:
Lee, Wai Lam Jonathan, et al. "Recomposing conformance: Closing the circle on decomposed alignment-based conformance checking in process mining." Information Sciences 466 (2018): 55-91.
Recomposition helps to determine whether each step of the process was executed synchronously or if any deviations occurred. The process starts by performing an alignment on the decomposed Petri nets. Then, the agreement between the activities at the borders is checked. If any disagreements are found, the two components with the disagreement are merged, and the alignment is repeated. When the steps agree across the different alignments of the components, they can be merged into a single alignment. The recomposition order follows the Petri net graph. However, in cases of concurrency, the recomposed alignment may contain valid moves that are not in the correct order.
To perform alignments through decomposition and recomposition, use the following code. A maximum number of border disagreements can be specified for the algorithm. If the number of border disagreements exceeds the provided limit, the alignment is interrupted, and None is returned for the specific trace's alignment.
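A sketch, assuming the decomposed-alignments module of recent pm4py versions; the threshold value is illustrative:

```python
from pm4py.algo.conformance.decomp_alignments import algorithm as decomp_alignments

conf = decomp_alignments.apply(log, net, initial_marking, final_marking, parameters={
    decomp_alignments.Variants.RECOMPOS_MAXIMAL.value.Parameters.PARAM_THRESHOLD_BORDER_AGREEMENT: 2})
```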
Since decomposed models typically have less concurrency, the components are aligned using a Dijkstra approach. In the case of border disagreements, however, this can degrade the algorithm's performance. Note that this is not an approximation technique: according to the authors, it provides the same fitness as the original alignments. Since the alignment is recomposed, the fitness evaluator can be used to assess the fitness (which is distinct from the computation of fitness described in the paper).
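For example:

```python
from pm4py.algo.evaluation.replay_fitness import algorithm as rp_fitness_evaluator

fitness = rp_fitness_evaluator.evaluate(conf, variant=rp_fitness_evaluator.Variants.ALIGNMENT_BASED)
```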
Footprints are a basic but scalable conformance checking technique used to compare entities such as event logs, DFGs, Petri nets, process trees, and other types of models. Essentially, a relationship is inferred between each pair of activities in the log or model. This includes:
- directly-follows relationships: an activity can be directly followed by another;
- parallel behavior: two activities can directly follow each other in either direction.
A footprints matrix can be calculated to describe the footprint relationship for each pair of activities. This can be computed for different types of models and for the entire event log. It can also be done trace-by-trace if local behavior is important.
Let’s assume that the running-example.xes event log is loaded:
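For example:

```python
import pm4py

log = pm4py.read_xes("running-example.xes")
```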
Apply the Inductive Miner to such a log:
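For example:

```python
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
```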
To calculate footprints for the entire log, use the following code:
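A sketch, using the footprints discovery module:

```python
from pm4py.algo.discovery.footprints import algorithm as footprints_discovery

fp_log = footprints_discovery.apply(log, variant=footprints_discovery.Variants.ENTIRE_EVENT_LOG)
```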
The footprints of the entire log form a dictionary with keys such as sequence (expressing directly-follows relationships) and parallel (expressing parallel behavior that can occur in either direction).
The footprints of the log, trace-by-trace, can be calculated as follows. This will result in a list of footprints for each trace:
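For example:

```python
fp_trace_by_trace = footprints_discovery.apply(log, variant=footprints_discovery.Variants.TRACE_BY_TRACE)
```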
The footprints of the Petri net model can be calculated as follows:
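For example:

```python
fp_net = footprints_discovery.apply(net, initial_marking, final_marking)
```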
The footprints of the model are, again, a dictionary with keys such as sequence and parallel. It is possible to visualize a comparison between the footprints of the entire log and those of the model.
To visualize a single footprints table, for example, the one for the model, use the following code:
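A sketch using the footprints visualizer:

```python
from pm4py.visualization.footprints import visualizer as fp_visualizer

gviz = fp_visualizer.apply(fp_net)
fp_visualizer.view(gviz)
```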
To compare the two footprints tables, use the following code. Note that the visualization will look the same if no deviations are found. If deviations are discovered, they will be colored red.
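A sketch; passing both footprints to the visualizer produces the comparison:

```python
gviz = fp_visualizer.apply(fp_log, fp_net)
fp_visualizer.view(gviz)
```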
To find deviations, repeat the procedure on the receipt.xes log, applying a heavy filter to the log to discover a simpler model:
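A possible realization (the choice of filter and its parameters are assumptions):

```python
log = pm4py.read_xes("receipt.xes")
# heavy filtering: keep only the most frequent variants
filtered_log = pm4py.filter_variants_top_k(log, 3)
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(filtered_log)
```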
Using a conformance checking operation, compare the behavior of the log traces against the footprints of the model:
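A sketch, computing trace-by-trace footprints of the full log and comparing them against the footprints of the model:

```python
from pm4py.algo.discovery.footprints import algorithm as footprints_discovery
from pm4py.algo.conformance.footprints import algorithm as footprints_conformance

fp_trace_by_trace = footprints_discovery.apply(log, variant=footprints_discovery.Variants.TRACE_BY_TRACE)
fp_net = footprints_discovery.apply(net, initial_marking, final_marking)
conf_fp = footprints_conformance.apply(fp_trace_by_trace, fp_net)
```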
This will include, for each trace of the log, a set of deviations. For example, for the first trace containing deviations, there are two issues: one relates to T06 Determine necessity of stop advice being executed before T04 Determine confirmation of receipt, and the other to T02 Check confirmation of receipt being followed by T06 Determine necessity of stop advice. The traces for which conformance returns nothing are considered fit (at least according to the footprints).
Footprints conformance checking is a method for identifying obvious deviations—behaviors in the log that are not allowed by the model. On the log side, scalability is excellent! The calculation of footprints for a Petri net model may be more expensive.
If we change the underlying model from Petri nets to a process tree, it’s possible to exploit its bottom-up structure to calculate the footprints almost instantaneously. Let’s open a log, calculate a process tree, and then apply footprint discovery.
Open the running-example log:
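For example:

```python
import pm4py

log = pm4py.read_xes("running-example.xes")
```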
Apply the Inductive Miner to discover a process tree:
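For example:

```python
tree = pm4py.discover_process_tree_inductive(log)
```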
Then, discover the footprints. Discover the footprints on the entire log, trace-by-trace in the log, and on the process tree:
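A sketch of the three discoveries:

```python
from pm4py.algo.discovery.footprints import algorithm as footprints_discovery

fp_entire_log = footprints_discovery.apply(log, variant=footprints_discovery.Variants.ENTIRE_EVENT_LOG)
fp_trace_by_trace = footprints_discovery.apply(log, variant=footprints_discovery.Variants.TRACE_BY_TRACE)
fp_tree = footprints_discovery.apply(tree)
```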
Each of these contains:
- the list of activities;
- the start and end activities;
- the sequential (directly-follows) footprints;
- the parallel footprints.
Execute an enhanced conformance check between the footprints of the entire log and the footprints of the model:
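A sketch (the default conformance variant is used here):

```python
from pm4py.algo.conformance.footprints import algorithm as footprints_conformance

conf_result = footprints_conformance.apply(fp_entire_log, fp_tree)
```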
The result contains, for each item in the previous list, the violations. Given the result of conformance checking, calculate the footprints-based fitness and precision of the process model:
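A sketch, assuming the evaluation utilities of the footprints conformance package:

```python
from pm4py.algo.conformance.footprints.util import evaluation

fitness = evaluation.fp_fitness(fp_entire_log, fp_tree, conf_result)
precision = evaluation.fp_precision(fp_entire_log, fp_tree)
```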
These values are both included in the interval [0, 1].
The concept of a log skeleton is introduced in the following paper:
Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log Skeletons: A Classification Approach to Process Discovery.” arXiv preprint arXiv:1806.08247 (2018).
According to the authors, this is the most accurate classification approach for determining whether a trace belongs to (the language of) a log. The log skeleton, calculated on a log, is an object containing a list of relations.
The log skeleton includes the following relations between activities:
- equivalence: two activities occur the same number of times within a trace;
- always-after: if the first activity occurs, the second occurs in a later position of the same trace;
- always-before: if the second activity occurs, the first occurs in an earlier position of the same trace;
- never-together: the two activities never occur together in the same trace;
- directly-follows: the first activity can be directly followed by the second;
- activity frequency: for each activity, the number of occurrences admitted per trace.
A noise threshold can also be specified. In this case, more relations are discovered, as the conditions are relaxed.
Suppose we take the running-example.xes log:
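For example:

```python
import pm4py

log = pm4py.read_xes("running-example.xes")
```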
Next, calculate the log skeleton:
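A sketch using the log skeleton discovery module:

```python
from pm4py.algo.discovery.log_skeleton import algorithm as lsk_discovery

skeleton = lsk_discovery.apply(log, parameters={
    lsk_discovery.Variants.CLASSIC.value.Parameters.NOISE_THRESHOLD: 0.0})
```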
The resulting skeleton is a dictionary that associates each of the relations above with the set of activity pairs (or, for the frequency relation, the per-activity occurrence counts) satisfying it.
To see how the log skeleton functions for classification and conformance purposes, switch to another log (the receipt.xes log) and calculate a heavily filtered version of it to reduce the behavior:
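A possible realization (the choice of filter is an assumption):

```python
log = pm4py.read_xes("receipt.xes")
filtered_log = pm4py.filter_variants_top_k(log, 3)
```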
Now, calculate the log skeleton for the filtered log and apply the classification as follows:
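A sketch of discovery on the filtered log followed by conformance checking of the full log:

```python
from pm4py.algo.discovery.log_skeleton import algorithm as lsk_discovery
from pm4py.algo.conformance.log_skeleton import algorithm as lsk_conformance

skeleton = lsk_discovery.apply(filtered_log)
conf_result = lsk_conformance.apply(log, skeleton)
```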
This process will indicate whether each trace belongs to the filtered log or not. When deviations are found, it means the trace does not belong to the language of the filtered log.
You can also calculate a log skeleton on the original log by specifying a noise threshold (e.g., 0.03) to see its impact on classification:
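For example, continuing from the previous snippets:

```python
skeleton_noise = lsk_discovery.apply(log, parameters={
    lsk_discovery.Variants.CLASSIC.value.Parameters.NOISE_THRESHOLD: 0.03})
conf_result = lsk_conformance.apply(log, skeleton_noise)
```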
Some traces may be classified as incorrect, even when using the original log, if a noise threshold is applied.
In some cases, performing an optimal alignment between an event log and a process model may be impractical. An alternative is to use an approximate alignment that highlights the main points of deviation. PM4Py provides support for alignments between two event logs. This alignment operation is based on the edit distance, where for each trace in the first log, the trace in the second log with the least edit distance is identified.
In the following example, you will see how to perform alignments between an event log and a simulated log obtained by performing a playout operation on the process model. First, load an example log and discover a process model using the inductive miner:
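For example:

```python
import pm4py

log = pm4py.read_xes("running-example.xes")
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)
```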
Then, perform a playout operation on the process model:
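For example:

```python
# simulate the model to obtain a second event log
simulated_log = pm4py.play_out(net, initial_marking, final_marking)
```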
Next, perform alignments between the two logs:
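A sketch, using the log-to-log alignments module:

```python
from pm4py.algo.conformance.logs_alignments import algorithm as logs_alignments

log_to_log_alignments = logs_alignments.apply(log, simulated_log)
```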
The result is a list of alignments, each containing a list of moves (sync move, move on log n.1, move on log n.2).
It is also possible to perform anti-alignments. An anti-alignment corresponds to a trace from the second log that has the greatest edit distance from a given trace in the first log. To perform anti-alignments, use the following code:
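A sketch; the parameter enabling anti-alignments belongs to the edit-distance variant (name as in recent pm4py versions):

```python
parameters = {logs_alignments.Variants.EDIT_DISTANCE.value.Parameters.PERFORM_ANTI_ALIGNMENT: True}
anti_alignments = logs_alignments.apply(log, simulated_log, parameters=parameters)
```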
We propose an implementation of the temporal profile model in PM4Py, as described in the following paper:
Stertz, Florian, Jürgen Mangler, and Stefanie Rinderle-Ma. "Temporal Conformance Checking at Runtime Based on Time-Infused Process Models." arXiv preprint arXiv:2008.07262 (2020).
A temporal profile measures, for every pair of activities in the log, the average time and the standard deviation of the time between events with the specified activities. The time is measured between the completion of the first event and the start of the second event. This requires working with an interval log where events have two timestamps. The output of temporal profile discovery is a dictionary where each pair of activities (represented as a tuple) is associated with two values: the first is the average time, and the second is the standard deviation.
A temporal profile can also be used for conformance checking on an event log. The time between pairs of activities in the log is compared to the values stored in the temporal profile. Specifically, a value is calculated to show how many standard deviations the time differs from the average. If this value exceeds a threshold (default set to 6, according to the six-sigma principles), the pair of activities is flagged. The result of conformance checking based on the temporal profile is a list containing deviations for each case in the log. Each deviation is represented by a pair of activities, along with the calculated value and the distance (in terms of standard deviations) from the average.
First, load an event log and apply the discovery algorithm:
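For example:

```python
import pm4py
from pm4py.algo.discovery.temporal_profile import algorithm as temporal_profile_discovery

log = pm4py.read_xes("receipt.xes")
temporal_profile = temporal_profile_discovery.apply(log)
```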
Then, apply conformance checking based on the temporal profile:
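For example:

```python
from pm4py.algo.conformance.temporal_profile import algorithm as temporal_profile_conformance

results = temporal_profile_conformance.apply(log, temporal_profile)
```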
Several parameters can be adjusted to customize the conformance checking process using the temporal profile.
Parameter Key | Type | Default | Description
---|---|---|---
Parameters.ACTIVITY_KEY | string | concept:name | The attribute to use as the activity.
Parameters.START_TIMESTAMP_KEY | string | start_timestamp | The attribute to use as the start timestamp.
Parameters.TIMESTAMP_KEY | string | time:timestamp | The attribute to use as the (completion) timestamp.
Parameters.ZETA | int | 6 | The multiplier for the standard deviation. Pairs of activities that exceed this threshold are flagged by the temporal profile.
LTL (Linear Temporal Logic) checking is a form of filtering and conformance checking where specific rules are verified against the process executions contained in the log. This allows the checking of more complex patterns, such as:
- the four-eyes principle: two given activities of a case must be performed by different resources;
- an activity being repeated by different people within the same case.
The verification of LTL rules requires providing the appropriate parameters for the specific rule. As such, this form of conformance checking is not fully automatic.
The LTL rules implemented in PM4Py are listed in the table below:
LTL Rule | Description
---|---
ltl.ltl_checker.four_eyes_principle(log, A, B) | Applies the four-eyes principle to the activities A and B. Parameters: log is the event log; A and B are activities of the log. Returns a filtered log containing the cases where A and B were performed by different people.
ltl.ltl_checker.activity_repeated_by_different_people(log, A) | Checks whether the activity A was performed multiple times by different people. Parameters: log is the event log; A is an activity of the log. Returns a filtered log containing the cases where A was performed multiple times by different people.
The rules can be applied to both traditional event logs (XES) and Pandas dataframes by accessing the packages pm4py.algo.filtering.log.ltl and pm4py.algo.filtering.pandas.ltl, respectively.
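A sketch of applying one of the rules on a loaded XES log; the activity names are hypothetical placeholders:

```python
from pm4py.algo.filtering.log.ltl import ltl_checker

# "check ticket" and "decide" are hypothetical activity names
filtered_log = ltl_checker.four_eyes_principle(log, "check ticket", "decide")
```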