In PM4Py, we offer various simulation algorithms that, starting from a model, generate outputs that adhere to the model and the rules provided by the user.
A playout of a Petri net takes a Petri net and an initial marking as input, returning a list of process executions permitted by the process model. We offer two types of playouts:
Variant | Description |
---|---|
Variants.BASIC_PLAYOUT | A basic playout that accepts a Petri net along with an initial marking and returns a specified number of process executions (with possible repetitions). |
Variants.EXTENSIVE | A playout that accepts a Petri net and an initial marking, returning all possible executions according to the model, up to a specified trace length (this may be computationally expensive). |
The parameters for each variant are as follows:
Variant | Parameter | Description |
---|---|---|
Variants.BASIC_PLAYOUT | Parameters.ACTIVITY_KEY | The name of the attribute to use as activity in the playout log. |
Parameters.TIMESTAMP_KEY | The name of the attribute to use as timestamp in the playout log. | |
Parameters.CASE_ID_KEY | The trace attribute to be used as the case identifier in the playout log. | |
Parameters.NO_TRACES | The number of traces to be included in the playout log. | |
Parameters.MAX_TRACE_LENGTH | The maximum trace length, after which the playout of the trace stops. | |
Variants.EXTENSIVE | Parameters.ACTIVITY_KEY | The name of the attribute to use as activity in the playout log. |
Parameters.TIMESTAMP_KEY | The name of the attribute to use as timestamp in the playout log. | |
Parameters.CASE_ID_KEY | The trace attribute to be used as case identifier in the playout log. | |
Parameters.MAX_TRACE_LENGTH | The maximum trace length, after which the extensive playout stops. |
Here's an example of applying the basic playout, given a Petri net, to generate a log with 50 traces:
Here's an example of applying the extensive playout, given a Petri net, to generate a log with all executions of length ≤ 7:
A time-related simulation allows you to determine the probability of process execution completion within a given time frame. This leads to a better estimation of Service Level Agreements or a better identification of process instances likely to have long throughput times.
This process begins with a performance DFG (e.g., the one discovered from the running example log):
Additionally, the case arrival ratio must be known. This refers to the average (or median) time between the arrival of two consecutive cases. It can be provided by the user or inferred from the event log using the following command:
Using the DFG mining approach, you can extract a Petri net model. This model is the “default” choice for Monte Carlo simulation, as its execution semantics are clear. Moreover, the Petri net generated by DFG mining is a sound workflow net, which ensures good properties for the model.
The DFG mining approach can be applied as follows:
To perform a basic Monte Carlo simulation, the following code can be used. This example assumes a resource-constrained simulation, where a place can hold at most 1 token at a time. Later, we’ll show how to allow more tokens per place.
During the replay operation, debug messages are displayed. The main outputs of the simulation process are:
Key | Description |
---|---|
simulated_log | The traces simulated during the process. |
res | The result of the simulation (Python dictionary). |
The result dictionary, `res`, includes the following keys:
Key | Description |
---|---|
places_interval_trees | An interval tree for each place, containing intervals when it was "full" according to the specified maximum tokens per place. |
transitions_interval_trees | An interval tree for each transition, showing time intervals when the transition was enabled but not yet fired. |
cases_ex_time | A list of throughput times for all cases in the log. |
median_cases_ex_time | The median throughput time for cases in the simulated log. |
input_case_arrival_ratio | The case arrival ratio, either provided by the user or inferred from the event log. |
total_cases_time | The time difference between the last and first timestamps of the simulated log. |
The last four items in the list are basic Python objects (e.g., floats and lists). The interval tree objects can be queried to obtain time-specific information. For example, the following code snippet prints, for a random transition, the number of overlapping intervals at 11 different time points (including the minimum and maximum timestamps):
Similarly, the following code snippet prints, for a random place, the number of overlapping intervals at 11 different time points:
This information can be used to build graphs using tools like Microsoft Excel.
The simulation process can be summarized as follows:
Several parameters are essential for performing a Monte Carlo simulation. These parameters, found in the petri_semaph_fifo
class, are listed below in order of importance:
Variant | Parameter | Description |
---|---|---|
Variants.PETRI_SEMAPH_FIFO | Parameters.PARAM_NUM_SIMULATIONS | Number of simulations to run (the goal is to generate this number of traces in the model). |
Parameters.PARAM_CASE_ARRIVAL_RATIO | The case arrival ratio specified by the user. | |
Parameters.PARAM_MAP_RESOURCES_PER_PLACE | A map specifying the maximum number of tokens for each place in the Petri net. | |
Parameters.PARAM_DEFAULT_NUM_RESOURCES_PER_PLACE | If the map is not provided, this default number of resources per place is used. | |
Parameters.PARAM_MAX_THREAD_EXECUTION_TIME | The maximum execution time for the simulation (e.g., 60 seconds). | |
Parameters.PARAM_SMALL_SCALE_FACTOR | The ratio between real time and simulation time. A higher ratio speeds up the simulation but sacrifices accuracy, while a lower ratio offers more detail. The default is 864,000 seconds (10 days). | |
Parameters.PARAM_ENABLE_DIAGNOSTICS | Enables diagnostic printing through Python’s “logging” class. | |
Parameters.ACTIVITY_KEY | The attribute of the log to use as activity. | |
Parameters.TIMESTAMP_KEY | The attribute of the log to use as timestamp. | |
Parameters.TOKEN_REPLAY_VARIANT | The variant of token-based replay to use: token_replay (classic variant) or backwards (slower but handles invisible transitions). | |
Parameters.PARAM_FORCE_DISTRIBUTION | If specified, forces a specific distribution for the transitions (e.g., normal or exponential). | |
Parameters.PARAM_DIAGN_INTERVAL | The time interval at which diagnostics are printed (e.g., every 10 seconds). |
An extensive playout operation allows the generation (up to the specified limits) of the entire language of the process model. Performing an extensive playout on a Petri net can be computationally expensive, as it requires exploring the reachability graph. Process trees, with their bottom-up structure, make it much easier to generate the entire language of an event log. The process starts from the language of the leaves (which is straightforward) and follows specific merge rules for the operators.
However, since the language of a process tree can be vast (especially when parallel operators are involved) or even infinite (when loops are present), extensive playouts are possible only within certain limits:
Additionally, the structure of the process tree makes it easy to infer the minimum trace length allowed by the process model (following the bottom-up approach).
Some reasonable default settings for the extensive playout are as follows:
The list of parameters for the extensive playout is as follows:
Parameter | Description |
---|---|
MAX_LIMIT_NUM_TRACES | Maximum number of traces returned by the algorithm. |
MAX_TRACE_LENGTH | Maximum length of a trace output by the algorithm. |
MAX_LOOP_OCC | Maximum number of times a loop can be entered. |
Below is an example of how the playout can be executed. First, a log can be imported:
Next, a process tree can be discovered using the inductive miner algorithm:
In this example, we specify retrieving traces of length at most 3, and we set the maximum number of traces to 100,000.
At this point, the extensive playout operation is complete.