Simulation

In PM4Py, we offer various simulation algorithms that, starting from a model, generate outputs that adhere to the model and the rules provided by the user.

Playout of a Petri Net

A playout of a Petri net takes a Petri net and an initial marking as input, returning a list of process executions permitted by the process model. We offer two types of playouts:

VariantDescription
Variants.BASIC_PLAYOUTA basic playout that accepts a Petri net along with an initial marking and returns a specified number of process executions (with possible repetitions).
Variants.EXTENSIVEA playout that accepts a Petri net and an initial marking, returning all possible executions according to the model, up to a specified trace length (this may be computationally expensive).

The parameters for each variant are as follows:

VariantParameterDescription
Variants.BASIC_PLAYOUTParameters.ACTIVITY_KEYThe name of the attribute to use as activity in the playout log.
Parameters.TIMESTAMP_KEYThe name of the attribute to use as timestamp in the playout log.
Parameters.CASE_ID_KEYThe trace attribute to be used as the case identifier in the playout log.
Parameters.NO_TRACESThe number of traces to be included in the playout log.
Parameters.MAX_TRACE_LENGTHThe maximum trace length, after which the playout of the trace stops.
Variants.EXTENSIVEParameters.ACTIVITY_KEYThe name of the attribute to use as activity in the playout log.
Parameters.TIMESTAMP_KEYThe name of the attribute to use as timestamp in the playout log.
Parameters.CASE_ID_KEYThe trace attribute to be used as case identifier in the playout log.
Parameters.MAX_TRACE_LENGTHThe maximum trace length, after which the extensive playout stops.

Here's an example of applying the basic playout, given a Petri net, to generate a log with 50 traces:

                
            

Here's an example of applying the extensive playout, given a Petri net, to generate a log with all executions of length ≤ 7:

                
            

Monte Carlo Simulation

A time-related simulation allows you to determine the probability of process execution completion within a given time frame. This leads to a better estimation of Service Level Agreements or a better identification of process instances likely to have long throughput times.

This process begins with a performance DFG (e.g., the one discovered from the running example log):

                
            

Additionally, the case arrival ratio must be known. This refers to the average (or median) time between the arrival of two consecutive cases. It can be provided by the user or inferred from the event log using the following command:

                
            

Using the DFG mining approach, you can extract a Petri net model. This model is the “default” choice for Monte Carlo simulation, as its execution semantics are clear. Moreover, the Petri net generated by DFG mining is a sound workflow net, which ensures good properties for the model.

The DFG mining approach can be applied as follows:

                
            

To perform a basic Monte Carlo simulation, the following code can be used. This example assumes a resource-constrained simulation, where a place can hold at most 1 token at a time. Later, we’ll show how to allow more tokens per place.

                
            

The main outputs of the simulation process are:

KeyDescription
simulated_logThe traces simulated during the process.
resThe result of the simulation (Python dictionary).

The result dictionary, `res`, includes the following keys:

KeyDescription
output_places_interval_treesAn interval tree for each place, containing intervals when it was "full" according to the specified maximum tokens per place.
output_transitions_interval_treesAn interval tree for each transition (keyed by transition name), showing time intervals when the transition was enabled but not yet fired.
cases_ex_timeA list of throughput times for all cases in the log.
median_cases_ex_timeThe median throughput time for cases in the simulated log.
case_arrival_ratioThe case arrival ratio used in simulation; if not provided, it is inferred from the event log.
total_cases_timeThe time difference between the last and first timestamps of the simulated log.

The last four items in the list are basic Python objects (e.g., floats and lists). The interval tree objects can be queried to obtain time-specific information. For example, the following code snippet prints, for a random transition, the number of overlapping intervals at 11 different time points (including the minimum and maximum timestamps):

                
            

Similarly, the following code snippet prints, for a random place, the number of overlapping intervals at 11 different time points:

                
            

This information can be used to build graphs using tools like Microsoft Excel.

The simulation process can be summarized as follows:

  • An event log and a model (DFG) are used as inputs.
  • The simulation performs a replay operation between the log and the model.
  • This replay operation constructs a stochastic map associating each transition with a probability distribution, selected to maximize the likelihood of the observed values. Users can specify a distribution (e.g., exponential) if needed.
  • During replay, the frequency of each transition is calculated, allowing the weighted selection of enabled transitions.
  • The simulation runs via generator-based case simulators interleaved by a virtual-time scheduler using a priority queue (no real-time sleeping or OS threads).
  • Cases are started according to the case arrival ratio; at each step, the next case to advance is the one with the earliest next simulated time.

Several parameters are essential for performing a Monte Carlo simulation. These parameters, found in the petri_semaph_fifo class, are listed below in order of importance:

VariantParameterDescription
Variants.PETRI_SEMAPH_FIFOParameters.PARAM_NUM_SIMULATIONSNumber of simulations to run (the goal is to generate this number of traces in the model).
Parameters.PARAM_CASE_ARRIVAL_RATIOThe case arrival ratio; if not provided, it is inferred from the log.
Parameters.PARAM_MAP_RESOURCES_PER_PLACEA map specifying the maximum number of tokens for each place in the Petri net.
Parameters.PARAM_DEFAULT_NUM_RESOURCES_PER_PLACEIf the map is not provided, this default number of resources per place is used.
Parameters.PARAM_SMALL_SCALE_FACTORCompatibility parameter; not used in the current generator-based implementation.
Parameters.ACTIVITY_KEYThe attribute of the log to use as activity.
Parameters.TIMESTAMP_KEYThe attribute of the log to use as timestamp.
Parameters.PARAM_FORCE_DISTRIBUTIONIf specified, forces a specific distribution for the transitions (e.g., normal or exponential).
Parameters.PARAM_PROVIDED_SMAPOptional precomputed stochastic map for transitions; if not provided, it is estimated from the log and model.
Parameters.PARAM_DIAGN_INTERVALCompatibility parameter; not used in the current generator-based implementation.
Parameters.PARAM_ENABLE_DIAGNOSTICSCompatibility parameter; not used in the current generator-based implementation.

Extensive Playout of a Process Tree

An extensive playout operation allows the generation (up to the specified limits) of the entire language of the process model. Performing an extensive playout on a Petri net can be computationally expensive, as it requires exploring the reachability graph. Process trees, with their bottom-up structure, make it much easier to generate the entire language of an event log. The process starts from the language of the leaves (which is straightforward) and follows specific merge rules for the operators.

However, since the language of a process tree can be vast (especially when parallel operators are involved) or even infinite (when loops are present), extensive playouts are possible only within certain limits:

  • If a loop is present, the maximum number of occurrences for that loop must be specified. This will stop the extensive playout operation after the given number of occurrences.
  • Since the number of possible executions involving loops can still be very large, the maximum length of a trace can be specified. Traces that exceed this length are automatically discarded.
  • To further limit the number of executions, the maximum number of traces returned by the algorithm can also be specified.

Additionally, the structure of the process tree makes it easy to infer the minimum trace length allowed by the process model (following the bottom-up approach).

Some reasonable default settings for the extensive playout are as follows:

  • The maximum number of traces returned by the algorithm is set to 100,000.
  • The maximum length of a trace output by the playout is, by default, set to the minimum trace length accepted by the process tree.
  • The maximum number of loop occurrences is set to half the minimum trace length.

The list of parameters for the extensive playout is as follows:

ParameterDescription
MAX_LIMIT_NUM_TRACESMaximum number of traces returned by the algorithm.
MAX_TRACE_LENGTHMaximum length of a trace output by the algorithm.
MAX_LOOP_OCCMaximum number of times a loop can be entered.

Below is an example of how the playout can be executed. First, a log can be imported:

    

Next, a process tree can be discovered using the inductive miner algorithm:

    

In this example, we specify retrieving traces of length at most 3, and we set the maximum number of traces to 100,000.

    

At this point, the extensive playout operation is complete.