Simulation

In PM4Py, we offer various simulation algorithms that, starting from a model, generate outputs that adhere to the model and the rules provided by the user.

Playout of a Petri Net

A playout of a Petri net takes a Petri net and an initial marking as input, returning a list of process executions permitted by the process model. We offer two types of playouts:

VariantDescription
Variants.BASIC_PLAYOUTA basic playout that accepts a Petri net along with an initial marking and returns a specified number of process executions (with possible repetitions).
Variants.EXTENSIVEA playout that accepts a Petri net and an initial marking, returning all possible executions according to the model, up to a specified trace length (this may be computationally expensive).

The parameters for each variant are as follows:

VariantParameterDescription
Variants.BASIC_PLAYOUTParameters.ACTIVITY_KEYThe name of the attribute to use as activity in the playout log.
Parameters.TIMESTAMP_KEYThe name of the attribute to use as timestamp in the playout log.
Parameters.CASE_ID_KEYThe trace attribute to be used as the case identifier in the playout log.
Parameters.NO_TRACESThe number of traces to be included in the playout log.
Parameters.MAX_TRACE_LENGTHThe maximum trace length, after which the playout of the trace stops.
Variants.EXTENSIVEParameters.ACTIVITY_KEYThe name of the attribute to use as activity in the playout log.
Parameters.TIMESTAMP_KEYThe name of the attribute to use as timestamp in the playout log.
Parameters.CASE_ID_KEYThe trace attribute to be used as case identifier in the playout log.
Parameters.MAX_TRACE_LENGTHThe maximum trace length, after which the extensive playout stops.

Here's an example of applying the basic playout, given a Petri net, to generate a log with 50 traces:

                
            

Here's an example of applying the extensive playout, given a Petri net, to generate a log with all executions of length ≤ 7:

                
            

Monte Carlo Simulation

A time-related simulation allows you to determine the probability of process execution completion within a given time frame. This leads to a better estimation of Service Level Agreements or a better identification of process instances likely to have long throughput times.

This process begins with a performance DFG (e.g., the one discovered from the running example log):

                
            

Additionally, the case arrival ratio must be known. This refers to the average (or median) time between the arrival of two consecutive cases. It can be provided by the user or inferred from the event log using the following command:

                
            

Using the DFG mining approach, you can extract a Petri net model. This model is the “default” choice for Monte Carlo simulation, as its execution semantics are clear. Moreover, the Petri net generated by DFG mining is a sound workflow net, which ensures good properties for the model.

The DFG mining approach can be applied as follows:

                
            

To perform a basic Monte Carlo simulation, the following code can be used. This example assumes a resource-constrained simulation, where a place can hold at most 1 token at a time. Later, we’ll show how to allow more tokens per place.

                
            

During the replay operation, debug messages are displayed. The main outputs of the simulation process are:

KeyDescription
simulated_logThe traces simulated during the process.
resThe result of the simulation (Python dictionary).

The result dictionary, `res`, includes the following keys:

KeyDescription
places_interval_treesAn interval tree for each place, containing intervals when it was "full" according to the specified maximum tokens per place.
transitions_interval_treesAn interval tree for each transition, showing time intervals when the transition was enabled but not yet fired.
cases_ex_timeA list of throughput times for all cases in the log.
median_cases_ex_timeThe median throughput time for cases in the simulated log.
input_case_arrival_ratioThe case arrival ratio, either provided by the user or inferred from the event log.
total_cases_timeThe time difference between the last and first timestamps of the simulated log.

The last four items in the list are basic Python objects (e.g., floats and lists). The interval tree objects can be queried to obtain time-specific information. For example, the following code snippet prints, for a random transition, the number of overlapping intervals at 11 different time points (including the minimum and maximum timestamps):

                
            

Similarly, the following code snippet prints, for a random place, the number of overlapping intervals at 11 different time points:

                
            

This information can be used to build graphs using tools like Microsoft Excel.

The simulation process can be summarized as follows:

  • An event log and a model (DFG) are used as inputs.
  • The simulation performs a replay operation between the log and the model.
  • This replay operation constructs a stochastic map associating each transition with a probability distribution, selected to maximize the likelihood of the observed values. Users can specify a distribution (e.g., exponential) if needed.
  • During replay, the frequency of each transition is calculated, allowing the weighted selection of enabled transitions.
  • The simulation runs by spawning threads for each trace, with stochastic choices made based on the model's constraints.
  • The maximum simulation time is specified. If threads exceed this time, they are terminated, and their traces are excluded from the log.

Several parameters are essential for performing a Monte Carlo simulation. These parameters, found in the petri_semaph_fifo class, are listed below in order of importance:

VariantParameterDescription
Variants.PETRI_SEMAPH_FIFOParameters.PARAM_NUM_SIMULATIONSNumber of simulations to run (the goal is to generate this number of traces in the model).
Parameters.PARAM_CASE_ARRIVAL_RATIOThe case arrival ratio specified by the user.
Parameters.PARAM_MAP_RESOURCES_PER_PLACEA map specifying the maximum number of tokens for each place in the Petri net.
Parameters.PARAM_DEFAULT_NUM_RESOURCES_PER_PLACEIf the map is not provided, this default number of resources per place is used.
Parameters.PARAM_MAX_THREAD_EXECUTION_TIMEThe maximum execution time for the simulation (e.g., 60 seconds).
Parameters.PARAM_SMALL_SCALE_FACTORThe ratio between real time and simulation time. A higher ratio speeds up the simulation but sacrifices accuracy, while a lower ratio offers more detail. The default is 864,000 seconds (10 days).
Parameters.PARAM_ENABLE_DIAGNOSTICSEnables diagnostic printing through Python’s “logging” class.
Parameters.ACTIVITY_KEYThe attribute of the log to use as activity.
Parameters.TIMESTAMP_KEYThe attribute of the log to use as timestamp.
Parameters.TOKEN_REPLAY_VARIANTThe variant of token-based replay to use: token_replay (classic variant) or backwards (slower but handles invisible transitions).
Parameters.PARAM_FORCE_DISTRIBUTIONIf specified, forces a specific distribution for the transitions (e.g., normal or exponential).
Parameters.PARAM_DIAGN_INTERVALThe time interval at which diagnostics are printed (e.g., every 10 seconds).

Extensive Playout of a Process Tree

An extensive playout operation allows the generation (up to the specified limits) of the entire language of the process model. Performing an extensive playout on a Petri net can be computationally expensive, as it requires exploring the reachability graph. Process trees, with their bottom-up structure, make it much easier to generate the entire language of an event log. The process starts from the language of the leaves (which is straightforward) and follows specific merge rules for the operators.

However, since the language of a process tree can be vast (especially when parallel operators are involved) or even infinite (when loops are present), extensive playouts are possible only within certain limits:

  • If a loop is present, the maximum number of occurrences for that loop must be specified. This will stop the extensive playout operation after the given number of occurrences.
  • Since the number of possible executions involving loops can still be very large, the maximum length of a trace can be specified. Traces that exceed this length are automatically discarded.
  • To further limit the number of executions, the maximum number of traces returned by the algorithm can also be specified.

Additionally, the structure of the process tree makes it easy to infer the minimum trace length allowed by the process model (following the bottom-up approach).

Some reasonable default settings for the extensive playout are as follows:

  • The maximum number of traces returned by the algorithm is set to 100,000.
  • The maximum length of a trace output by the playout is, by default, set to the minimum trace length accepted by the process tree.
  • The maximum number of loop occurrences is set to half the minimum trace length.

The list of parameters for the extensive playout is as follows:

ParameterDescription
MAX_LIMIT_NUM_TRACESMaximum number of traces returned by the algorithm.
MAX_TRACE_LENGTHMaximum length of a trace output by the algorithm.
MAX_LOOP_OCCMaximum number of times a loop can be entered.

Below is an example of how the playout can be executed. First, a log can be imported:

    

Next, a process tree can be discovered using the inductive miner algorithm:

    

In this example, we specify retrieving traces of length at most 3, and we set the maximum number of traces to 100,000.

    

At this point, the extensive playout operation is complete.