pm4py package#

Process mining for Python

Submodules#

pm4py.analysis module#

pm4py.analysis.construct_synchronous_product_net(trace: Trace, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking) Tuple[PetriNet, Marking, Marking][source]#

Constructs the synchronous product net between a trace and a Petri net process model.

Parameters:
  • trace (Trace) – A trace from an event log.

  • petri_net (PetriNet) – The Petri net process model.

  • initial_marking (Marking) – The initial marking of the Petri net.

  • final_marking (Marking) – The final marking of the Petri net.

Returns:

A tuple containing the synchronous Petri net, the initial marking, and the final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
log = pm4py.read_xes('log.xes')
sync_net, sync_im, sync_fm = pm4py.construct_synchronous_product_net(log[0], net, im, fm)

Deprecated since version 2.3.0: This method will be removed in version 3.0.0.

pm4py.analysis.compute_emd(language1: Dict[List[str], float], language2: Dict[List[str], float]) float[source]#

Computes the Earth Mover's Distance (EMD) between two stochastic languages. For example, one language may be extracted from a log, and the other from a process model.

Parameters:
  • language1 – The first stochastic language.

  • language2 – The second stochastic language.

Returns:

The computed Earth Mover's Distance.

Return type:

float

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
language_log = pm4py.get_stochastic_language(log)
print(language_log)
net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
language_model = pm4py.get_stochastic_language(net, im, fm)
print(language_model)
emd_distance = pm4py.compute_emd(language_log, language_model)
print(emd_distance)
pm4py.analysis.solve_marking_equation(petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, cost_function: Dict[Transition, float] = None) float[source]#

Solves the marking equation of a Petri net using an Integer Linear Programming (ILP) approach. An optional transition-based cost function can be provided to minimize the solution.

Parameters:
  • petri_net (PetriNet) – The Petri net.

  • initial_marking (Marking) – The initial marking of the Petri net.

  • final_marking (Marking) – The final marking of the Petri net.

  • cost_function – (Optional) A dictionary mapping transitions to their associated costs. If not provided, a default cost of 1 is assigned to each transition.

Returns:

The heuristic value obtained by solving the marking equation.

Return type:

float

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
heuristic = pm4py.solve_marking_equation(net, im, fm)
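A custom cost function can also be supplied; a minimal sketch (the cost assignment below is illustrative, not prescribed by pm4py):

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
# illustrative costs: invisible transitions (label None) cost 0, visible ones cost 1
costs = {t: (0 if t.label is None else 1) for t in net.transitions}
heuristic = pm4py.solve_marking_equation(net, im, fm, cost_function=costs)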
pm4py.analysis.solve_extended_marking_equation(trace: Trace, sync_net: PetriNet, sync_im: Marking, sync_fm: Marking, split_points: List[int] | None = None) float[source]#

Computes a heuristic value (an underestimation of the cost of an alignment) between a trace and a synchronous product net, using the extended marking equation with the standard cost function: synchronization moves have a cost of 0, invisible moves a cost of 1, and other moves on the model or log a cost of 10,000. The method determines an optimal provisioning of the split points.

Parameters:
  • trace (Trace) – The trace to evaluate.

  • sync_net (PetriNet) – The synchronous product net.

  • sync_im (Marking) – The initial marking of the synchronous net.

  • sync_fm (Marking) – The final marking of the synchronous net.

  • split_points – (Optional) The indices of the events in the trace to be used as split points. If not specified, the split points are identified automatically.

Returns:

The heuristic value representing the cost underestimation.

Return type:

float

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
log = pm4py.read_xes('log.xes')
ext_mark_eq_heu = pm4py.solve_extended_marking_equation(log[0], net, im, fm)

Deprecated since version 2.3.0: This method will be removed in version 3.0.0.

pm4py.analysis.check_soundness(petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, print_diagnostics: bool = False) Tuple[bool, Dict[str, Any]][source]#

Checks if a given Petri net is a sound Workflow net (WF-net).

A Petri net is a WF-net if and only if:
  • It has a unique source place.

  • It has a unique sink place.

  • Every element in the WF-net is on a path from the source to the sink place.

A WF-net is sound if and only if:
  • It contains no live-locks.

  • It contains no deadlocks.

  • It is always possible to reach the final marking from any reachable marking.

For a formal definition of a sound WF-net, refer to: http://www.padsweb.rwth-aachen.de/wvdaalst/publications/p628.pdf

The returned tuple consists of:
  • A boolean indicating whether the Petri net is a sound WF-net.

  • A dictionary containing diagnostics collected while running WOFLAN, associating diagnostic names with their corresponding details.

Parameters:
  • petri_net (PetriNet) – The Petri net to check.

  • initial_marking (Marking) – The initial marking of the Petri net.

  • final_marking (Marking) – The final marking of the Petri net.

  • print_diagnostics (bool) – If True, additional diagnostics will be printed during the execution of WOFLAN.

Returns:

A tuple containing a boolean indicating soundness and a dictionary of diagnostics.

Return type:

Tuple[bool, Dict[str, Any]]

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
is_sound = pm4py.check_soundness(net, im, fm)
pm4py.analysis.cluster_log(log: EventLog | EventStream | DataFrame, sklearn_clusterer=None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Generator[EventLog, None, None][source]#

Applies clustering to the provided event log by extracting profiles for the log’s traces and clustering them using a Scikit-Learn clusterer (default is K-Means with two clusters).

Parameters:
  • log – The event log to cluster.

  • sklearn_clusterer – (Optional) The Scikit-Learn clusterer to use. Default is KMeans with n_clusters=2, random_state=0, and n_init=”auto”.

  • activity_key (str) – The key used to identify activities in the log.

  • timestamp_key (str) – The key used to identify timestamps in the log.

  • case_id_key (str) – The key used to identify case IDs in the log.

Returns:

A generator that yields clustered event logs as pandas DataFrames.

Return type:

Generator[pd.DataFrame, None, None]

import pm4py

for clust_log in pm4py.cluster_log(df):
    print(clust_log)
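Any Scikit-Learn clusterer can be passed in place of the default; a minimal sketch, assuming a DataFrame df is already loaded:

import pm4py
from sklearn.cluster import KMeans

# illustrative: cluster the traces into three groups instead of the default two
clusterer = KMeans(n_clusters=3, random_state=0, n_init='auto')
for clust_log in pm4py.cluster_log(df, sklearn_clusterer=clusterer):
    print(clust_log)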
pm4py.analysis.insert_artificial_start_end(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', artificial_start='▶', artificial_end='■') EventLog | DataFrame[source]#

Inserts artificial start and end activities into an event log or a Pandas DataFrame.

Parameters:
  • log – The event log or Pandas DataFrame to modify.

  • activity_key (str) – The attribute key used for activities.

  • timestamp_key (str) – The attribute key used for timestamps.

  • case_id_key (str) – The attribute key used to identify cases.

  • artificial_start (str) – The symbol to use for the artificial start activity.

  • artificial_end (str) – The symbol to use for the artificial end activity.

Returns:

The event log or Pandas DataFrame with artificial start and end activities inserted.

Return type:

Union[EventLog, pd.DataFrame]

import pm4py

dataframe = pm4py.insert_artificial_start_end(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
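The start and end symbols can be overridden; a minimal sketch with hypothetical custom markers:

import pm4py

# hypothetical markers in place of the default '▶' and '■'
dataframe = pm4py.insert_artificial_start_end(
    dataframe,
    artificial_start='START',
    artificial_end='END'
)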
pm4py.analysis.insert_case_service_waiting_time(log: EventLog | DataFrame, service_time_column: str = '@@service_time', sojourn_time_column: str = '@@sojourn_time', waiting_time_column: str = '@@waiting_time', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', start_timestamp_key: str = 'time:timestamp') DataFrame[source]#

Inserts service time, waiting time, and sojourn time information for each case into a Pandas DataFrame.

Parameters:
  • log – The event log or Pandas DataFrame to modify.

  • service_time_column (str) – The name of the column to store service times.

  • sojourn_time_column (str) – The name of the column to store sojourn times.

  • waiting_time_column (str) – The name of the column to store waiting times.

  • activity_key (str) – The attribute key used for activities.

  • timestamp_key (str) – The attribute key used for timestamps.

  • case_id_key (str) – The attribute key used to identify cases.

  • start_timestamp_key (str) – The attribute key used for the start timestamp of cases.

Returns:

A Pandas DataFrame with the inserted service, waiting, and sojourn time columns.

Return type:

pd.DataFrame

import pm4py

dataframe = pm4py.insert_case_service_waiting_time(
    dataframe,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name',
    start_timestamp_key='time:timestamp'
)
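The inserted columns use the documented default names, so they can be inspected directly afterwards:

# inspect the inserted time columns (default names)
print(dataframe[['@@service_time', '@@waiting_time', '@@sojourn_time']].head())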
pm4py.analysis.insert_case_arrival_finish_rate(log: EventLog | DataFrame, arrival_rate_column: str = '@@arrival_rate', finish_rate_column: str = '@@finish_rate', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', start_timestamp_key: str = 'time:timestamp') DataFrame[source]#

Inserts arrival and finish rate information for each case into a Pandas DataFrame.

The arrival rate is computed as the time difference between the start of the current case and the start of the case that started immediately before it. The finish rate is computed as the time difference between the end of the current case and the end of the case that finishes immediately after it.

Parameters:
  • log – The event log or Pandas DataFrame to modify.

  • arrival_rate_column (str) – The name of the column to store arrival rates.

  • finish_rate_column (str) – The name of the column to store finish rates.

  • activity_key (str) – The attribute key used for activities.

  • timestamp_key (str) – The attribute key used for timestamps.

  • case_id_key (str) – The attribute key used to identify cases.

  • start_timestamp_key (str) – The attribute key used for the start timestamp of cases.

Returns:

A Pandas DataFrame with the inserted arrival and finish rate columns.

Return type:

pd.DataFrame

import pm4py

dataframe = pm4py.insert_case_arrival_finish_rate(
    dataframe,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name',
    start_timestamp_key='time:timestamp'
)
pm4py.analysis.check_is_workflow_net(net: PetriNet) bool[source]#

Checks if the input Petri net satisfies the WF-net (Workflow net) conditions:
  • It has a unique source place.

  • It has a unique sink place.

  • Every node is on a path from the source to the sink.

Parameters:

net (PetriNet) – The Petri net to check.

Returns:

True if the Petri net is a WF-net, False otherwise.

Return type:

bool

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
is_wfnet = pm4py.check_is_workflow_net(net)
pm4py.analysis.maximal_decomposition(net: PetriNet, im: Marking, fm: Marking) List[Tuple[PetriNet, Marking, Marking]][source]#

Calculates the maximal decomposition of an accepting Petri net into its maximal components.

Parameters:
  • net (PetriNet) – The Petri net to decompose.

  • im (Marking) – The initial marking of the Petri net.

  • fm (Marking) – The final marking of the Petri net.

Returns:

A list of tuples, each containing a subnet Petri net, its initial marking, and its final marking.

Return type:

List[Tuple[PetriNet, Marking, Marking]]

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
list_nets = pm4py.maximal_decomposition(net, im, fm)
for subnet, subim, subfm in list_nets:
    pm4py.view_petri_net(subnet, subim, subfm, format='svg')
pm4py.analysis.simplicity_petri_net(net: PetriNet, im: Marking, fm: Marking, variant: str | None = 'arc_degree') float[source]#

Computes the simplicity metric for a given Petri net model.

Three approaches are supported:
  • Arc Degree Simplicity: described in “ProDiGen: Mining complete, precise and minimal structure process models with a genetic algorithm” by Vázquez-Barreiros, Borja, Manuel Mucientes, and Manuel Lama. Information Sciences, 294 (2015): 315-333.

  • Extended Cardoso Metric: described in “Complexity Metrics for Workflow Nets” by Lassen, Kristian Bisgaard, and Wil MP van der Aalst.

  • Extended Cyclomatic Metric: also described in “Complexity Metrics for Workflow Nets” by Lassen, Kristian Bisgaard, and Wil MP van der Aalst.

Parameters:
  • net (PetriNet) – The Petri net for which to compute simplicity.

  • im (Marking) – The initial marking of the Petri net.

  • fm (Marking) – The final marking of the Petri net.

  • variant – The simplicity metric variant to use (‘arc_degree’, ‘extended_cardoso’, ‘extended_cyclomatic’).

Returns:

The computed simplicity value.

Return type:

float

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
simplicity = pm4py.simplicity_petri_net(net, im, fm, variant='arc_degree')
pm4py.analysis.generate_marking(net: PetriNet, place_or_dct_places: str | Place | Dict[str, int] | Dict[Place, int]) Marking[source]#

Generates a marking for a given Petri net based on specified places and token counts.

Parameters:
  • net (PetriNet) – The Petri net for which to generate the marking.

  • place_or_dct_places – Specifies the places and their token counts for the marking. It can be:
    • A single PetriNet.Place object, which will receive one token.

    • A string naming a place, which will receive one token.

    • A dictionary mapping PetriNet.Place objects to their respective numbers of tokens.

    • A dictionary mapping place names (strings) to their respective numbers of tokens.

Returns:

The generated Marking object.

Return type:

Marking

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
marking = pm4py.generate_marking(net, {'source': 2})
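The other accepted input forms work analogously; a short sketch, assuming the net has places named 'source' and 'sink':

# a single place name: that place receives one token
marking = pm4py.generate_marking(net, 'source')
# a dictionary of place names to token counts
marking = pm4py.generate_marking(net, {'source': 1, 'sink': 1})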
pm4py.analysis.reduce_petri_net_invisibles(net: PetriNet) PetriNet[source]#

Reduces the number of invisible transitions in the provided Petri net.

Parameters:

net (PetriNet) – The Petri net to be reduced.

Returns:

The reduced Petri net with fewer invisible transitions.

Return type:

PetriNet

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
net = pm4py.reduce_petri_net_invisibles(net)
pm4py.analysis.reduce_petri_net_implicit_places(net: PetriNet, im: Marking, fm: Marking) Tuple[PetriNet, Marking, Marking][source]#

Reduces the number of implicit places in the provided Petri net.

Parameters:
  • net (PetriNet) – The Petri net to be reduced.

  • im (Marking) – The initial marking of the Petri net.

  • fm (Marking) – The final marking of the Petri net.

Returns:

A tuple containing the reduced Petri net, its initial marking, and its final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.read_pnml('model.pnml')
net, im, fm = pm4py.reduce_petri_net_implicit_places(net, im, fm)
pm4py.analysis.get_enabled_transitions(net: PetriNet, marking: Marking) Set[Transition][source]#

Retrieves the set of transitions that are enabled in a given marking of a Petri net.

Parameters:
  • net (PetriNet) – The Petri net.

  • marking (Marking) – The current marking of the Petri net.

Returns:

A set of transitions that are enabled in the provided marking.

Return type:

Set[PetriNet.Transition]

import pm4py

net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
# Gets the transitions enabled in the initial marking
enabled_transitions = pm4py.get_enabled_transitions(net, im)

pm4py.cli module#

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

pm4py.cli.cli_interface()[source]#

pm4py.conformance module#

The pm4py.conformance module contains the conformance checking algorithms implemented in pm4py.

pm4py.conformance.conformance_diagnostics_token_based_replay(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False, opt_parameters: Dict[Any, Any] | None = None) List[Dict[str, Any]][source]#

Apply token-based replay for conformance checking analysis. This method returns the full token-based replay diagnostics.

Token-based replay matches a trace against a Petri net model, starting from the initial marking, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for conformance checking: a trace fits the model if, during its execution, all transitions can be fired without the need to insert any missing tokens. If reaching the final marking is imposed, a trace fits if it reaches the final marking without any missing or remaining tokens.

In PM4Py, the token replayer implementation can handle hidden transitions by calculating the shortest paths between places. It can be used with any Petri net model that has unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all places in its preset have the correct number of tokens, the current marking is checked to see if any hidden transitions can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that permits the firing of the visible transition.

The approach is described in: Berti, Alessandro, and Wil MP van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@ Petri Nets/ACSD. 2019.

The output of the token-based replay, stored in the variable replayed_traces, contains for each trace in the log:

  • trace_is_fit: Boolean value indicating whether the trace conforms to the model.

  • activated_transitions: List of transitions activated in the model by the token-based replay.

  • reached_marking: Marking reached at the end of the replay.

  • missing_tokens: Number of missing tokens.

  • consumed_tokens: Number of consumed tokens.

  • remaining_tokens: Number of remaining tokens.

  • produced_tokens: Number of produced tokens.

Parameters:
  • log – Event log.

  • petri_net (PetriNet) – Petri net.

  • initial_marking (Marking) – Initial marking.

  • final_marking (Marking) – Final marking.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

  • return_diagnostics_dataframe (bool) – If possible, returns a dataframe with the diagnostics instead of the usual output (default is constants.DEFAULT_RETURN_DIAGNOSTICS_DATAFRAME).

  • opt_parameters – Optional parameters for the token-based replay, including:
    • reach_mark_through_hidden: Boolean to decide if the final marking should be reached through hidden transitions.

    • stop_immediately_unfit: Boolean to decide if the replay should stop immediately when non-conformance is detected.

    • walk_through_hidden_trans: Boolean to decide if the replay should walk through hidden transitions to enable visible transitions.

    • places_shortest_path_by_hidden: Shortest paths between places using hidden transitions.

    • is_reduction: Indicates if the token-based replay is called in a reduction attempt.

    • thread_maximum_ex_time: Maximum allowed execution time for alignment threads.

    • cleaning_token_flood: Decides if token flood cleaning should be performed.

    • disable_variants: Disables variants grouping.

    • return_object_names: Decides whether to return names instead of object pointers.

Returns:

A list of dictionaries containing diagnostics for each trace.

Return type:

List[Dict[str, Any]]

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
tbr_diagnostics = pm4py.conformance_diagnostics_token_based_replay(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.conformance.conformance_diagnostics_alignments(log: EventLog | DataFrame, *args, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_str: str | None = None, return_diagnostics_dataframe: bool = False, **kwargs) List[Dict[str, Any]][source]#

Apply the alignments algorithm between a log and a process model. This method returns the full alignment diagnostics.

Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment is a list of pairs where the first element is an event (from the trace) or ">>" and the second element is a transition (from the model) or ">>". Each pair can be classified as follows:

  • Sync move: The event and transition labels correspond, advancing both the trace and the model simultaneously.

  • Move on log: The transition is ">>", indicating a replay move in the trace that is not mirrored in the model. This move is unfit and signals a deviation.

  • Move on model: The event is ">>", indicating a replay move in the model not mirrored in the trace. These can be further classified as:
    • Moves on model involving hidden transitions: Even if it’s not a sync move, the move is fit.

    • Moves on model not involving hidden transitions: The move is unfit and signals a deviation.

For each trace, a dictionary is associated containing, among other details:

  • alignment: The alignment pairs (sync moves, moves on log, moves on model).

  • cost: The cost of the alignment based on the provided cost function.

  • fitness: Equals 1 if the trace fits perfectly.

Parameters:
  • log – Event log.

  • args – Specifications of the process model.

  • multi_processing (bool) – Boolean to enable multiprocessing (default is constants.ENABLE_MULTIPROCESSING_DEFAULT).

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

  • variant_str – Variant specification (for Petri net alignments).

  • return_diagnostics_dataframe (bool) – If possible, returns a dataframe with the diagnostics instead of the usual output (default is constants.DEFAULT_RETURN_DIAGNOSTICS_DATAFRAME).

Returns:

A list of dictionaries containing diagnostics for each trace.

Return type:

List[Dict[str, Any]]

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
alignments_diagnostics = pm4py.conformance_diagnostics_alignments(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.conformance.fitness_token_based_replay(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, float][source]#

Calculate the fitness using token-based replay. The fitness is calculated on a log-based level. The output dictionary contains the following keys:
  • perc_fit_traces: Percentage of fit traces (from 0.0 to 100.0).

  • average_trace_fitness: Average of the trace fitnesses (between 0.0 and 1.0).

  • log_fitness: Overall fitness of the log (between 0.0 and 1.0).

  • percentage_of_fitting_traces: Percentage of fit traces (from 0.0 to 100.0).

Token-based replay matches a trace against a Petri net model, starting from the initial marking, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for conformance checking: a trace fits the model if, during its execution, all transitions can be fired without the need to insert any missing tokens. If reaching the final marking is imposed, a trace fits if it reaches the final marking without any missing or remaining tokens.

In PM4Py, the token replayer implementation can handle hidden transitions by calculating the shortest paths between places. It can be used with any Petri net model that has unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all places in its preset have the correct number of tokens, the current marking is checked to see if any hidden transitions can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that permits the firing of the visible transition.

The approach is described in: Berti, Alessandro, and Wil MP van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@ Petri Nets/ACSD. 2019.

The calculation of replay fitness aims to assess how much of the behavior in the log is admitted by the process model. Two methods are proposed to calculate replay fitness, based on token-based replay and alignments respectively.

For token-based replay, the percentage of traces that are completely fit is returned, along with a fitness value calculated as indicated in the referenced contribution.

Parameters:
  • log – Event log.

  • petri_net (PetriNet) – Petri net.

  • initial_marking (Marking) – Initial marking.

  • final_marking (Marking) – Final marking.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

Returns:

A dictionary containing fitness metrics.

Return type:

Dict[str, float]

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
fitness_tbr = pm4py.fitness_token_based_replay(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
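The documented keys can be read directly from the returned dictionary:

# log-level and trace-level fitness values
print(fitness_tbr['log_fitness'])
print(fitness_tbr['average_trace_fitness'])
print(fitness_tbr['perc_fit_traces'])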

pm4py.conformance.fitness_alignments(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_str: str | None = None) Dict[str, float][source]#

Calculate the fitness using alignments. The output dictionary contains the following keys:
  • average_trace_fitness: Average of the trace fitnesses (between 0.0 and 1.0).

  • log_fitness: Overall fitness of the log (between 0.0 and 1.0).

  • percentage_of_fitting_traces: Percentage of fit traces (from 0.0 to 100.0).

Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment is a list of pairs where the first element is an event (from the trace) or ">>" and the second element is a transition (from the model) or ">>". Each pair can be classified as follows:

  • Sync move: The event and transition labels correspond, advancing both the trace and the model simultaneously.

  • Move on log: The transition is ">>", indicating a replay move in the trace that is not mirrored in the model. This move is unfit and signals a deviation.

  • Move on model: The event is ">>", indicating a replay move in the model not mirrored in the trace. These can be further classified as:
    • Moves on model involving hidden transitions: Even if it’s not a sync move, the move is fit.

    • Moves on model not involving hidden transitions: The move is unfit and signals a deviation.

The calculation of replay fitness aims to assess how much of the behavior in the log is admitted by the process model. Two methods are proposed to calculate replay fitness, based on token-based replay and alignments respectively.

For alignments, the percentage of traces that are completely fit is returned, along with a fitness value calculated as the average of the fitness values of the individual traces.

Parameters:
  • log – Event log.

  • petri_net (PetriNet) – Petri net.

  • initial_marking (Marking) – Initial marking.

  • final_marking (Marking) – Final marking.

  • multi_processing (bool) – Boolean to enable multiprocessing (default is constants.ENABLE_MULTIPROCESSING_DEFAULT).

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

  • variant_str – Variant specification.

Returns:

A dictionary containing fitness metrics.

Return type:

Dict[str, float]

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
fitness_alignments = pm4py.fitness_alignments(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.conformance.precision_token_based_replay(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') float[source]#

Calculate precision using token-based replay.

Token-based replay matches a trace against a Petri net model, starting from the initial marking, to discover which transitions are executed and in which places there are remaining or missing tokens for the given process instance. Token-based replay is useful for conformance checking: a trace fits the model if, during its execution, all transitions can be fired without the need to insert any missing tokens. If reaching the final marking is imposed, a trace fits if it reaches the final marking without any missing or remaining tokens.

In PM4Py, the token replayer implementation can handle hidden transitions by calculating the shortest paths between places. It can be used with any Petri net model that has unique visible transitions and hidden transitions. When a visible transition needs to be fired and not all places in its preset have the correct number of tokens, the current marking is checked to see if any hidden transitions can be fired to enable the visible transition. The hidden transitions are then fired, reaching a marking that permits the firing of the visible transition.

The approach is described in: Berti, Alessandro, and Wil MP van der Aalst. “Reviving Token-based Replay: Increasing Speed While Improving Diagnostics.” ATAED@ Petri Nets/ACSD. 2019.

The reference paper for the TBR-based precision (ETConformance) is: Muñoz-Gama, Jorge, and Josep Carmona. “A fresh look at precision in process conformance.” International Conference on Business Process Management. Springer, Berlin, Heidelberg, 2010.

In this approach, the different prefixes of the log are replayed (if possible) on the model. At the reached marking, the set of transitions that are enabled in the process model is compared with the set of activities that follow the prefix. The more the sets differ, the lower the precision value. The more the sets are similar, the higher the precision value.

Parameters:
  • log – Event log.

  • petri_net (PetriNet) – Petri net.

  • initial_marking (Marking) – Initial marking.

  • final_marking (Marking) – Final marking.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

Returns:

The precision value.

Return type:

float

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
precision_tbr = pm4py.precision_token_based_replay(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.conformance.precision_alignments(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') float[source]#

Calculate the precision of the model with respect to the event log using alignments.

Alignment-based replay aims to find one of the best alignments between the trace and the model. For each trace, the output of an alignment is a list of pairs where the first element is an event (from the trace) or ">>" and the second element is a transition (from the model) or ">>". Each pair can be classified as follows:

  • Sync move: The event and transition labels correspond, advancing both the trace and the model simultaneously.

  • Move on log: The transition is ">>", indicating a replay move in the trace that is not mirrored in the model. This move is unfit and signals a deviation.

  • Move on model: The event is ">>", indicating a replay move in the model not mirrored in the trace. These can be further classified as:
    • Moves on model involving hidden transitions: Even if it’s not a sync move, the move is fit.

    • Moves on model not involving hidden transitions: The move is unfit and signals a deviation.

The reference paper for the alignments-based precision (Align-ETConformance) is: Adriansyah, Arya, et al. “Measuring precision of modeled behavior.” Information systems and e-Business Management 13.1 (2015): 37-67.

In this approach, the different prefixes of the log are replayed (if possible) on the model. At the reached marking, the set of transitions that are enabled in the process model is compared with the set of activities that follow the prefix. The more the sets differ, the lower the precision value. The more the sets are similar, the higher the precision value.

Parameters:
  • log – Event log.

  • petri_net (PetriNet) – Petri net.

  • initial_marking (Marking) – Initial marking.

  • final_marking (Marking) – Final marking.

  • multi_processing (bool) – Boolean to enable multiprocessing (default is constants.ENABLE_MULTIPROCESSING_DEFAULT).

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

Returns:

The precision value.

Return type:

float

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
precision_alignments = pm4py.precision_alignments(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.conformance.generalization_tbr(log: EventLog | DataFrame, petri_net: PetriNet, initial_marking: Marking, final_marking: Marking, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') float[source]#

Compute the generalization of the model against the event log. The approach is described in the paper:

Buijs, Joos CAM, Boudewijn F. van Dongen, and Wil MP van der Aalst. “Quality dimensions in process discovery: The importance of fitness, precision, generalization, and simplicity.” International Journal of Cooperative Information Systems 23.01 (2014): 1440001.

Parameters:
  • log – Event log.

  • petri_net (PetriNet) – Petri net.

  • initial_marking (Marking) – Initial marking.

  • final_marking (Marking) – Final marking.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

Returns:

The generalization value.

Return type:

float

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
generalization_tbr = pm4py.generalization_tbr(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.conformance.replay_prefix_tbr(prefix: List[str], net: PetriNet, im: Marking, fm: Marking, activity_key: str = 'concept:name') Marking[source]#

Replay a prefix (list of activities) on a given accepting Petri net using Token-Based Replay.

Parameters:
  • prefix – List of activities representing the prefix.

  • net (PetriNet) – Petri net.

  • im (Marking) – Initial marking.

  • fm (Marking) – Final marking.

  • activity_key (str) – Attribute to be used as the activity key (default is “concept:name”).

Returns:

The marking reached after replaying the prefix.

Return type:

Marking

Example:

import pm4py

net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
marking = pm4py.replay_prefix_tbr(
    ['register request', 'check ticket'], net, im, fm,
    activity_key='concept:name'
)
pm4py.conformance.conformance_diagnostics_footprints(*args) List[Dict[str, Any]] | Dict[str, Any][source]#

Provide conformance checking diagnostics using footprints.

Parameters:

args – Arguments where the first is an event log (or its footprints) and the others represent the process model (or its footprints).

Returns:

Conformance diagnostics based on footprints.

Return type:

Union[List[Dict[str, Any]], Dict[str, Any]]

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
footprints_diagnostics = pm4py.conformance_diagnostics_footprints(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

Deprecated since version 2.3.0: Conformance checking using footprints will no longer be exposed starting from version 3.0.0.

pm4py.conformance.fitness_footprints(*args) Dict[str, float][source]#

Calculate fitness using footprints. The output is a dictionary containing two keys:
  • perc_fit_traces: Percentage of fit traces (over the log).

  • log_fitness: The fitness value over the log.

Parameters:

args – Arguments where the first is an event log (or its footprints) and the others represent the process model (or its footprints).

Returns:

A dictionary containing fitness metrics based on footprints.

Return type:

Dict[str, float]

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
fitness_fp = pm4py.fitness_footprints(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

Deprecated since version 2.3.0: Conformance checking using footprints will no longer be exposed starting from version 3.0.0.

pm4py.conformance.precision_footprints(*args) float[source]#

Calculate precision using footprints.

Parameters:

args – Arguments where the first is an event log (or its footprints) and the others represent the process model (or its footprints).

Returns:

The precision value based on footprints.

Return type:

float

Example:

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
precision_fp = pm4py.precision_footprints(
    dataframe, net, im, fm,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

Deprecated since version 2.3.0: Conformance checking using footprints will no longer be exposed starting from version 3.0.0.

pm4py.conformance.check_is_fitting(*args, activity_key='concept:name') bool[source]#

Check if a trace object fits a process model.

Parameters:
  • args – Arguments where the first is a trace object and the others represent the process model (process tree, Petri net, BPMN).

  • activity_key (str) – Attribute to be used as the activity key (default is defined in xes_constants.DEFAULT_NAME_KEY).

Returns:

True if the trace fits the process model, False otherwise.

Return type:

bool

Note:

This is an internal method and is deprecated.

Deprecated since version 2.3.0: This method will be removed in version 3.0.0.

pm4py.conformance.conformance_temporal_profile(log: EventLog | DataFrame, temporal_profile: Dict[Tuple[str, str], Tuple[float, float]], zeta: float = 1.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) List[List[Tuple[float, float, float, float]]][source]#

Perform conformance checking on the provided log using the provided temporal profile. The result is a list of time-based deviations for every case.

For example, consider a log with a single case:
  • A (timestamp: 2000-01)

  • B (timestamp: 2002-01)

Given the temporal profile:

{
    ('A', 'B'): (1.5, 0.5),  # (mean, std)
    ('A', 'C'): (5.0, 0.0),
    ('A', 'D'): (2.0, 0.0)
}

and setting zeta to 1, the difference between the timestamps of A and B (2 years) exceeds the allowed time (1.5 months + 0.5 months), resulting in a deviation.

Parameters:
  • log – Log object.

  • temporal_profile – Temporal profile, mapping pairs of activities to the mean and standard deviation of the times between them. For example, if the log has two cases:
    • Case 1: A (timestamp: 1980-01), B (timestamp: 1980-03), C (timestamp: 1980-06)

    • Case 2: A (timestamp: 1990-01), B (timestamp: 1990-02), D (timestamp: 1990-03)

    the temporal profile might look like the dictionary shown above.

  • zeta (float) – Number of standard deviations allowed from the average (default is 1.0). For example, zeta=1 allows deviations within one standard deviation from the mean.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

  • return_diagnostics_dataframe (bool) – If possible, returns a dataframe with the diagnostics instead of the usual output (default is constants.DEFAULT_RETURN_DIAGNOSTICS_DATAFRAME).

Returns:

A list containing lists of tuples representing time-based deviations for each case.

Return type:

List[List[Tuple[float, float, float, float]]]

Example:

import pm4py

temporal_profile = pm4py.discover_temporal_profile(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
conformance_temporal_profile = pm4py.conformance_temporal_profile(
    dataframe, temporal_profile, zeta=1,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
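The result associates each case with its list of time-based deviations; a short sketch:

# number of temporal deviations detected per case
for case_deviations in conformance_temporal_profile:
    print(len(case_deviations))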

pm4py.conformance.conformance_declare(log: EventLog | DataFrame, declare_model: Dict[str, Dict[Any, Dict[str, int]]], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) List[Dict[str, Any]][source]#

Apply conformance checking against a DECLARE model.

Reference paper: F. M. Maggi, A. J. Mooij, and W. M. P. van der Aalst, “User-guided discovery of declarative process models,” 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 2011, pp. 192-199, doi: 10.1109/CIDM.2011.5949297.

Parameters:
  • log – Event log.

  • declare_model – DECLARE model represented as a nested dictionary.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

  • return_diagnostics_dataframe (bool) – If possible, returns a dataframe with the diagnostics instead of the usual output (default is constants.DEFAULT_RETURN_DIAGNOSTICS_DATAFRAME).

Returns:

A list of dictionaries containing diagnostics for each trace.

Return type:

List[Dict[str, Any]]

Example:

import pm4py

log = pm4py.read_xes('C:/receipt.xes')
declare_model = pm4py.discover_declare(log)
conf_result = pm4py.conformance_declare(
    log, declare_model,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.conformance.conformance_log_skeleton(log: EventLog | DataFrame, log_skeleton: Dict[str, Any], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', return_diagnostics_dataframe: bool = False) List[Set[Any]][source]#

Perform conformance checking using the log skeleton.

Reference paper: Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log skeletons: A classification approach to process discovery.” arXiv preprint arXiv:1806.08247 (2018).

A log skeleton is a declarative model consisting of six different constraints:
  • directly_follows: Specifies strict bounds on activities directly following each other. For example, ‘A should be directly followed by B’ and ‘B should be directly followed by C’.

  • always_before: Specifies that certain activities may only be executed if some other activities have been executed earlier in the case history. For example, ‘C should always be preceded by A’.

  • always_after: Specifies that certain activities should always trigger the execution of other activities in the future history of the case. For example, ‘A should always be followed by C’.

  • equivalence: Specifies that pairs of activities should occur the same number of times within a case. For example, ‘B and C should always happen the same number of times’.

  • never_together: Specifies that certain pairs of activities should never occur together in the case history. For example, ‘No case should contain both C and D’.

  • activ_occurrences: Specifies the allowed number of occurrences per activity. For example, ‘A is allowed to be executed 1 or 2 times, and B is allowed to be executed 1 to 4 times’.

Parameters:
  • log – Log object.

  • log_skeleton – Log skeleton object, expressed as dictionaries of the six constraints along with the discovered rules.

  • activity_key (str) – Attribute to be used for the activity (default is “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default is “time:timestamp”).

  • case_id_key (str) – Attribute to be used as the case identifier (default is “case:concept:name”).

  • return_diagnostics_dataframe (bool) – If possible, returns a dataframe with the diagnostics instead of the usual output (default is constants.DEFAULT_RETURN_DIAGNOSTICS_DATAFRAME).

Returns:

A list of sets containing deviations for each case.

Return type:

List[Set[Any]]

Example:

import pm4py

log_skeleton = pm4py.discover_log_skeleton(
    dataframe, noise_threshold=0.1,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
conformance_lsk = pm4py.conformance_log_skeleton(
    dataframe, log_skeleton,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
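Each entry of the result is the set of deviations detected for the corresponding case; a short sketch:

# print the deviations detected for each case
for deviations in conformance_lsk:
    print(deviations)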

pm4py.connectors module#


pm4py.connectors.extract_log_outlook_mails() DataFrame[source]#

Extracts the history of conversations from the local instance of Microsoft Outlook running on the current computer.

Columns:
  • CASE ID (case:concept:name): Identifier of the conversation.

  • ACTIVITY (concept:name): Activity performed in the current item (e.g., send e-mail, receive e-mail, refuse meeting).

  • TIMESTAMP (time:timestamp): Timestamp of creation of the item in Outlook.

  • RESOURCE (org:resource): Sender of the current item.

See also:
  • MailItem Properties: https://learn.microsoft.com/en-us/dotnet/api/microsoft.office.interop.outlook.mailitem?redirectedfrom=MSDN&view=outlook-pia#properties_

  • OlObjectClass Enumeration: https://learn.microsoft.com/en-us/dotnet/api/microsoft.office.interop.outlook.olobjectclass?view=outlook-pia

Return type:

pd.DataFrame

pm4py.connectors.extract_log_outlook_calendar(email_user: str | None = None, calendar_id: int = 9) DataFrame[source]#

Extracts the history of calendar events (creation, update, start, end) into a Pandas DataFrame from the local Outlook instance running on the current computer.

Columns:
  • CASE ID (case:concept:name): Identifier of the meeting.

  • ACTIVITY (concept:name): One of the following activities: Meeting Created, Last Change of Meeting, Meeting Started, Meeting Completed.

  • TIMESTAMP (time:timestamp): Timestamp of the event.

  • case:subject: Subject of the meeting.

Parameters:
  • email_user – (optional) E-mail address from which the (shared) calendar should be extracted.

  • calendar_id (int) – Identifier of the calendar for the given user (default: 9).

Return type:

pd.DataFrame

pm4py.connectors.extract_log_windows_events() DataFrame[source]#

Extracts a process mining DataFrame from all events recorded in the Windows registry.

Columns:
  • CASE ID (case:concept:name): Name of the computer emitting the events.

  • ACTIVITY (concept:name): Concatenation of the source name of the event and the event identifier.

  • TIMESTAMP (time:timestamp): Timestamp of event generation.

  • RESOURCE (org:resource): Username involved in the event.

Return type:

pd.DataFrame

pm4py.connectors.extract_log_chrome_history(history_db_path: str | None = None) DataFrame[source]#

Extracts a DataFrame containing the navigation history of Google Chrome. Please ensure that Google Chrome is closed when extracting its history.

Columns:
  • CASE ID (case:concept:name): Identifier of the extracted profile.

  • ACTIVITY (concept:name): Complete path of the website, excluding GET arguments.

  • TIMESTAMP (time:timestamp): Timestamp of the visit.

Parameters:

history_db_path – Path to the Google Chrome history database (default: location of the Windows folder).

Return type:

pd.DataFrame
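A minimal usage sketch; when no path is provided, the default history database location is used:

import pm4py

dataframe = pm4py.connectors.extract_log_chrome_history()
print(dataframe)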

pm4py.connectors.extract_log_firefox_history(history_db_path: str | None = None) DataFrame[source]#

Extracts a DataFrame containing the navigation history of Mozilla Firefox. Please ensure that Mozilla Firefox is closed when extracting its history.

Columns:
  • CASE ID (case:concept:name): Identifier of the extracted profile.

  • ACTIVITY (concept:name): Complete path of the website, excluding GET arguments.

  • TIMESTAMP (time:timestamp): Timestamp of the visit.

Parameters:

history_db_path – Path to the Mozilla Firefox history database (default: location of the Windows folder).

Return type:

pd.DataFrame

pm4py.connectors.extract_log_github(owner: str = 'pm4py', repo: str = 'pm4py-core', auth_token: str | None = None) DataFrame[source]#

Extracts a DataFrame containing the history of issues from a GitHub repository. Due to API rate limits for public and registered users, only a subset of events may be returned.

Parameters:
  • owner (str) – Owner of the repository (e.g., pm4py).

  • repo (str) – Name of the repository (e.g., pm4py-core).

  • auth_token – Authorization token.

Return type:

pd.DataFrame
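A minimal usage sketch for a public repository (events beyond the API rate limits may be missing):

import pm4py

dataframe = pm4py.connectors.extract_log_github(owner='pm4py', repo='pm4py-core')
print(dataframe)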

pm4py.connectors.extract_log_camunda_workflow(connection_string: str) DataFrame[source]#

Extracts a DataFrame from the Camunda workflow system. In addition to traditional columns, the process ID of the process in Camunda is included.

Parameters:

connection_string (str) – ODBC connection string to the Camunda database.

Return type:

pd.DataFrame

pm4py.connectors.extract_log_sap_o2c(connection_string: str, prefix: str = '') DataFrame[source]#

Extracts a DataFrame for the SAP Order-to-Cash (O2C) process.

Parameters:
  • connection_string (str) – ODBC connection string to the SAP database.

  • prefix (str) – Prefix for the tables (e.g., SAPSR3.).

Return type:

pd.DataFrame

pm4py.connectors.extract_log_sap_accounting(connection_string: str, prefix: str = '') DataFrame[source]#

Extracts a DataFrame for the SAP Accounting process.

Parameters:
  • connection_string (str) – ODBC connection string to the SAP database.

  • prefix (str) – Prefix for the tables (e.g., SAPSR3.).

Return type:

pd.DataFrame

pm4py.connectors.extract_ocel_outlook_mails() OCEL[source]#

Extracts the history of conversations from the local instance of Microsoft Outlook running on the current computer as an object-centric event log.

Columns:
  • ACTIVITY (ocel:activity): Activity performed in the current item (e.g., send e-mail, receive e-mail, refuse meeting).

  • TIMESTAMP (ocel:timestamp): Timestamp of creation of the item in Outlook.

Object Types:
  • org:resource: Sender of the mail.

  • recipients: List of recipients of the mail.

  • topic: Topic of the discussion.

See also:
  • MailItem Properties: https://learn.microsoft.com/en-us/dotnet/api/microsoft.office.interop.outlook.mailitem?redirectedfrom=MSDN&view=outlook-pia#properties_

  • OlObjectClass Enumeration: https://learn.microsoft.com/en-us/dotnet/api/microsoft.office.interop.outlook.olobjectclass?view=outlook-pia

Return type:

OCEL

pm4py.connectors.extract_ocel_outlook_calendar(email_user: str | None = None, calendar_id: int = 9) OCEL[source]#

Extracts the history of calendar events (creation, update, start, end) as an object-centric event log from the local Outlook instance running on the current computer.

Columns:
  • ACTIVITY (ocel:activity): One of the following activities: Meeting Created, Last Change of Meeting, Meeting Started, Meeting Completed.

  • TIMESTAMP (ocel:timestamp): Timestamp of the event.

Object Types:
  • case:concept:name: Identifier of the meeting.

  • case:subject: Subject of the meeting.

Parameters:
  • email_user – (optional) E-mail address from which the (shared) calendar should be extracted.

  • calendar_id (int) – Identifier of the calendar for the given user (default: 9).

Return type:

OCEL
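
A minimal usage sketch using the default calendar (calendar_id=9) of the current user:

import pm4py.connectors

ocel = pm4py.connectors.extract_ocel_outlook_calendar()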

pm4py.connectors.extract_ocel_windows_events() OCEL[source]#

Extracts an object-centric event log from all events recorded in the Windows event log.

Columns:
  • ACTIVITY (ocel:activity): Concatenation of the source name of the event and the event identifier.
  • TIMESTAMP (ocel:timestamp): Timestamp of event generation.

Object Types:
  • categoryString: Translation of the subcategory. The translation is source-specific.
  • computerName: Name of the computer that generated the event.
  • eventIdentifier: Identifier of the event, specific to the source that generated the event log entry.
  • eventType: Event type classification (1=Error; 2=Warning; 3=Information; 4=Security Audit Success; 5=Security Audit Failure).
  • sourceName: Name of the source (application, service, driver, or subsystem) that generated the entry.
  • user: Username of the logged-on user when the event occurred. If the username cannot be determined, this will be NULL.

Return type:

OCEL
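
A minimal usage sketch (Windows only; reading the event log may require elevated privileges, which is an assumption about the local configuration):

import pm4py.connectors

ocel = pm4py.connectors.extract_ocel_windows_events()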

pm4py.connectors.extract_ocel_chrome_history(history_db_path: str | None = None) OCEL[source]#

Extracts an object-centric event log containing the navigation history of Google Chrome. Please ensure that Google Chrome is closed when extracting, as the history database may otherwise be locked.

Columns:
  • ACTIVITY (ocel:activity): Complete path of the website, excluding GET arguments.
  • TIMESTAMP (ocel:timestamp): Timestamp of the visit.

Object Types:
  • case:concept:name: Profile of Chrome used to visit the site.
  • complete_url: Complete URL of the website.
  • url_wo_parameters: Complete URL excluding the part after ‘?’.
  • domain: Domain of the visited website.

Parameters:

history_db_path – Path to the Google Chrome history database (default: the standard location on Windows).

Return type:

OCEL
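
A minimal usage sketch, assuming Google Chrome is installed and closed and the history database is in its default location:

import pm4py.connectors

ocel = pm4py.connectors.extract_ocel_chrome_history()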

pm4py.connectors.extract_ocel_firefox_history(history_db_path: str | None = None) OCEL[source]#

Extracts an object-centric event log containing the navigation history of Mozilla Firefox. Please ensure that Mozilla Firefox is closed when extracting, as the history database may otherwise be locked.

Columns:
  • ACTIVITY (ocel:activity): Complete path of the website, excluding GET arguments.
  • TIMESTAMP (ocel:timestamp): Timestamp of the visit.

Object Types:
  • case:concept:name: Profile of Firefox used to visit the site.
  • complete_url: Complete URL of the website.
  • url_wo_parameters: Complete URL excluding the part after ‘?’.
  • domain: Domain of the visited website.

Parameters:

history_db_path – Path to the Mozilla Firefox history database (default: the standard location on Windows).

Return type:

OCEL

pm4py.connectors.extract_ocel_github(owner: str = 'pm4py', repo: str = 'pm4py-core', auth_token: str | None = None) OCEL[source]#

Extracts an object-centric event log containing the history of issues from a GitHub repository. Due to API rate limits for public and registered users, only a subset of events may be returned.

Columns:
  • ACTIVITY (ocel:activity): The event type (e.g., created, commented, closed, subscribed).
  • TIMESTAMP (ocel:timestamp): Timestamp of the event execution.

Object Types:
  • case:concept:name: URL of the events related to the issue.
  • org:resource: Involved resource.
  • case:repo: Repository in which the issue was created.

Parameters:
  • owner (str) – Owner of the repository (e.g., pm4py).

  • repo (str) – Name of the repository (e.g., pm4py-core).

  • auth_token – Authorization token.

Return type:

OCEL

pm4py.connectors.extract_ocel_camunda_workflow(connection_string: str) OCEL[source]#

Extracts an object-centric event log from the Camunda workflow system.

Columns:
  • ACTIVITY (ocel:activity): Activity performed within Camunda.
  • TIMESTAMP (ocel:timestamp): Timestamp of the activity execution.

Object Types:
  • case:concept:name: Identifier of the case.
  • processID: Process ID within Camunda.
  • org:resource: Resource involved in the activity.

Parameters:

connection_string (str) – ODBC connection string to the Camunda database.

Return type:

OCEL

pm4py.connectors.extract_ocel_sap_o2c(connection_string: str, prefix: str = '') OCEL[source]#

Extracts an object-centric event log for the SAP Order-to-Cash (O2C) process.

Columns:
  • ACTIVITY (ocel:activity): Activity performed in the O2C process.
  • TIMESTAMP (ocel:timestamp): Timestamp of the activity execution.

Object Types:
  • case:concept:name: Identifier of the case.
  • org:resource: Resource involved in the activity.

Parameters:
  • connection_string (str) – ODBC connection string to the SAP database.

  • prefix (str) – Prefix for the tables (e.g., SAPSR3.).

Return type:

OCEL

pm4py.connectors.extract_ocel_sap_accounting(connection_string: str, prefix: str = '') OCEL[source]#

Extracts an object-centric event log for the SAP Accounting process.

Columns:
  • ACTIVITY (ocel:activity): Activity performed in the Accounting process.
  • TIMESTAMP (ocel:timestamp): Timestamp of the activity execution.

Object Types:
  • case:concept:name: Identifier of the case.
  • org:resource: Resource involved in the activity.

Parameters:
  • connection_string (str) – ODBC connection string to the SAP database.

  • prefix (str) – Prefix for the tables (e.g., SAPSR3.).

Return type:

OCEL

pm4py.convert module#

The pm4py.convert module contains the cross-conversions implemented in pm4py.

pm4py.convert.convert_to_event_log(obj: DataFrame | EventStream, case_id_key: str = 'case:concept:name', **kwargs) EventLog[source]#

Converts a DataFrame or EventStream object to an event log object.

Return type:

EventLog

Parameters:
  • obj – The DataFrame or EventStream object to convert.

  • case_id_key (str) – The attribute to be used as the case identifier. Defaults to “case:concept:name”.

  • kwargs – Additional keyword arguments to pass to the converter.

Returns:

An EventLog object.

import pandas as pd
import pm4py

dataframe = pd.read_csv("tests/input_data/running-example.csv")
dataframe = pm4py.format_dataframe(dataframe, case_id_column='case:concept:name', activity_column='concept:name', timestamp_column='time:timestamp')
log = pm4py.convert_to_event_log(dataframe)
pm4py.convert.convert_to_event_stream(obj: EventLog | DataFrame, case_id_key: str = 'case:concept:name', **kwargs) EventStream[source]#

Converts a log object or DataFrame to an event stream.

Return type:

EventStream

Parameters:
  • obj – The log object (EventLog) or DataFrame to convert.

  • case_id_key (str) – The attribute to be used as the case identifier. Defaults to “case:concept:name”.

  • kwargs – Additional keyword arguments to pass to the converter.

Returns:

An EventStream object.

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")
event_stream = pm4py.convert_to_event_stream(log)
pm4py.convert.convert_to_dataframe(obj: EventStream | EventLog, **kwargs) DataFrame[source]#

Converts a log object (EventStream or EventLog) to a Pandas DataFrame.

Return type:

DataFrame

Parameters:
  • obj – The log object to convert.

  • kwargs – Additional keyword arguments to pass to the converter.

Returns:

A pd.DataFrame object.

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")
dataframe = pm4py.convert_to_dataframe(log)
pm4py.convert.convert_to_bpmn(*args: Tuple[PetriNet, Marking, Marking] | ProcessTree) BPMN[source]#

Converts an object to a BPMN diagram.

As input, either a Petri net (with corresponding initial and final markings) or a process tree can be provided. A process tree can always be converted into a BPMN model, so the quality of the resulting model is guaranteed. For Petri nets, the quality of the conversion largely depends on the net provided (e.g., sound WF-nets are likely to produce reasonable BPMN models).

Return type:

BPMN

Parameters:

args

  • If converting a Petri net: a tuple of (PetriNet, Marking, Marking).

  • If converting a process tree: a single ProcessTree object.

Returns:

A BPMN object.

import pm4py

# Import a Petri net from a file
net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
bpmn_graph = pm4py.convert_to_bpmn(net, im, fm)
pm4py.convert.convert_to_petri_net(*args: BPMN | ProcessTree | HeuristicsNet | POWL | dict) Tuple[PetriNet, Marking, Marking][source]#

Converts an input model to an (accepting) Petri net.

The input objects can be a process tree, BPMN model, Heuristic net, POWL model, or a dictionary representing a Directly-Follows Graph (DFG). The output is a tuple containing the Petri net and the initial and final markings. The markings are only returned if they can be reasonably derived from the input model.

Parameters:

args

  • If converting from a BPMN, ProcessTree, HeuristicsNet, or POWL: a single object of the respective type.

  • If converting from a DFG: a dictionary representing the DFG, followed by lists of start and end activities.

Returns:

A tuple of (PetriNet, Marking, Marking).

import pm4py

# Imports a process tree from a PTML file
process_tree = pm4py.read_ptml("tests/input_data/running-example.ptml")
net, im, fm = pm4py.convert_to_petri_net(process_tree)
pm4py.convert.convert_to_process_tree(*args: Tuple[PetriNet, Marking, Marking] | BPMN | ProcessTree) ProcessTree[source]#

Converts an input model to a process tree.

The input models can be Petri nets (with markings) or BPMN models. For both input types, the conversion is not guaranteed to work and may raise an exception.

Return type:

ProcessTree

Parameters:

args

  • If converting from a Petri net: a tuple of (PetriNet, Marking, Marking).

  • If converting from a BPMN or ProcessTree: a single object of the respective type.

Returns:

A ProcessTree object.

import pm4py

# Imports a BPMN file
bpmn_graph = pm4py.read_bpmn("tests/input_data/running-example.bpmn")
# Converts the BPMN to a process tree (through intermediate conversion to a Petri net)
process_tree = pm4py.convert_to_process_tree(bpmn_graph)
pm4py.convert.convert_to_reachability_graph(*args: Tuple[PetriNet, Marking, Marking] | BPMN | ProcessTree) TransitionSystem[source]#

Converts an input model to a reachability graph (transition system).

The input models can be Petri nets (with markings), BPMN models, or process trees. The output is the state-space of the model, encoded as a TransitionSystem object.

Return type:

TransitionSystem

Parameters:

args

  • If converting from a Petri net: a tuple of (PetriNet, Marking, Marking).

  • If converting from a BPMN or ProcessTree: a single object of the respective type.

Returns:

A TransitionSystem object.

import pm4py

# Reads a Petri net from a file
net, im, fm = pm4py.read_pnml("tests/input_data/running-example.pnml")
# Converts it to a reachability graph
reach_graph = pm4py.convert_to_reachability_graph(net, im, fm)
pm4py.convert.convert_log_to_ocel(log: EventLog | EventStream | DataFrame, activity_column: str = 'concept:name', timestamp_column: str = 'time:timestamp', object_types: Collection[str] | None = None, obj_separator: str = ' AND ', additional_event_attributes: Collection[str] | None = None, additional_object_attributes: Dict[str, Collection[str]] | None = None) OCEL[source]#

Converts an event log to an object-centric event log (OCEL) with one or more object types.

Return type:

OCEL

Parameters:
  • log – The log object to convert.

  • activity_column (str) – The name of the column representing activities.

  • timestamp_column (str) – The name of the column representing timestamps.

  • object_types – A collection of column names to consider as object types. If None, defaults are used.

  • obj_separator (str) – The separator used between different objects in the same column. Defaults to “ AND ”.

  • additional_event_attributes – Additional attribute names to include as event attributes in the OCEL.

  • additional_object_attributes – Additional attributes per object type to include as object attributes in the OCEL. Should be a dictionary mapping object types to lists of attribute names.

Returns:

An OCEL object.
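
A minimal sketch, assuming the running-example log and treating the case identifier and the resource column as object types (this choice of object types is an assumption for illustration):

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
ocel = pm4py.convert_log_to_ocel(
    log,
    activity_column='concept:name',
    timestamp_column='time:timestamp',
    object_types=['case:concept:name', 'org:resource']
)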

pm4py.convert.convert_ocel_to_networkx(ocel: OCEL, variant: str = 'ocel_to_nx') DiGraph[source]#

Converts an OCEL to a NetworkX DiGraph object.

Return type:

DiGraph

Parameters:
  • ocel (OCEL) – The object-centric event log to convert.

  • variant (str) – The variant of the conversion to use. Options:
    - “ocel_to_nx”: Graph containing event and object IDs and two types of relations (REL=related objects, DF=directly-follows).
    - “ocel_features_to_nx”: Graph containing different types of interconnections at the object level.

Returns:

A nx.DiGraph object representing the OCEL.
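
A minimal sketch (the OCEL file path below is an assumption):

import pm4py

ocel = pm4py.read_ocel('tests/input_data/ocel/example_log.jsonocel')
nx_digraph = pm4py.convert_ocel_to_networkx(ocel, variant='ocel_to_nx')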

pm4py.convert.convert_log_to_networkx(log: EventLog | EventStream | DataFrame, include_df: bool = True, case_id_key: str = 'concept:name', other_case_attributes_as_nodes: Collection[str] | None = None, event_attributes_as_nodes: Collection[str] | None = None) DiGraph[source]#

Converts an event log to a NetworkX DiGraph object.

The nodes of the graph include events, cases, and optionally log attributes. The edges represent:
  • BELONGS_TO: Connecting each event to its corresponding case.
  • DF: Connecting events that directly follow each other (if enabled).
  • ATTRIBUTE_EDGE: Connecting cases/events to their attribute values.

Return type:

DiGraph

Parameters:
  • log – The log object to convert (EventLog, EventStream, or Pandas DataFrame).

  • include_df (bool) – Whether to include the directly-follows relation in the graph. Defaults to True.

  • case_id_key (str) – The attribute to be used as the case identifier. Defaults to “concept:name”.

  • other_case_attributes_as_nodes – Attributes at the case level to include as nodes, excluding the case ID.

  • event_attributes_as_nodes – Attributes at the event level to include as nodes.

Returns:

A nx.DiGraph object representing the event log.
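
A minimal sketch on the running-example log, keeping the directly-follows edges:

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes')
nx_digraph = pm4py.convert_log_to_networkx(log, include_df=True)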

pm4py.convert.convert_log_to_time_intervals(log: EventLog | DataFrame, filter_activity_couple: Tuple[str, str] | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', start_timestamp_key: str = 'time:timestamp') List[List[Any]][source]#

Extracts a list of time intervals from an event log.

Each interval contains two temporally consecutive events within the same case and measures the time between them (complete timestamp of the first event against the start timestamp of the second event).

Parameters:
  • log – The log object to convert.

  • filter_activity_couple – Optional tuple to filter intervals by a specific pair of activities.

  • activity_key (str) – The attribute to be used as the activity identifier. Defaults to “concept:name”.

  • timestamp_key (str) – The attribute to be used as the timestamp. Defaults to “time:timestamp”.

  • case_id_key (str) – The attribute to be used as the case identifier. Defaults to “case:concept:name”.

  • start_timestamp_key (str) – The attribute to be used as the start timestamp in the interval. Defaults to “time:timestamp”.

Returns:

A list of intervals, where each interval is a list containing relevant information about the time gap.

import pm4py

log = pm4py.read_xes('tests/input_data/receipt.xes')
time_intervals = pm4py.convert_log_to_time_intervals(log)
print(len(time_intervals))
time_intervals = pm4py.convert_log_to_time_intervals(
    log, 
    filter_activity_couple=('Confirmation of receipt', 'T02 Check confirmation of receipt')
)
print(len(time_intervals))
pm4py.convert.convert_petri_net_to_networkx(net: PetriNet, im: Marking, fm: Marking) DiGraph[source]#

Converts a Petri net to a NetworkX DiGraph.

Each place and transition in the Petri net is represented as a node in the graph.

Return type:

DiGraph

Parameters:
  • net (PetriNet) – The Petri net to convert.

  • im (Marking) – The initial marking of the Petri net.

  • fm (Marking) – The final marking of the Petri net.

Returns:

A nx.DiGraph object representing the Petri net.
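
A minimal sketch on the running-example Petri net:

import pm4py

net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
nx_digraph = pm4py.convert_petri_net_to_networkx(net, im, fm)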

pm4py.convert.convert_petri_net_type(net: PetriNet, im: Marking, fm: Marking, type: str = 'classic') Tuple[PetriNet, Marking, Marking][source]#

Changes the internal type of a Petri net.

Supports conversion to different Petri net types such as classic, reset, inhibitor, and reset_inhibitor nets.

Parameters:
  • net (PetriNet) – The Petri net to convert.

  • im (Marking) – The initial marking of the Petri net.

  • fm (Marking) – The final marking of the Petri net.

  • type (str) – The target Petri net type. Options are “classic”, “reset”, “inhibitor”, “reset_inhibitor”. Defaults to “classic”.

Returns:

A tuple of the converted (PetriNet, Marking, Marking).
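
A minimal sketch converting a classic net to a reset/inhibitor net:

import pm4py

net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
net, im, fm = pm4py.convert_petri_net_type(net, im, fm, type='reset_inhibitor')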

pm4py.discovery module#

The pm4py.discovery module contains the process discovery algorithms implemented in pm4py.

pm4py.discovery.discover_dfg(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[dict, dict, dict][source]#

Discovers a Directly-Follows Graph (DFG) from a log.

This method returns a tuple containing:
  • A dictionary with pairs of directly-following activities as keys and the frequency of the relationship as values.
  • A dictionary of start activities with their respective frequencies.
  • A dictionary of end activities with their respective frequencies.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A tuple of three dictionaries: (dfg, start_activities, end_activities).

Return type:

Tuple[dict, dict, dict]

import pm4py

dfg, start_activities, end_activities = pm4py.discover_dfg(
    dataframe,
    case_id_key='case:concept:name',
    activity_key='concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_directly_follows_graph(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[dict, dict, dict][source]#
pm4py.discovery.discover_dfg_typed(log: DataFrame, case_id_key: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp') DirectlyFollowsGraph[source]#

Discovers a typed Directly-Follows Graph (DFG) from a log.

This method returns a typed DFG object, as specified in pm4py.objects.dfg.obj.py (DirectlyFollowsGraph class). The DFG object includes the graph, start activities, and end activities:
  • The graph is a collection of triples of the form (a, b, f), representing an arc a->b with frequency f.
  • The start activities are a collection of tuples of the form (a, f), representing that activity a starts f cases.
  • The end activities are a collection of tuples of the form (a, f), representing that activity a ends f cases.

This method replaces pm4py.discover_dfg and pm4py.discover_directly_follows_graph. In future releases, these functions will adopt the same behavior as this function.

Parameters:
  • log (DataFrame) – pandas.DataFrame

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

Returns:

A typed DFG object containing the graph, start activities, and end activities.

Return type:

DFG

import pm4py

dfg = pm4py.discover_dfg_typed(
    log,
    case_id_key='case:concept:name',
    activity_key='concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_performance_dfg(log: EventLog | DataFrame, business_hours: bool = False, business_hour_slots=[(25200, 61200), (111600, 147600), (198000, 234000), (284400, 320400), (370800, 406800)], workcalendar=None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[dict, dict, dict][source]#

Discovers a Performance Directly-Follows Graph from an event log.

This method returns a tuple containing:
  • A dictionary with pairs of directly-following activities as keys and the performance metrics of the relationship as values.
  • A dictionary of start activities with their respective frequencies.
  • A dictionary of end activities with their respective frequencies.

Parameters:
  • log – Event log or Pandas DataFrame.

  • business_hours (bool) – Enables or disables computation based on business hours (default: False).

  • business_hour_slots – Work schedule of the company, provided as a list of tuples where each tuple represents one time slot of business hours. Each slot consists of a start and an end time, given in seconds since the week start. Example:

    [
        (7 * 60 * 60, 17 * 60 * 60),                 # Monday 07:00 - 17:00
        ((24 + 7) * 60 * 60, (24 + 12) * 60 * 60),   # Tuesday 07:00 - 12:00
        ((24 + 13) * 60 * 60, (24 + 17) * 60 * 60),  # Tuesday 13:00 - 17:00
    ]

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A tuple of three dictionaries: (performance_dfg, start_activities, end_activities).

Return type:

Tuple[dict, dict, dict]

import pm4py

performance_dfg, start_activities, end_activities = pm4py.discover_performance_dfg(
    dataframe,
    case_id_key='case:concept:name',
    activity_key='concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_petri_net_alpha(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Alpha Miner.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A tuple containing the Petri net, initial marking, and final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_alpha(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_petri_net_ilp(log: EventLog | DataFrame, alpha: float = 1.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the ILP Miner.

Parameters:
  • log – Event log or Pandas DataFrame.

  • alpha (float) – Noise threshold for the sequence encoding graph (1.0=no filtering, 0.0=maximum filtering) (default: 1.0).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A tuple containing the Petri net, initial marking, and final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_ilp(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_petri_net_alpha_plus(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Alpha+ algorithm.

Deprecated since version 2.3.0: This method will be removed in version 3.0.0. Use other discovery methods instead.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A tuple containing the Petri net, initial marking, and final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_alpha_plus(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.discovery.discover_petri_net_inductive(log: EventLog | DataFrame | DirectlyFollowsGraph, multi_processing: bool = False, noise_threshold: float = 0.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', disable_fallthroughs: bool = False) Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Inductive Miner algorithm.

The Inductive Miner detects a ‘cut’ in the log (e.g., sequence, exclusive choice, parallel, loop) and recursively applies the algorithm to sublogs until a base case is found. Inductive Miner models typically use hidden transitions for skipping or looping over portions of the model, and each visible transition has a unique label.

Parameters:
  • log – Event log, Pandas DataFrame, or typed DFG.

  • multi_processing (bool) – Enables or disables multiprocessing in the Inductive Miner (default: constants.ENABLE_MULTIPROCESSING_DEFAULT).

  • noise_threshold (float) – Noise threshold (default: 0.0).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

  • disable_fallthroughs (bool) – Disables the Inductive Miner fall-throughs (default: False).

Returns:

A tuple containing the Petri net, initial marking, and final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_petri_net_heuristics(log: EventLog | DataFrame, dependency_threshold: float = 0.5, and_threshold: float = 0.65, loop_two_threshold: float = 0.5, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[PetriNet, Marking, Marking][source]#

Discovers a Petri net using the Heuristics Miner.

Heuristics Miner operates on the Directly-Follows Graph, handling noise and identifying common constructs such as dependencies between activities and parallelism. The output is a Heuristics Net, which can then be converted into a Petri net.

Parameters:
  • log – Event log or Pandas DataFrame.

  • dependency_threshold (float) – Dependency threshold (default: 0.5).

  • and_threshold (float) – AND threshold for parallelism (default: 0.65).

  • loop_two_threshold (float) – Loop two threshold (default: 0.5).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A tuple containing the Petri net, initial marking, and final marking.

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.discover_petri_net_heuristics(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_process_tree_inductive(log: EventLog | DataFrame | DirectlyFollowsGraph, noise_threshold: float = 0.0, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', disable_fallthroughs: bool = False) ProcessTree[source]#

Discovers a Process Tree using the Inductive Miner algorithm.

The Inductive Miner detects a ‘cut’ in the log (e.g., sequence, exclusive choice, parallel, loop) and recursively applies the algorithm to sublogs until a base case is found. Inductive Miner models typically use hidden transitions for skipping or looping over portions of the model, and each visible transition has a unique label.

Parameters:
  • log – Event log, Pandas DataFrame, or typed DFG.

  • noise_threshold (float) – Noise threshold (default: 0.0).

  • multi_processing (bool) – Enables or disables multiprocessing in the Inductive Miner (default: constants.ENABLE_MULTIPROCESSING_DEFAULT).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

  • disable_fallthroughs (bool) – Disables the Inductive Miner fall-throughs (default: False).

Returns:

A ProcessTree object.

Return type:

ProcessTree

import pm4py

process_tree = pm4py.discover_process_tree_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_heuristics_net(log: EventLog | DataFrame, dependency_threshold: float = 0.5, and_threshold: float = 0.65, loop_two_threshold: float = 0.5, min_act_count: int = 1, min_dfg_occurrences: int = 1, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', decoration: str = 'frequency') HeuristicsNet[source]#

Discovers a Heuristics Net.

Heuristics Miner operates on the Directly-Follows Graph, handling noise and identifying common constructs such as dependencies between activities and parallelism. The output is a Heuristics Net, which can then be converted into a Petri net.

Parameters:
  • log – Event log or Pandas DataFrame.

  • dependency_threshold (float) – Dependency threshold (default: 0.5).

  • and_threshold (float) – AND threshold for parallelism (default: 0.65).

  • loop_two_threshold (float) – Loop two threshold (default: 0.5).

  • min_act_count (int) – Minimum number of occurrences per activity to be included in the discovery (default: 1).

  • min_dfg_occurrences (int) – Minimum number of occurrences per arc in the DFG to be included in the discovery (default: 1).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

  • decoration (str) – The decoration to be used (“frequency” or “performance”) (default: “frequency”).

Returns:

A HeuristicsNet object.

Return type:

HeuristicsNet

import pm4py

heu_net = pm4py.discover_heuristics_net(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.derive_minimum_self_distance(log: DataFrame | EventLog | EventStream, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, int][source]#

Computes the minimum self-distance for each activity observed in an event log.

The minimum self-distance of activity a in the trace <a> is infinity, in <a, a> it is 0, and in <a, b, a> it is 1. The activity is identified by the specified activity_key.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A dictionary mapping each activity to its minimum self-distance.

Return type:

Dict[str, int]

import pm4py

msd = pm4py.derive_minimum_self_distance(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_footprints(*args: EventLog | Tuple[PetriNet, Marking, Marking] | ProcessTree) List[Dict[str, Any]] | Dict[str, Any][source]#

Discovers the footprints from the provided event log or process model.

Footprints are a high-level representation of the behavior captured in the event log or process model.

Parameters:

args – Event log, process model (Petri net and markings), or ProcessTree.

Returns:

A list of footprint dictionaries or a single footprint dictionary.

Return type:

Union[List[Dict[str, Any]], Dict[str, Any]]

import pm4py

footprints = pm4py.discover_footprints(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_eventually_follows_graph(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[Tuple[str, str], int][source]#

Generates the Eventually-Follows Graph from a log.

The Eventually-Follows Graph is a dictionary that maps each pair of activities to the number of times one activity eventually follows the other in the log.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A dictionary mapping each pair of activities to the count of their eventually-follows relationship.

Return type:

Dict[Tuple[str, str], int]

import pm4py

efg = pm4py.discover_eventually_follows_graph(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_bpmn_inductive(log: EventLog | DataFrame | DirectlyFollowsGraph, noise_threshold: float = 0.0, multi_processing: bool = False, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', disable_fallthroughs: bool = False) BPMN[source]#

Discovers a BPMN model using the Inductive Miner algorithm.

The Inductive Miner detects a ‘cut’ in the log (e.g., sequence, exclusive choice, parallel, loop) and recursively applies the algorithm to sublogs until a base case is found. Inductive Miner models typically use hidden transitions for skipping or looping over portions of the model, and each visible transition has a unique label.

Parameters:
  • log – Event log, Pandas DataFrame, or typed DFG.

  • noise_threshold (float) – Noise threshold (default: 0.0).

  • multi_processing (bool) – Enables or disables multiprocessing in the Inductive Miner (default: constants.ENABLE_MULTIPROCESSING_DEFAULT).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

  • disable_fallthroughs (bool) – Disables the Inductive Miner fall-throughs (default: False).

Returns:

A BPMN object representing the discovered BPMN model.

Return type:

BPMN

import pm4py

bpmn_graph = pm4py.discover_bpmn_inductive(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_transition_system(log: EventLog | DataFrame, direction: str = 'forward', window: int = 2, view: str = 'sequence', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') TransitionSystem[source]#

Discovers a Transition System from a log.

The Transition System is built based on the specified direction, window size, and view. It captures the transitions between states of activity sequences.

Parameters:
  • log – Event log or Pandas DataFrame.

  • direction (str) – Direction in which the transition system is built (“forward” or “backward”) (default: “forward”).

  • window (int) – Window size for state construction (e.g., 2, 3) (default: 2).

  • view (str) – View to use in the construction of the states (“sequence”, “set”, “multiset”) (default: “sequence”).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A TransitionSystem object representing the discovered transition system.

Return type:

TransitionSystem

import pm4py

transition_system = pm4py.discover_transition_system(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_prefix_tree(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Trie[source]#

Discovers a Prefix Tree from the provided log.

A Prefix Tree represents all the unique prefixes of activity sequences in the log.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A Trie object representing the discovered prefix tree.

Return type:

Trie

import pm4py

prefix_tree = pm4py.discover_prefix_tree(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_temporal_profile(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[Tuple[str, str], Tuple[float, float]][source]#

Discovers a Temporal Profile from a log.

Implements the approach described in: Stertz, Florian, Jürgen Mangler, and Stefanie Rinderle-Ma. “Temporal Conformance Checking at Runtime based on Time-infused Process Models.” arXiv preprint arXiv:2008.07262 (2020).

The output is a dictionary containing, for every pair of activities that eventually follow each other in at least one case of the log, the average and the standard deviation of the time difference between their timestamps.

Example: If the log has two cases:
  • Case 1: A (timestamp: 1980-01) → B (timestamp: 1980-03) → C (timestamp: 1980-06)
  • Case 2: A (timestamp: 1990-01) → B (timestamp: 1990-02) → D (timestamp: 1990-03)

The returned dictionary will contain:

{
    ('A', 'B'): (1.5 months, 0.5 months),
    ('A', 'C'): (5 months, 0),
    ('A', 'D'): (2 months, 0)
}

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A dictionary mapping each pair of activities to a tuple of (average time difference, standard deviation).

Return type:

Dict[Tuple[str, str], Tuple[float, float]]

import pm4py

temporal_profile = pm4py.discover_temporal_profile(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_log_skeleton(log: EventLog | DataFrame, noise_threshold: float = 0.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, Any][source]#

Discovers a Log Skeleton from an event log.

A Log Skeleton is a declarative model consisting of six different constraints:
  • directly_follows: Specifies strict bounds on activities that directly follow each other. Example: ‘A should be directly followed by B’ and ‘B should be directly followed by C’.
  • always_before: Specifies that some activities may only be executed if certain other activities have been executed earlier in the case. Example: ‘C should always be preceded by A’.
  • always_after: Specifies that certain activities should always trigger the execution of some other activities later in the case. Example: ‘A should always be followed by C’.
  • equivalence: Specifies that a given pair of activities should occur the same number of times within a case. Example: ‘B and C should always occur the same number of times’.
  • never_together: Specifies that a given pair of activities should never occur together in a case. Example: ‘There should be no case containing both C and D’.
  • activ_occurrences: Specifies allowed numbers of occurrences per activity. Example: ‘Activity A can occur 1 or 2 times, and Activity B can occur 1 to 4 times’.

Reference paper: Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log skeletons: A classification approach to process discovery.” arXiv preprint arXiv:1806.08247 (2018).

Parameters:
  • log – Event log or Pandas DataFrame.

  • noise_threshold (float) – Noise threshold influencing the strictness of constraints (default: 0.0).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A dictionary representing the Log Skeleton with various constraints.

Return type:

Dict[str, Any]

import pm4py

log_skeleton = pm4py.discover_log_skeleton(
    dataframe,
    noise_threshold=0.1,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.discovery.discover_declare(log: EventLog | DataFrame, allowed_templates: Set[str] | None = None, considered_activities: Set[str] | None = None, min_support_ratio: float | None = None, min_confidence_ratio: float | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Dict[str, Dict[Any, Dict[str, int]]][source]#

Discovers a DECLARE model from an event log.

Reference paper: F. M. Maggi, A. J. Mooij and W. M. P. van der Aalst, “User-guided discovery of declarative process models,” 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France, 2011, pp. 192-199, doi: 10.1109/CIDM.2011.5949297.

Parameters:
  • log – Event log or Pandas DataFrame.

  • allowed_templates – (Optional) Set of DECLARE templates to consider for discovery.

  • considered_activities – (Optional) Set of activities to consider for discovery.

  • min_support_ratio – (Optional) Minimum percentage of cases for which the discovered rules apply.

  • min_confidence_ratio – (Optional) Minimum percentage of cases for which the discovered rules are valid, based on the rule’s support.

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A dictionary representing the discovered DECLARE model with constraints and their parameters.

Return type:

Dict[str, Dict[Any, Dict[str, int]]]

import pm4py

declare_model = pm4py.discover_declare(log)
pm4py.discovery.discover_powl(log: EventLog | DataFrame, variant=None, filtering_weight_factor: float = 0.0, order_graph_filtering_threshold: float = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') POWL[source]#

Discovers a POWL (Partially Ordered Workflow Language) model from an event log.

Reference paper: Kourani, Humam, and Sebastiaan J. van Zelst. “POWL: partially ordered workflow language.” International Conference on Business Process Management. Cham: Springer Nature Switzerland, 2023.

Parameters:
  • log – Event log or Pandas DataFrame.

  • variant – Variant of the POWL discovery algorithm to use.

  • filtering_weight_factor (float) – Factoring threshold for filtering weights, accepts values 0 <= x < 1 (default: 0.0).

  • order_graph_filtering_threshold (float) – Filtering threshold for the order graph, valid for the DYNAMIC_CLUSTERING variant, accepts values 0.5 < x <= 1 (default: None).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A POWL object representing the discovered POWL model.

Return type:

POWL

import pm4py

log = pm4py.read_xes('tests/input_data/receipt.xes')
powl_model = pm4py.discover_powl(
    log,
    activity_key='concept:name'
)
print(powl_model)
pm4py.discovery.discover_batches(log: EventLog | DataFrame, merge_distance: int = 900, min_batch_size: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource') List[Tuple[Tuple[str, str], int, Dict[str, Any]]][source]#

Discovers batches from the provided log.

An activity is executed in batches by a given resource when the resource performs the same activity multiple times in a short period. Identifying such activities may highlight repetitive tasks that could be automated.

The following batch categories are detected:
  • Simultaneous: All events in the batch have identical start and end timestamps.
  • Batching at Start: All events in the batch have identical start timestamps.
  • Batching at End: All events in the batch have identical end timestamps.
  • Sequential Batching: Consecutive events have the end of the first equal to the start of the second.
  • Concurrent Batching: Consecutive events that do not match sequentially.

Reference paper: Martin, N., Swennen, M., Depaire, B., Jans, M., Caris, A., & Vanhoof, K. (2015, December). Batch Processing: Definition and Event Log Identification. In SIMPDA (pp. 137-140).

Parameters:
  • log – Event log or Pandas DataFrame.

  • merge_distance (int) – Maximum time distance (in seconds) between non-overlapping intervals to consider them part of the same batch (default: 900 seconds, i.e., 15 minutes).

  • min_batch_size (int) – Minimum number of events required to form a batch (default: 2).

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

  • resource_key (str) – Attribute to be used as resource (default: “org:resource”).

Returns:

A sorted list of tuples, each containing:
  • The (activity, resource) pair.
  • The number of batches for the given activity-resource.
  • A dictionary with batch details.

Return type:

List[Tuple[Tuple[str, str], int, Dict[str, Any]]]

import pm4py

batches = pm4py.discover_batches(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp',
    resource_key='org:resource'
)

pm4py.filtering module#

The pm4py.filtering module contains the filtering features offered in pm4py.

pm4py.filtering.filter_log_relative_occurrence_event_attribute(log: EventLog | DataFrame, min_relative_stake: float, attribute_key: str = 'concept:name', level: str = 'cases', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the event log, keeping only the events that have an attribute value which occurs:
  • in at least the specified (min_relative_stake) percentage of events, when level=”events”;
  • in at least the specified (min_relative_stake) percentage of cases, when level=”cases”.

Parameters:
  • log – Event log or Pandas DataFrame.

  • min_relative_stake (float) – Minimum percentage (expressed as a number between 0 and 1) of events or cases, depending on level, in which the attribute value should occur.

  • attribute_key (str) – The attribute to filter.

  • level (str) – The level of the filter (if level=”events”, then events; if level=”cases”, then cases).

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_log_relative_occurrence_event_attribute(
    dataframe,
    0.5,
    attribute_key='concept:name',
    level='cases',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_start_activities(log: EventLog | DataFrame, activities: Set[str] | List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters cases that have a start activity in the provided list.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activities – Collection of start activities.

  • retain (bool) – If True, retains the traces containing the given start activities; if False, drops the traces.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_start_activities(
    dataframe,
    ['Act. A'],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_end_activities(log: EventLog | DataFrame, activities: Set[str] | List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters cases that have an end activity in the provided list.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activities – Collection of end activities.

  • retain (bool) – If True, retains the traces containing the given end activities; if False, drops the traces.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_end_activities(
    dataframe,
    ['Act. Z'],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_event_attribute_values(log: EventLog | DataFrame, attribute_key: str, values: Set[str] | List[str], level: str = 'case', retain: bool = True, case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters a log object based on the values of a specified event attribute.

Parameters:
  • log – Event log or Pandas DataFrame.

  • attribute_key (str) – Attribute to filter.

  • values – Admitted or forbidden values.

  • level (str) – Specifies how the filter should be applied (‘case’ filters the cases where at least one occurrence happens; ‘event’ filters the events, potentially trimming the cases).

  • retain (bool) – Specifies if the values should be kept or removed.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_event_attribute_values(
    dataframe,
    'concept:name',
    ['Act. A', 'Act. Z'],
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_trace_attribute_values(log: EventLog | DataFrame, attribute_key: str, values: Set[str] | List[str], retain: bool = True, case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters a log based on the values of a specified trace attribute.

Parameters:
  • log – Event log or Pandas DataFrame.

  • attribute_key (str) – Attribute to filter.

  • values – Collection of values to filter.

  • retain (bool) – Boolean value indicating whether to keep or discard matching traces.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_trace_attribute_values(
    dataframe,
    'case:creator',
    ['Mike'],
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_variants(log: EventLog | DataFrame, variants: Set[str] | List[str] | List[Tuple[str]], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters a log based on a specified set of variants.

Parameters:
  • log – Event log or Pandas DataFrame.

  • variants – Collection of variants to filter. A variant should be specified as a list of tuples of activity names, e.g., [(‘a’, ‘b’, ‘c’)].

  • retain (bool) – Boolean indicating whether to retain (if True) or remove (if False) traces conforming to the specified variants.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_variants(
    dataframe,
    [('Act. A', 'Act. B', 'Act. Z'), ('Act. A', 'Act. C', 'Act. Z')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_directly_follows_relation(log: EventLog | DataFrame, relations: List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Retains the traces that contain any of the specified ‘directly follows’ relations. For example, if relations == [(‘a’,’b’), (‘a’,’c’)] and the log is [<a,b,c>, <a,c,b>, <a,d,b>], the resulting log will contain the traces [<a,b,c>, <a,c,b>].

Parameters:
  • log – Event log or Pandas DataFrame.

  • relations – List of activity name pairs, representing allowed or forbidden paths.

  • retain (bool) – Boolean indicating whether the paths should be kept (if True) or removed (if False).

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_directly_follows_relation(
    dataframe,
    [('A', 'B'), ('A', 'C')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_eventually_follows_relation(log: EventLog | DataFrame, relations: List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Retains the traces that contain any of the specified ‘eventually follows’ relations. For example, if relations == [(‘a’,’b’), (‘a’,’c’)] and the log is [<a,b,c>, <a,c,b>, <a,d,b>], the resulting log will contain the traces [<a,b,c>, <a,c,b>, <a,d,b>].

Parameters:
  • log – Event log or Pandas DataFrame.

  • relations – List of activity name pairs, representing allowed or forbidden paths.

  • retain (bool) – Boolean indicating whether the paths should be kept (if True) or removed (if False).

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_eventually_follows_relation(
    dataframe,
    [('A', 'B'), ('A', 'C')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_time_range(log: EventLog | DataFrame, dt1: str, dt2: str, mode: str = 'events', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters a log based on a time interval.

Parameters:
  • log – Event log or Pandas DataFrame.

  • dt1 (str) – Left extreme of the interval.

  • dt2 (str) – Right extreme of the interval.

  • mode (str) – Modality of filtering (‘events’, ‘traces_contained’, ‘traces_intersecting’).
    - ‘events’: Any event that fits the time frame is retained.
    - ‘traces_contained’: Any trace completely contained in the timeframe is retained.
    - ‘traces_intersecting’: Any trace intersecting with the timeframe is retained.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe1 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='traces_contained',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
filtered_dataframe2 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='traces_intersecting',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
filtered_dataframe3 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='events',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_between(log: EventLog | DataFrame, act1: str | List[str], act2: str | List[str], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Finds all the sub-cases leading from an event with activity “act1” to an event with activity “act2” in the log, and returns a log containing only them.

Example:

Log (three cases):
- A B C D E F
- A B E F C
- A B F C B C B E F C

act1 = B, act2 = C

Returned sub-cases:
- B C (from the first case)
- B E F C (from the second case)
- B F C (from the third case)
- B C (from the third case)
- B E F C (from the third case)

Parameters:
  • log – Event log or Pandas DataFrame.

  • act1 – Source activity or collection of activities.

  • act2 – Target activity or collection of activities.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_between(
    dataframe,
    'A',
    'D',
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
pm4py.filtering.filter_case_size(log: EventLog | DataFrame, min_size: int, max_size: int, case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the event log, keeping cases that have a length (number of events) between min_size and max_size.

Parameters:
  • log – Event log or Pandas DataFrame.

  • min_size (int) – Minimum allowed number of events.

  • max_size (int) – Maximum allowed number of events.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_case_size(
    dataframe,
    5,
    10,
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_case_performance(log: EventLog | DataFrame, min_performance: float, max_performance: float, timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the event log, keeping cases that have a duration (the timestamp of the last event minus the timestamp of the first event) between min_performance and max_performance.

Parameters:
  • log – Event log or Pandas DataFrame.

  • min_performance (float) – Minimum allowed case duration.

  • max_performance (float) – Maximum allowed case duration.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_case_performance(
    dataframe,
    3600.0,
    86400.0,
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_activities_rework(log: EventLog | DataFrame, activity: str, min_occurrences: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the event log, keeping cases where the specified activity occurs at least min_occurrences times.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity (str) – Activity to consider.

  • min_occurrences (int) – Minimum desired number of occurrences.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_activities_rework(
    dataframe,
    'Approve Order',
    2,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_paths_performance(log: EventLog | DataFrame, path: Tuple[str, str], min_performance: float, max_performance: float, keep: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the event log based on the performance of specified paths.

  • If keep=True, retains cases having the specified path (tuple of 2 activities) with a duration between min_performance and max_performance.

  • If keep=False, discards cases having the specified path with a duration between min_performance and max_performance.

Parameters:
  • log – Event log or Pandas DataFrame.

  • path – Tuple of two activities (source_activity, target_activity).

  • min_performance (float) – Minimum allowed performance of the path.

  • max_performance (float) – Maximum allowed performance of the path.

  • keep (bool) – Boolean indicating whether to keep (if True) or discard (if False) the cases with the specified performance.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_paths_performance(
    dataframe,
    ('A', 'D'),
    3600.0,
    86400.0,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_variants_top_k(log: EventLog | DataFrame, k: int, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Keeps the top-k variants of the log.

Parameters:
  • log – Event log or Pandas DataFrame.

  • k (int) – Number of variants to keep.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_variants_top_k(
    dataframe,
    5,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_variants_by_coverage_percentage(log: EventLog | DataFrame, min_coverage_percentage: float, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the variants of the log based on a coverage percentage. For example, if min_coverage_percentage=0.4 and the log has 1000 cases with:
- 500 cases of variant 1,
- 400 cases of variant 2,
- 100 cases of variant 3,
then the filter keeps only the traces of variant 1 and variant 2.

Parameters:
  • log – Event log or Pandas DataFrame.

  • min_coverage_percentage (float) – Minimum allowed percentage of coverage.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_variants_by_coverage_percentage(
    dataframe,
    0.1,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_prefixes(log: EventLog | DataFrame, activity: str, strict: bool = True, first_or_last: str = 'first', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the log, keeping the prefixes leading up to a given activity. For example, for a log with the traces:
- A,B,C,D
- A,B,Z,A,B,C,D
- A,B,C,D,C,E,C,F

the prefixes to “C” are, respectively:
- A,B
- A,B,Z,A,B
- A,B

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity (str) – Target activity for the filter.

  • strict (bool) – Applies the filter strictly, cutting the occurrences of the selected activity.

  • first_or_last (str) – Decides if the first or last occurrence of an activity should be selected as the baseline for the filter.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_prefixes(
    dataframe,
    'Act. C',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_suffixes(log: EventLog | DataFrame, activity: str, strict: bool = True, first_or_last: str = 'first', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters the log, keeping the suffixes starting from a given activity. For example, for a log with the traces:
- A,B,C,D
- A,B,Z,A,B,C,D
- A,B,C,D,C,E,C,F

the suffixes from “C” are, respectively:
- D
- D
- D,C,E,C,F

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity (str) – Target activity for the filter.

  • strict (bool) – Applies the filter strictly, cutting the occurrences of the selected activity.

  • first_or_last (str) – Decides if the first or last occurrence of an activity should be selected as the baseline for the filter.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_suffixes(
    dataframe,
    'Act. C',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_ocel_event_attribute(ocel: OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) OCEL[source]#

Filters the object-centric event log based on the provided event attribute values.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • attribute_key (str) – Attribute at the event level to filter.

  • attribute_values – Collection of attribute values to keep or remove.

  • positive (bool) – Determines whether the values should be kept (True) or removed (False).

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_event_attribute(
    ocel,
    'ocel:activity',
    ['A', 'B', 'D']
)
pm4py.filtering.filter_ocel_object_attribute(ocel: OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) OCEL[source]#

Filters the object-centric event log based on the provided object attribute values.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • attribute_key (str) – Attribute at the object level to filter.

  • attribute_values – Collection of attribute values to keep or remove.

  • positive (bool) – Determines whether the values should be kept (True) or removed (False).

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_object_attribute(
    ocel,
    'ocel:type',
    ['order']
)
pm4py.filtering.filter_ocel_object_types_allowed_activities(ocel: OCEL, correspondence_dict: Dict[str, Collection[str]]) OCEL[source]#

Filters an object-centric event log, keeping only the specified object types with the specified set of allowed activities.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • correspondence_dict – Dictionary containing, for every object type of interest, a collection of allowed activities. Example: {“order”: [“Create Order”], “element”: [“Create Order”, “Create Delivery”]}.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_object_types_allowed_activities(
    ocel,
    {'order': ['create order', 'pay order'], 'item': ['create item', 'deliver item']}
)
pm4py.filtering.filter_ocel_object_per_type_count(ocel: OCEL, min_num_obj_type: Dict[str, int]) OCEL[source]#

Filters the events of the object-centric log that are related to at least the specified number of objects per type.

Example: pm4py.filter_ocel_object_per_type_count(ocel, {"order": 1, "element": 2})

Would keep the following events:

   ocel:eid ocel:timestamp ocel:activity ocel:type:element ocel:type:order
0  e1       1980-01-01     Create Order  [i4, i1, i3, i2]  [o1]
1  e11      1981-01-01     Create Order  [i6, i5]          [o2]
2  e14      1981-01-04     Create Order  [i8, i7]          [o3]

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • min_num_obj_type – Minimum number of objects per type.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_object_per_type_count(
    ocel,
    {'order': 1, 'element': 2}
)
pm4py.filtering.filter_ocel_start_events_per_object_type(ocel: OCEL, object_type: str) OCEL[source]#

Filters the events in which a new object of the given object type is spawned. For example, an event with activity “Create Order” might spawn new orders.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_type (str) – Object type to consider.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_start_events_per_object_type(
    ocel,
    'delivery'
)
pm4py.filtering.filter_ocel_end_events_per_object_type(ocel: OCEL, object_type: str) OCEL[source]#

Filters the events in which an object of the given object type terminates its lifecycle. For example, an event with activity “Pay Order” might terminate an order.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_type (str) – Object type to consider.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_end_events_per_object_type(
    ocel,
    'delivery'
)
pm4py.filtering.filter_ocel_events_timestamp(ocel: OCEL, min_timest: datetime | str, max_timest: datetime | str, timestamp_key: str = 'ocel:timestamp') OCEL[source]#

Filters the object-centric event log, keeping events within the provided timestamp range.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • min_timest – Left extreme of the allowed timestamp interval (format: YYYY-mm-dd HH:MM:SS).

  • max_timest – Right extreme of the allowed timestamp interval (format: YYYY-mm-dd HH:MM:SS).

  • timestamp_key (str) – The attribute to use as timestamp (default: ocel:timestamp).

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_events_timestamp(
    ocel,
    '1990-01-01 00:00:00',
    '2010-01-01 00:00:00'
)
pm4py.filtering.filter_four_eyes_principle(log: EventLog | DataFrame, activity1: str, activity2: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', keep_violations: bool = False) EventLog | DataFrame[source]#

Filters out the cases of the log that violate the four-eyes principle on the provided activities.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity1 (str) – First activity.

  • activity2 (str) – Second activity.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

  • resource_key (str) – Attribute to be used as resource.

  • keep_violations (bool) – Boolean indicating whether to discard (if False) or retain (if True) the violations.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_four_eyes_principle(
    dataframe,
    'Act. A',
    'Act. B',
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
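
With keep_violations=True, the same filter can instead be used to inspect the violating cases; a minimal sketch reusing the activities above:

import pm4py

violating_cases = pm4py.filter_four_eyes_principle(
    dataframe,
    'Act. A',
    'Act. B',
    keep_violations=True,
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)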
pm4py.filtering.filter_activity_done_different_resources(log: EventLog | DataFrame, activity: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', keep_violations: bool = True) EventLog | DataFrame[source]#

Filters the cases where the specified activity is performed multiple times by different resources.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity (str) – Activity to consider.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

  • resource_key (str) – Attribute to be used as resource.

  • keep_violations (bool) – Boolean indicating whether to discard (if False) or retain (if True) the violations.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_activity_done_different_resources(
    dataframe,
    'Act. A',
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.filtering.filter_trace_segments(log: EventLog | DataFrame, admitted_traces: List[List[str]], positive: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Filters an event log based on a set of trace segments. A trace segment is a sequence of activities and "...", where:
- "..." before an activity indicates that other activities can precede the given activity.
- "..." after an activity indicates that other activities can follow the given activity.

Examples:
- pm4py.filter_trace_segments(log, [["A", "B"]]) retains only cases with the exact process variant A,B.
- pm4py.filter_trace_segments(log, [["...", "A", "B"]]) retains only cases ending with activities A,B.
- pm4py.filter_trace_segments(log, [["A", "B", "..."]]) retains only cases starting with activities A,B.
- pm4py.filter_trace_segments(log, [["...", "A", "B", "C", "..."], ["...", "D", "E", "F", "..."]]) retains cases where:
  • At any point, there is A followed by B followed by C,
  • and, at any other point, there is D followed by E followed by F.

Parameters:
  • log – Event log or Pandas DataFrame.

  • admitted_traces – Collection of trace segments to admit based on the criteria above.

  • positive (bool) – Boolean indicating whether to keep (if True) or discard (if False) the cases satisfying the filter.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")

filtered_log = pm4py.filter_trace_segments(
    log,
    [["...", "check ticket", "decide", "reinitiate request", "..."]]
)
print(filtered_log)
pm4py.filtering.filter_ocel_object_types(ocel: OCEL, obj_types: Collection[str], positive: bool = True, level: int = 1) OCEL[source]#

Filters the object types of an object-centric event log.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • obj_types – Object types to keep or remove.

  • positive (bool) – Boolean indicating whether to keep (True) or remove (False) the specified object types.

  • level (int) – Recursively expands the set of object identifiers until the specified level.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_object_types(
    ocel,
    ['order']
)
pm4py.filtering.filter_ocel_objects(ocel: OCEL, object_identifiers: Collection[str], positive: bool = True, level: int = 1) OCEL[source]#

Filters the object identifiers of an object-centric event log.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_identifiers – Object identifiers to keep or remove.

  • positive (bool) – Boolean indicating whether to keep (True) or remove (False) the specified object identifiers.

  • level (int) – Recursively expands the set of object identifiers until the specified level.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_objects(
    ocel,
    ['o1'],
    level=1
)
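
Higher values of level also retain the objects related to the given identifiers through the object graph; a minimal sketch (object identifier as above):

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel_lv2 = pm4py.filter_ocel_objects(
    ocel,
    ['o1'],
    level=2
)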
pm4py.filtering.filter_ocel_events(ocel: OCEL, event_identifiers: Collection[str], positive: bool = True) OCEL[source]#

Filters the event identifiers of an object-centric event log.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • event_identifiers – Event identifiers to keep or remove.

  • positive (bool) – Boolean indicating whether to keep (True) or remove (False) the specified event identifiers.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_events(
    ocel,
    ['e1']
)
pm4py.filtering.filter_ocel_activities_connected_object_type(ocel: OCEL, object_type: str) OCEL[source]#

Filters an OCEL based on the set of activities executed on objects of the given object type.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_type (str) – Object type to consider.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel2("tests/input_data/ocel/ocel20_example.xmlocel")
filtered_ocel = pm4py.filter_ocel_activities_connected_object_type(ocel, "Purchase Order")
print(filtered_ocel)
pm4py.filtering.filter_ocel_cc_object(ocel: OCEL, object_id: str, conn_comp: List[List[str]] | None = None, return_conn_comp: bool = False) OCEL | Tuple[OCEL, List[List[str]]][source]#

Returns the connected component of the object-centric event log to which the specified object belongs.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_id (str) – Object identifier.

  • conn_comp – (Optional) Precomputed connected components of the OCEL objects.

  • return_conn_comp (bool) – If True, returns the filtered OCEL along with the computed connected components.

Returns:

Filtered OCEL, optionally with the list of connected components.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_object(
    ocel,
    'order1'
)
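
When filtering for several objects, the connected components can be computed once and then reused; a minimal sketch (‘order2’ is an illustrative identifier):

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
# compute the connected components together with the first filtering
filtered_ocel1, conn_comp = pm4py.filter_ocel_cc_object(ocel, 'order1', return_conn_comp=True)
# reuse the precomputed components for a second object ('order2' is illustrative)
filtered_ocel2 = pm4py.filter_ocel_cc_object(ocel, 'order2', conn_comp=conn_comp)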
pm4py.filtering.filter_ocel_cc_length(ocel: OCEL, min_cc_length: int, max_cc_length: int) OCEL[source]#

Keeps only the objects in an OCEL belonging to a connected component with a length falling within the specified range.

Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • min_cc_length (int) – Minimum allowed length for the connected component.

  • max_cc_length (int) – Maximum allowed length for the connected component.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_cc_length(
    ocel,
    2,
    10
)
pm4py.filtering.filter_ocel_cc_otype(ocel: OCEL, otype: str, positive: bool = True) OCEL[source]#

Filters the objects belonging to connected components that have at least one object of the specified type.

Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • otype (str) – Object type to consider.

  • positive (bool) – Boolean indicating whether to keep (True) or discard (False) the objects in these components.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_otype(
    ocel,
    'order'
)
pm4py.filtering.filter_ocel_cc_activity(ocel: OCEL, activity: str) OCEL[source]#

Filters the objects belonging to connected components that include at least one event with the specified activity.

Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Return type:

OCEL

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • activity (str) – Activity to consider.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_activity(
    ocel,
    'Create Order'
)

pm4py.hof module#

pm4py.hof.filter_log(f: Callable[[Any], bool], log: EventLog) EventLog | EventStream[source]#

Filters the log according to a given (lambda) function.

Parameters:
  • f – function that specifies the filter criterion, may be a lambda

  • log (EventLog) – event log; either EventLog or EventStream Object

Return type:

Union[log_inst.EventLog, log_inst.EventStream]
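
A minimal usage sketch, assuming a legacy EventLog object (the length criterion is illustrative):

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes', return_legacy_log_object=True)
# keep only the traces containing more than five events
filtered_log = pm4py.hof.filter_log(lambda t: len(t) > 5, log)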

Deprecated since version 2.3.0: This will be removed in 3.0.0, as the EventLog class will be removed in a future release.

pm4py.hof.filter_trace(f: Callable[[Any], bool], trace: Trace) Trace[source]#

Filters the trace according to a given (lambda) function.

Parameters:
  • f – function that specifies the filter criterion, may be a lambda

  • trace (Trace) – trace; PM4Py trace object

Return type:

log_inst.Trace
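
A minimal usage sketch, assuming a legacy EventLog object (the activity name is illustrative):

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes', return_legacy_log_object=True)
# drop the events of a given activity from the first trace
filtered_trace = pm4py.hof.filter_trace(lambda e: e['concept:name'] != 'register request', log[0])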

pm4py.hof.sort_log(log: EventLog, key, reverse: bool = False) EventLog | EventStream[source]#

Sorts the event log according to a given key.

Parameters:
  • log (EventLog) – event log object; either EventLog or EventStream

  • key – sorting key

  • reverse (bool) – indicates whether sorting should be reversed or not

Return type:

Union[log_inst.EventLog, log_inst.EventStream]
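
A minimal usage sketch, assuming a legacy EventLog object (the sorting criterion is illustrative):

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes', return_legacy_log_object=True)
# sort the traces by descending number of events
sorted_log = pm4py.hof.sort_log(log, key=lambda t: len(t), reverse=True)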

Deprecated since version 2.3.0: This will be removed in 3.0.0, as the EventLog class will be removed in a future release.

pm4py.hof.sort_trace(trace: Trace, key, reverse: bool = False) Trace[source]#

Sorts the events in a trace according to a given key.

Parameters:
  • trace (Trace) – input trace

  • key – sorting key

  • reverse (bool) – indicates whether sorting should be reversed (default False)

Return type:

log_inst.Trace
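
A minimal usage sketch, assuming a legacy EventLog object:

import pm4py

log = pm4py.read_xes('tests/input_data/running-example.xes', return_legacy_log_object=True)
# order the events of the first trace by their timestamp
sorted_trace = pm4py.hof.sort_trace(log[0], key=lambda e: e['time:timestamp'])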

Deprecated since version 2.3.0: This will be removed in 3.0.0, as the EventLog class will be removed in a future release.

pm4py.llm module#

PM4Py – A Process Mining Library for Python


pm4py.llm.openai_query(prompt: str, api_key: str | None = None, openai_model: str | None = None, api_url: str | None = None, **kwargs) str[source]#

Executes the provided prompt, obtaining the answer from the OpenAI APIs.

Return type:

str

Parameters:
  • prompt (str) – The prompt to be executed.

  • api_key – (Optional) OpenAI API key.

  • openai_model – (Optional) OpenAI model to be used (default: “gpt-3.5-turbo”).

  • api_url – (Optional) OpenAI API URL.

  • **kwargs – Additional parameters to pass to the OpenAI API.

Returns:

The response from the OpenAI API as a string.

import pm4py

resp = pm4py.llm.openai_query('What is the result of 3+3?', api_key="sk-382393", openai_model="gpt-3.5-turbo")
print(resp)
pm4py.llm.abstract_dfg(log_obj: DataFrame | EventLog | EventStream, max_len: int = 10000, include_performance: bool = True, relative_frequency: bool = False, response_header: bool = True, primary_performance_aggregation: str = 'mean', secondary_performance_aggregation: str | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') str[source]#

Obtains the DFG (Directly-Follows Graph) abstraction of a traditional event log.

Return type:

str

Parameters:
  • log_obj – The log object to abstract.

  • max_len (int) – Maximum length of the string abstraction (default: constants.OPENAI_MAX_LEN).

  • include_performance (bool) – Whether to include the performance of the paths in the abstraction.

  • relative_frequency (bool) – Whether to use relative instead of absolute frequency of the paths.

  • response_header (bool) – Whether to include a short header before the paths, describing the abstraction.

  • primary_performance_aggregation (str) – Primary aggregation method for the arc’s performance (default: “mean”). Other options: “median”, “min”, “max”, “sum”, “stdev”.

  • secondary_performance_aggregation – (Optional) Secondary aggregation method for the arc’s performance (default: None). Other options: “mean”, “median”, “min”, “max”, “sum”, “stdev”.

  • activity_key (str) – The column name to be used as activity.

  • timestamp_key (str) – The column name to be used as timestamp.

  • case_id_key (str) – The column name to be used as case identifier.

Returns:

The DFG abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes")
print(pm4py.llm.abstract_dfg(log))
pm4py.llm.abstract_variants(log_obj: DataFrame | EventLog | EventStream, max_len: int = 10000, include_performance: bool = True, relative_frequency: bool = False, response_header: bool = True, primary_performance_aggregation: str = 'mean', secondary_performance_aggregation: str | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') str[source]#

Obtains the variants abstraction of a traditional event log.

Return type:

str

Parameters:
  • log_obj – The log object to abstract.

  • max_len (int) – Maximum length of the string abstraction (default: constants.OPENAI_MAX_LEN).

  • include_performance (bool) – Whether to include the performance of the variants in the abstraction.

  • relative_frequency (bool) – Whether to use relative instead of absolute frequency of the variants.

  • response_header (bool) – Whether to include a short header before the variants, describing the abstraction.

  • primary_performance_aggregation (str) – Primary aggregation method for the variants’ performance (default: “mean”). Other options: “median”, “min”, “max”, “sum”, “stdev”.

  • secondary_performance_aggregation – (Optional) Secondary aggregation method for the variants’ performance (default: None). Other options: “mean”, “median”, “min”, “max”, “sum”, “stdev”.

  • activity_key (str) – The column name to be used as activity.

  • timestamp_key (str) – The column name to be used as timestamp.

  • case_id_key (str) – The column name to be used as case identifier.

Returns:

The variants abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes")
print(pm4py.llm.abstract_variants(log))
pm4py.llm.abstract_ocel(ocel: OCEL, include_timestamps: bool = True) str[source]#

Obtains the abstraction of an object-centric event log, including the list of events and the objects of the OCEL.

Return type:

str

Parameters:
  • ocel (OCEL) – The object-centric event log to abstract.

  • include_timestamps (bool) – Whether to include timestamp information in the abstraction.

Returns:

The OCEL abstraction as a string.

import pm4py

ocel = pm4py.read_ocel("tests/input_data/ocel/example_log.jsonocel")
print(pm4py.llm.abstract_ocel(ocel))
pm4py.llm.abstract_ocel_ocdfg(ocel: OCEL, include_header: bool = True, include_timestamps: bool = True, max_len: int = 10000) str[source]#

Obtains the abstraction of an object-centric event log, representing the object-centric directly-follows graph in text.

Return type:

str

Parameters:
  • ocel (OCEL) – The object-centric event log to abstract.

  • include_header (bool) – Whether to include a header in the abstraction.

  • include_timestamps (bool) – Whether to include timestamp information in the abstraction.

  • max_len (int) – Maximum length of the abstraction (default: constants.OPENAI_MAX_LEN).

Returns:

The object-centric DFG abstraction as a string.

import pm4py

ocel = pm4py.read_ocel("tests/input_data/ocel/example_log.jsonocel")
print(pm4py.llm.abstract_ocel_ocdfg(ocel))
pm4py.llm.abstract_ocel_features(ocel: OCEL, obj_type: str, include_header: bool = True, max_len: int = 10000, debug: bool = False, enable_object_lifecycle_paths: bool = True) str[source]#

Obtains the abstraction of an object-centric event log, representing the features and their values in text.

Return type:

str

Parameters:
  • ocel (OCEL) – The object-centric event log to abstract.

  • obj_type (str) – The object type to consider in feature extraction.

  • include_header (bool) – Whether to include a header in the abstraction.

  • max_len (int) – Maximum length of the abstraction (default: constants.OPENAI_MAX_LEN).

  • debug (bool) – Enables debugging mode, providing insights into feature extraction steps.

  • enable_object_lifecycle_paths (bool) – Enables the “lifecycle paths” feature in the abstraction.

Returns:

The OCEL features abstraction as a string.

import pm4py

ocel = pm4py.read_ocel("tests/input_data/ocel/example_log.jsonocel")
print(pm4py.llm.abstract_ocel_features(ocel, obj_type="Resource"))
pm4py.llm.abstract_event_stream(log_obj: DataFrame | EventLog | EventStream, max_len: int = 10000, response_header: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') str[source]#

Obtains the event stream abstraction of a traditional event log.

Return type:

str

Parameters:
  • log_obj – The log object to abstract.

  • max_len (int) – Maximum length of the string abstraction (default: constants.OPENAI_MAX_LEN).

  • response_header (bool) – Whether to include a short header before the event stream, describing the abstraction.

  • activity_key (str) – The column name to be used as activity.

  • timestamp_key (str) – The column name to be used as timestamp.

  • case_id_key (str) – The column name to be used as case identifier.

Returns:

The event stream abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes")
print(pm4py.llm.abstract_event_stream(log))
pm4py.llm.abstract_petri_net(net: PetriNet, im: Marking, fm: Marking, response_header: bool = True) str[source]#

Obtains an abstraction of a Petri net.

Return type:

str

Parameters:
  • net (PetriNet) – The Petri net to abstract.

  • im (Marking) – The initial marking of the Petri net.

  • fm (Marking) – The final marking of the Petri net.

  • response_header (bool) – Whether to include a header in the abstraction.

Returns:

The Petri net abstraction as a string.

import pm4py

net, im, fm = pm4py.read_pnml('tests/input_data/running-example.pnml')
print(pm4py.llm.abstract_petri_net(net, im, fm))
pm4py.llm.abstract_log_attributes(log_obj: DataFrame | EventLog | EventStream, max_len: int = 10000, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') str[source]#

Abstracts the attributes of a log by reporting their names, types, and top values.

Return type:

str

Parameters:
  • log_obj – The log object whose attributes are to be abstracted.

  • max_len (int) – Maximum length of the string abstraction (default: constants.OPENAI_MAX_LEN).

  • activity_key (str) – The column name to be used as activity.

  • timestamp_key (str) – The column name to be used as timestamp.

  • case_id_key (str) – The column name to be used as case identifier.

Returns:

The log attributes abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes")
print(pm4py.llm.abstract_log_attributes(log))
pm4py.llm.abstract_log_features(log_obj: DataFrame | EventLog | EventStream, max_len: int = 10000, include_header: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') str[source]#

Abstracts the machine learning features obtained from a log by reporting the top features until the desired length is achieved.

Return type:

str

Parameters:
  • log_obj – The log object from which to extract features.

  • max_len (int) – Maximum length of the string abstraction (default: constants.OPENAI_MAX_LEN).

  • include_header (bool) – Whether to include a header in the abstraction.

  • activity_key (str) – The column name to be used as activity.

  • timestamp_key (str) – The column name to be used as timestamp.

  • case_id_key (str) – The column name to be used as case identifier.

Returns:

The log features abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes")
print(pm4py.llm.abstract_log_features(log))
pm4py.llm.abstract_temporal_profile(temporal_profile: Dict[Tuple[str, str], Tuple[float, float]], include_header: bool = True) str[source]#

Abstracts a temporal profile model into a descriptive string.

Return type:

str

Parameters:
  • temporal_profile – The temporal profile model to abstract.

  • include_header (bool) – Whether to include a header in the abstraction describing the temporal profile.

Returns:

The temporal profile abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes", return_legacy_log_object=True)
temporal_profile = pm4py.discover_temporal_profile(log)
text_abstr = pm4py.llm.abstract_temporal_profile(temporal_profile, include_header=True)
print(text_abstr)
pm4py.llm.abstract_case(case: Trace, include_case_attributes: bool = True, include_event_attributes: bool = True, include_timestamp: bool = True, include_header: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp') str[source]#

Textually abstracts a single case from an event log.

Return type:

str

Parameters:
  • case (Trace) – The case object to abstract.

  • include_case_attributes (bool) – Whether to include attributes at the case level.

  • include_event_attributes (bool) – Whether to include attributes at the event level.

  • include_timestamp (bool) – Whether to include event timestamps in the abstraction.

  • include_header (bool) – Whether to include a header in the abstraction.

  • activity_key (str) – The column name to be used as activity.

  • timestamp_key (str) – The column name to be used as timestamp.

Returns:

The case abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes", return_legacy_log_object=True)
print(pm4py.llm.abstract_case(log[0]))
pm4py.llm.abstract_declare(declare_model, include_header: bool = True) str[source]#

Textually abstracts a DECLARE model.

Return type:

str

Parameters:
  • declare_model – The DECLARE model to abstract.

  • include_header (bool) – Whether to include a header in the abstraction.

Returns:

The DECLARE model abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes", return_legacy_log_object=True)
log_ske = pm4py.discover_declare(log)
print(pm4py.llm.abstract_declare(log_ske))
pm4py.llm.abstract_log_skeleton(log_skeleton, include_header: bool = True) str[source]#

Textually abstracts a log skeleton process model.

Return type:

str

Parameters:
  • log_skeleton – The log skeleton to abstract.

  • include_header (bool) – Whether to include a header in the abstraction.

Returns:

The log skeleton abstraction as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/roadtraffic100traces.xes", return_legacy_log_object=True)
log_ske = pm4py.discover_log_skeleton(log)
print(pm4py.llm.abstract_log_skeleton(log_ske))
pm4py.llm.explain_visualization(vis_saver, *args, connector=<function openai_query>, **kwargs) str[source]#

Explains a process mining visualization using LLMs by saving it as a .png image and providing the image to the Large Language Model along with a description.

Return type:

str

Parameters:
  • vis_saver – The visualizer function used to save the visualization to disk.

  • args – Positional arguments required by the visualizer function.

  • connector – (Optional) The connector method to communicate with the large language model (default: openai_query).

  • **kwargs – Additional keyword arguments for the visualizer function or the connector (e.g., annotations, API key).

Returns:

The explanation of the visualization as a string.

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")
descr = pm4py.llm.explain_visualization(pm4py.save_vis_dotted_chart, log, api_key="sk-5HN", show_legend=False)
print(descr)

pm4py.meta module#

Process mining for Python

pm4py.ml module#

The pm4py.ml module contains the machine learning features offered in pm4py.

pm4py.ml.split_train_test(log: EventLog | DataFrame, train_percentage: float = 0.8, case_id_key: str = 'case:concept:name') Tuple[EventLog, EventLog] | Tuple[DataFrame, DataFrame][source]#

Splits an event log into a training log and a test log for machine learning purposes.

This function separates the provided log into two parts based on the specified training percentage. It ensures that entire cases are included in either the training set or the test set.

Parameters:
  • log – The event log or Pandas DataFrame to be split.

  • train_percentage (float) – Fraction of cases to be included in the training log (between 0.0 and 1.0).

  • case_id_key (str) – Attribute to be used as the case identifier.

Returns:

A tuple containing the training and test event logs or DataFrames.

Return type:

Union[Tuple[EventLog, EventLog], Tuple[pd.DataFrame, pd.DataFrame]]

import pm4py

train_df, test_df = pm4py.split_train_test(dataframe, train_percentage=0.75)
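
Since entire cases are assigned to one side of the split, the proportions can be verified at the case level; a quick sketch continuing the example above:

import pm4py

train_df, test_df = pm4py.split_train_test(dataframe, train_percentage=0.75)
# count the distinct cases on each side of the split
print(train_df['case:concept:name'].nunique(), test_df['case:concept:name'].nunique())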
pm4py.ml.get_prefixes_from_log(log: EventLog | DataFrame, length: int, case_id_key: str = 'case:concept:name') EventLog | DataFrame[source]#

Retrieves prefixes of traces in a log up to a specified length.

The returned log contains prefixes of each trace:
- If a trace has a length less than or equal to the specified length, it is included as-is.
- If a trace exceeds the specified length, it is truncated to that length.

Parameters:
  • log – The event log or Pandas DataFrame from which to extract prefixes.

  • length (int) – The maximum length of prefixes to extract.

  • case_id_key (str) – Attribute to be used as the case identifier.

Returns:

A log containing the prefixes of the original log.

Return type:

Union[EventLog, pd.DataFrame]

import pm4py

trimmed_df = pm4py.get_prefixes_from_log(dataframe, length=5, case_id_key='case:concept:name')
pm4py.ml.extract_outcome_enriched_dataframe(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', start_timestamp_key: str = 'time:timestamp') DataFrame[source]#

Enriches a dataframe with additional outcome-related columns computed from the entire case.

This function adds columns that model the outcome of each case by computing metrics such as arrival rates and service waiting times.

Parameters:
  • log – The event log or Pandas DataFrame to be enriched.

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as the case identifier.

  • start_timestamp_key (str) – Attribute to be used as the start timestamp.

Returns:

An enriched Pandas DataFrame with additional outcome-related columns.

Return type:

pd.DataFrame

import pm4py

enriched_df = pm4py.extract_outcome_enriched_dataframe(
    log,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name',
    start_timestamp_key='time:timestamp'
)
pm4py.ml.extract_features_dataframe(log: EventLog | DataFrame, str_tr_attr: List[str] | None = None, num_tr_attr: List[str] | None = None, str_ev_attr: List[str] | None = None, num_ev_attr: List[str] | None = None, str_evsucc_attr: List[str] | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str | None = None, resource_key: str = 'org:resource', include_case_id: bool = False, **kwargs) DataFrame[source]#

Extracts a dataframe containing features for each case in the provided log object.

This function processes the log to generate a set of features that can be used for machine learning tasks. Features can include both case-level and event-level attributes, with options for one-hot encoding.

Parameters:
  • log – The event log or Pandas DataFrame from which to extract features.

  • str_tr_attr – (Optional) List of string attributes at the case level to extract as features.

  • num_tr_attr – (Optional) List of numeric attributes at the case level to extract as features.

  • str_ev_attr – (Optional) List of string attributes at the event level to extract as features (one-hot encoded).

  • num_ev_attr – (Optional) List of numeric attributes at the event level to extract as features (uses the last value per attribute in a case).

  • str_evsucc_attr – (Optional) List of string successor attributes at the event level to extract as features.

  • activity_key (str) – Attribute to be used as the activity identifier.

  • timestamp_key (str) – Attribute to be used for timestamps.

  • case_id_key – (Optional) Attribute to be used as the case identifier. If not provided, the default is used.

  • resource_key (str) – Attribute to be used as the resource identifier.

  • include_case_id (bool) – Whether to include the case identifier column in the features table.

  • **kwargs – Additional keyword arguments to pass to the feature extraction algorithm.

Returns:

A Pandas DataFrame containing the extracted features for each case.

Return type:

pd.DataFrame

import pm4py

features_df = pm4py.extract_features_dataframe(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
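
A sketch showing explicit selection of event-level attributes (the numeric column ‘amount’ is a hypothetical attribute of the log):

import pm4py

features_df = pm4py.extract_features_dataframe(
    dataframe,
    str_ev_attr=['concept:name'],  # string event attribute, one-hot encoded
    num_ev_attr=['amount'],        # hypothetical numeric event attribute; the last value per case is used
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)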
pm4py.ml.extract_ocel_features(ocel: OCEL, obj_type: str, enable_object_lifecycle_paths: bool = True, enable_object_work_in_progress: bool = False, object_str_attributes: Collection[str] | None = None, object_num_attributes: Collection[str] | None = None, include_obj_id: bool = False, debug: bool = False) DataFrame[source]#

Extracts a set of features from an object-centric event log (OCEL) for objects of a specified type.

This function computes various features based on the lifecycle paths and work-in-progress metrics of objects within the OCEL. It also supports encoding of string and numeric object attributes.

The approach is based on: Berti, A., Herforth, J., Qafari, M.S. et al. Graph-based feature extraction on object-centric event logs. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00428-2

Parameters:
  • ocel (OCEL) – The object-centric event log from which to extract features.

  • obj_type (str) – The object type to consider for feature extraction.

  • enable_object_lifecycle_paths (bool) – Whether to enable the “lifecycle paths” feature.

  • enable_object_work_in_progress (bool) – Whether to enable the “work in progress” feature, which has a high computational cost.

  • object_str_attributes – (Optional) Collection of string attributes at the object level to one-hot encode.

  • object_num_attributes – (Optional) Collection of numeric attributes at the object level to encode.

  • include_obj_id (bool) – Whether to include the object identifier as a column in the features DataFrame.

  • debug (bool) – Whether to enable debugging mode to track the feature extraction process.

Returns:

A Pandas DataFrame containing the extracted features for the specified object type.

Return type:

pd.DataFrame

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
fea_df = pm4py.extract_ocel_features(ocel, "item")
pm4py.ml.extract_temporal_features_dataframe(log: EventLog | DataFrame, grouper_freq: str = 'W', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str | None = None, start_timestamp_key: str = 'time:timestamp', resource_key: str = 'org:resource') DataFrame[source]#

Extracts temporal features from a log object and returns them as a dataframe.

This function computes temporal metrics based on the specified grouping frequency, which can be daily (D), weekly (W), monthly (M), or yearly (Y). These features are useful for analyzing system dynamics and simulation in the context of process mining.

The approach is based on: Pourbafrani, Mahsa, Sebastiaan J. van Zelst, and Wil MP van der Aalst. “Supporting automatic system dynamics model generation for simulation in the context of process mining.” International Conference on Business Information Systems. Springer, Cham, 2020.

Parameters:
  • log – The event log or Pandas DataFrame from which to extract temporal features.

  • grouper_freq (str) – The frequency to use for grouping (e.g., ‘D’ for daily, ‘W’ for weekly, ‘M’ for monthly, ‘Y’ for yearly).

  • activity_key (str) – Attribute to be used as the activity identifier.

  • timestamp_key (str) – Attribute to be used for timestamps.

  • case_id_key – (Optional) Attribute to be used as the case identifier. If not provided, the default is used.

  • start_timestamp_key (str) – Attribute to be used as the start timestamp.

  • resource_key (str) – Attribute to be used as the resource identifier.

Returns:

A Pandas DataFrame containing the extracted temporal features.

Return type:

pd.DataFrame

import pm4py

temporal_features_df = pm4py.extract_temporal_features_dataframe(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
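
The grouping frequency can be changed via grouper_freq; a sketch with monthly grouping:

import pm4py

monthly_features_df = pm4py.extract_temporal_features_dataframe(
    dataframe,
    grouper_freq='M',
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)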
pm4py.ml.extract_target_vector(log: EventLog | DataFrame, variant: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') Tuple[Any, List[str]][source]#

Extracts the target vector from a log object for a specific machine learning use case.

Supported variants include:
- ‘next_activity’: Predicts the next activity in a case.
- ‘next_time’: Predicts the timestamp of the next activity.
- ‘remaining_time’: Predicts the remaining time for the case.

Parameters:
  • log – The event log or Pandas DataFrame from which to extract the target vector.

  • variant (str) – The variant of the algorithm to use. Must be one of: ‘next_activity’, ‘next_time’, ‘remaining_time’.

  • activity_key (str) – Attribute to be used as the activity identifier.

  • timestamp_key (str) – Attribute to be used for timestamps.

  • case_id_key (str) – Attribute to be used as the case identifier.

Returns:

A tuple containing the target vector and a list of class labels (if applicable).

Return type:

Tuple[Any, List[str]]

Raises:

Exception – If an unsupported variant is provided.

import pm4py

vector_next_act, class_next_act = pm4py.extract_target_vector(
    log,
    'next_activity',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
vector_next_time, class_next_time = pm4py.extract_target_vector(
    log,
    'next_time',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
vector_rem_time, class_rem_time = pm4py.extract_target_vector(
    log,
    'remaining_time',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)

pm4py.ocel module#

The pm4py.ocel module contains the object-centric process mining features offered in pm4py.

pm4py.ocel.ocel_get_object_types(ocel: OCEL) List[str][source]#

Returns the list of object types contained in the object-centric event log (e.g., [“order”, “item”, “delivery”]).

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

List of object types.

Return type:

List[str]

import pm4py

object_types = pm4py.ocel_get_object_types(ocel)
pm4py.ocel.ocel_get_attribute_names(ocel: OCEL) List[str][source]#

Returns the list of attributes at the event and object levels of an object-centric event log (e.g., [“cost”, “amount”, “name”]).

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

List of attribute names.

Return type:

List[str]

import pm4py

attribute_names = pm4py.ocel_get_attribute_names(ocel)
pm4py.ocel.ocel_flattening(ocel: OCEL, object_type: str) DataFrame[source]#

Flattens the object-centric event log to a traditional event log based on a chosen object type. In the flattened log, the objects of the specified type are treated as cases, and each case contains the set of events related to that object. The flattened log follows the XES notations for case identifier, activity, and timestamp. Specifically:
- “case:concept:name” is used for the case ID.
- “concept:name” is used for the activity.
- “time:timestamp” is used for the timestamp.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_type (str) – The object type to use as cases.

Returns:

Flattened traditional event log.

Return type:

pd.DataFrame

import pm4py

event_log = pm4py.ocel_flattening(ocel, 'items')
pm4py.ocel.ocel_object_type_activities(ocel: OCEL) Dict[str, Collection[str]][source]#

Returns the set of activities performed for each object type.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Dictionary mapping object types to their associated activities.

Return type:

Dict[str, Collection[str]]

import pm4py

ot_activities = pm4py.ocel_object_type_activities(ocel)
pm4py.ocel.ocel_objects_ot_count(ocel: OCEL) Dict[str, Dict[str, int]][source]#

Returns the count of related objects per type for each event.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Nested dictionary mapping events to object types and their counts.

Return type:

Dict[str, Dict[str, int]]

import pm4py

objects_ot_count = pm4py.ocel_objects_ot_count(ocel)
pm4py.ocel.ocel_temporal_summary(ocel: OCEL) DataFrame[source]#

Returns the temporal summary of an object-centric event log. The temporal summary aggregates all events that occur at the same timestamp and reports the list of activities and involved objects.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Temporal summary DataFrame.

Return type:

pd.DataFrame

import pm4py

temporal_summary = pm4py.ocel_temporal_summary(ocel)
pm4py.ocel.ocel_objects_summary(ocel: OCEL) DataFrame[source]#

Returns the objects summary of an object-centric event log.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Objects summary DataFrame containing lifecycle information and interacting objects.

Return type:

pd.DataFrame

import pm4py

objects_summary = pm4py.ocel_objects_summary(ocel)
pm4py.ocel.ocel_objects_interactions_summary(ocel: OCEL) DataFrame[source]#

Returns the objects interactions summary of an object-centric event log. The summary includes a row for every combination of (event, related object, other related object). Properties such as the activity of the event and the object types of the two related objects are included.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Objects interactions summary DataFrame.

Return type:

pd.DataFrame

import pm4py

interactions_summary = pm4py.ocel_objects_interactions_summary(ocel)
pm4py.ocel.discover_ocdfg(ocel: OCEL, business_hours: bool = False, business_hour_slots: List[Tuple[int, int]] | None = [(25200, 61200), (111600, 147600), (198000, 234000), (284400, 320400), (370800, 406800)]) Dict[str, Any][source]#

Discovers an Object-Centric Directly-Follows Graph (OC-DFG) from an object-centric event log.

Object-centric directly-follows multigraphs are a composition of directly-follows graphs for each object type. These graphs can be annotated with different metrics considering the entities of an object-centric event log (i.e., events, unique objects, total objects).

Reference paper: Berti, Alessandro, and Wil van der Aalst. “Extracting multiple viewpoint models from relational databases.” Data-Driven Process Discovery and Analysis. Springer, Cham, 2018. 24-51.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • business_hours (bool) – Enable the usage of business hours if set to True.

  • business_hour_slots (Optional[List[Tuple[int, int]]]) – Work schedule of the company, provided as a list of tuples where each tuple represents one time slot of business hours. Each tuple consists of a start and an end time given in seconds since week start, e.g., [(25200, 61200), (95520, 129600), (219600, 234000)] meaning that business hours are Mondays 07:00 - 17:00, Tuesdays 02:32 - 12:00, and Wednesdays 13:00 - 17:00.

Returns:

OC-DFG discovery result.

Return type:

Dict[str, Any]

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocdfg = pm4py.discover_ocdfg(ocel)
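
Each slot is expressed in seconds since the start of the week (Monday 00:00), so a boundary can be computed as weekday * 86400 + hour * 3600. A minimal sketch, assuming the hypothetical file ‘trial.ocel’; the slot helper is illustrative and not part of pm4py:

import pm4py

# illustrative helper: weekday 0 = Monday; returns (start, end) in seconds since week start
def slot(weekday, start_hour, end_hour):
    return (weekday * 86400 + start_hour * 3600, weekday * 86400 + end_hour * 3600)

ocel = pm4py.read_ocel('trial.ocel')
working_hours = [slot(day, 7, 17) for day in range(5)]  # Mon-Fri 07:00 - 17:00 (the default slots)
ocdfg = pm4py.discover_ocdfg(ocel, business_hours=True, business_hour_slots=working_hours)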
pm4py.ocel.discover_oc_petri_net(ocel: OCEL, inductive_miner_variant: str = 'im', diagnostics_with_tbr: bool = False) Dict[str, Any][source]#

Discovers an object-centric Petri net from the provided object-centric event log.

Reference paper: van der Aalst, Wil MP, and Alessandro Berti. “Discovering object-centric Petri nets.” Fundamenta Informaticae 175.1-4 (2020): 1-40.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • inductive_miner_variant (str) – Variant of the inductive miner to use (“im” for traditional; “imd” for the faster inductive miner directly-follows).

  • diagnostics_with_tbr (bool) – Enable the computation of diagnostics using token-based replay if set to True.

Returns:

Discovered object-centric Petri net.

Return type:

Dict[str, Any]

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocpn = pm4py.discover_oc_petri_net(ocel)
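
The inductive miner variant and the token-based replay diagnostics can be selected through the documented parameters. A minimal sketch, assuming the hypothetical file ‘trial.ocel’:

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
# faster directly-follows-based discovery, with token-based replay diagnostics
ocpn = pm4py.discover_oc_petri_net(ocel, inductive_miner_variant='imd', diagnostics_with_tbr=True)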
pm4py.ocel.discover_objects_graph(ocel: OCEL, graph_type: str = 'object_interaction') Set[Tuple[str, str]][source]#

Discovers an object graph from the provided object-centric event log.

Available graph types:

  • “object_interaction”

  • “object_descendants”

  • “object_inheritance”

  • “object_cobirth”

  • “object_codeath”

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • graph_type (str) – Type of graph to consider. Options include “object_interaction”, “object_descendants”, “object_inheritance”, “object_cobirth”, “object_codeath”.

Returns:

Discovered object graph as a set of tuples.

Return type:

Set[Tuple[str, str]]

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
obj_graph = pm4py.discover_objects_graph(ocel, graph_type='object_interaction')
pm4py.ocel.ocel_o2o_enrichment(ocel: OCEL, included_graphs: Collection[str] | None = None) OCEL[source]#

Enriches the OCEL with information inferred from graph computations, inserting it into the O2O relations.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • included_graphs (Optional[Collection[str]]) – Types of graphs to include, provided as a list or set of strings. Options include “object_interaction_graph”, “object_descendants_graph”, “object_inheritance_graph”, “object_cobirth_graph”, “object_codeath_graph”.

Returns:

Enriched object-centric event log.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocel = pm4py.ocel_o2o_enrichment(ocel)
print(ocel.o2o)
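
The enrichment can be restricted to a subset of the supported graph types. A minimal sketch, assuming the hypothetical file ‘trial.ocel’:

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
# only insert the interaction and descendants graphs into the O2O relations
ocel = pm4py.ocel_o2o_enrichment(
    ocel,
    included_graphs=['object_interaction_graph', 'object_descendants_graph']
)
print(ocel.o2o)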
pm4py.ocel.ocel_e2o_lifecycle_enrichment(ocel: OCEL) OCEL[source]#

Enriches the OCEL with lifecycle-based information, indicating when an object is created, terminated, or has other types of relations, by updating the E2O relations.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Enriched object-centric event log with lifecycle information.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocel = pm4py.ocel_e2o_lifecycle_enrichment(ocel)
print(ocel.relations)
pm4py.ocel.sample_ocel_objects(ocel: OCEL, num_objects: int) OCEL[source]#

Returns a sampled object-centric event log containing a random subset of objects. Only events related to at least one of the sampled objects are included in the returned log. Note that this sampling may disrupt the relationships between different objects.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • num_objects (int) – Number of objects to include in the sampled event log.

Returns:

Sampled object-centric event log.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
sampled_ocel = pm4py.sample_ocel_objects(ocel, 50)  # Keeps only 50 random objects
pm4py.ocel.sample_ocel_connected_components(ocel: OCEL, connected_components: int = 1, max_num_events_per_cc: int = 9223372036854775807, max_num_objects_per_cc: int = 9223372036854775807, max_num_e2o_relations_per_cc: int = 9223372036854775807) OCEL[source]#

Returns a sampled object-centric event log containing a specified number of connected components. Users can also set maximum limits on the number of events, objects, and E2O relations per connected component.

Reference paper: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • connected_components (int) – Number of connected components to include in the sampled event log.

  • max_num_events_per_cc (int) – Maximum number of events allowed per connected component (default: sys.maxsize).

  • max_num_objects_per_cc (int) – Maximum number of objects allowed per connected component (default: sys.maxsize).

  • max_num_e2o_relations_per_cc (int) – Maximum number of event-to-object relationships allowed per connected component (default: sys.maxsize).

Returns:

Sampled object-centric event log containing the specified connected components.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
sampled_ocel = pm4py.sample_ocel_connected_components(ocel, 5)  # Keeps only 5 connected components
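
The per-component limits can be combined with the number of components. A minimal sketch, assuming the hypothetical file ‘trial.ocel’:

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
# keep 3 connected components, each with at most 1000 events and 500 objects
sampled_ocel = pm4py.sample_ocel_connected_components(
    ocel,
    connected_components=3,
    max_num_events_per_cc=1000,
    max_num_objects_per_cc=500
)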
pm4py.ocel.ocel_drop_duplicates(ocel: OCEL) OCEL[source]#

Removes duplicate relations between events and objects that occur at the same time, have the same activity, and are linked to the same object identifier. This effectively cleans the OCEL by eliminating duplicate events.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Cleaned object-centric event log without duplicate relations.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocel = pm4py.ocel_drop_duplicates(ocel)
pm4py.ocel.ocel_merge_duplicates(ocel: OCEL, have_common_object: bool | None = False) OCEL[source]#

Merges events in the OCEL that have the same activity and timestamp. Optionally, ensures that the events being merged share a common object.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • have_common_object (Optional[bool]) – If set to True, only merges events that share a common object. Defaults to False.

Returns:

Object-centric event log with merged duplicate events.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocel = pm4py.ocel_merge_duplicates(ocel)
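
With have_common_object=True, only events that additionally share at least one related object are merged. A minimal sketch, assuming the hypothetical file ‘trial.ocel’:

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
# merge same-activity, same-timestamp events only if they share an object
ocel = pm4py.ocel_merge_duplicates(ocel, have_common_object=True)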
pm4py.ocel.ocel_sort_by_additional_column(ocel: OCEL, additional_column: str, primary_column: str = 'ocel:timestamp') OCEL[source]#

Sorts the OCEL based on the primary timestamp column and an additional column to determine the order of events occurring at the same timestamp.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • additional_column (str) – Additional column to use for sorting.

  • primary_column (str) – Primary column to use for sorting (default: “ocel:timestamp”). Typically the timestamp column.

Returns:

Sorted object-centric event log.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocel = pm4py.ocel_sort_by_additional_column(ocel, 'ordering')
pm4py.ocel.ocel_add_index_based_timedelta(ocel: OCEL) OCEL[source]#

Adds a small time delta to the timestamp column based on the event index to ensure the correct ordering of events within any object-centric process mining solution.

Parameters:

ocel (OCEL) – Object-centric event log.

Returns:

Object-centric event log with index-based time deltas added.

Return type:

OCEL

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
ocel = pm4py.ocel_add_index_based_timedelta(ocel)
pm4py.ocel.cluster_equivalent_ocel(ocel: OCEL, object_type: str, max_objs: int = 9223372036854775807) Dict[str, Collection[OCEL]][source]#

Clusters the object-centric event log based on the ‘executions’ of a single object type. Equivalent ‘executions’ are grouped together in the output dictionary.

Parameters:
  • ocel (OCEL) – Object-centric event log.

  • object_type (str) – Reference object type for clustering.

  • max_objs (int) – Maximum number of objects (of the specified object type) to include per cluster. Defaults to sys.maxsize.

Returns:

Dictionary mapping cluster descriptions to collections of equivalent OCELs.

Return type:

Dict[str, Collection[OCEL]]

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
clusters = pm4py.cluster_equivalent_ocel(ocel, "order")
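
The returned dictionary maps each cluster description to a collection of equivalent sub-OCELs, which can be iterated directly. A minimal sketch, assuming the hypothetical file ‘trial.ocel’:

import pm4py

ocel = pm4py.read_ocel('trial.ocel')
clusters = pm4py.cluster_equivalent_ocel(ocel, "order")
# each value is a collection of OCELs sharing the same 'execution' structure
for description, sub_ocels in clusters.items():
    print(description, len(sub_ocels))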

pm4py.org module#

The pm4py.org module contains organizational analysis techniques offered in pm4py.

pm4py.org.discover_handover_of_work_network(log: EventLog | DataFrame, beta=0, resource_key: str = 'org:resource', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') SNA[source]#

Calculates the handover of work network of the event log.

The handover of work network is essentially the Directly-Follows Graph (DFG) of the event log, but with resources as the nodes of the graph instead of activities. As such, resource information must be present in the event log.

Parameters:
  • log – Event log or Pandas DataFrame.

  • beta (int) – Beta parameter for the Handover metric.

  • resource_key (str) – Attribute to be used for the resource.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Return type:

SNA

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
metric = pm4py.discover_handover_of_work_network(
    dataframe,
    beta=0,
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
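
The resulting SNA object can be visualized, for example with pm4py.view_sna. A minimal sketch, using the example log from pm4py’s test data:

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
metric = pm4py.discover_handover_of_work_network(dataframe)
pm4py.view_sna(metric)  # opens the social network analysis visualization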
pm4py.org.discover_working_together_network(log: EventLog | DataFrame, resource_key: str = 'org:resource', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') SNA[source]#

Calculates the working together network of the process. Two resource nodes are connected in the graph if the resources collaborate on an instance of the process.

Parameters:
  • log – Event log or Pandas DataFrame.

  • resource_key (str) – Attribute to be used for the resource.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Return type:

SNA

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
metric = pm4py.discover_working_together_network(
    dataframe,
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.org.discover_activity_based_resource_similarity(log: EventLog | DataFrame, activity_key: str = 'concept:name', resource_key: str = 'org:resource', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') SNA[source]#

Calculates similarity between the resources in the event log based on their activity profiles.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity.

  • resource_key (str) – Attribute to be used for the resource.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Return type:

SNA

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
act_res_sim = pm4py.discover_activity_based_resource_similarity(
    dataframe,
    resource_key='org:resource',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.org.discover_subcontracting_network(log: EventLog | DataFrame, n=2, resource_key: str = 'org:resource', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') SNA[source]#

Calculates the subcontracting network of the process.

Parameters:
  • log – Event log or Pandas DataFrame.

  • n (int) – N parameter for the Subcontracting metric.

  • resource_key (str) – Attribute to be used for the resource.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Return type:

SNA

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
metric = pm4py.discover_subcontracting_network(
    dataframe,
    n=2,
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.org.discover_organizational_roles(log: EventLog | DataFrame, activity_key: str = 'concept:name', resource_key: str = 'org:resource', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') List[Role][source]#

Mines the organizational roles.

A role is a set of activities in the log that are executed by a similar (multi)set of resources; hence, it corresponds to a specific function within the organization. Grouping the activities into roles can help in understanding the organizational structure behind the process.

Reference paper: Burattin, Andrea, Alessandro Sperduti, and Marco Veluscek. “Business models enhancement through discovery of roles.” 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). IEEE, 2013.

Parameters:
  • log – Event log or Pandas DataFrame.

  • activity_key (str) – Attribute to be used for the activity.

  • resource_key (str) – Attribute to be used for the resource.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as case identifier.

Return type:

List[Role]

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
roles = pm4py.discover_organizational_roles(
    dataframe,
    resource_key='org:resource',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
pm4py.org.discover_network_analysis(log: DataFrame | EventLog | EventStream, out_column: str, in_column: str, node_column_source: str, node_column_target: str, edge_column: str, edge_reference: str = '_out', performance: bool = False, sorting_column: str = 'time:timestamp', timestamp_column: str = 'time:timestamp') Dict[Tuple[str, str], Dict[str, Any]][source]#

Performs a network analysis of the log based on the provided parameters.

Classical social network analysis methods are based on the order of events within a case. For example, the Handover of Work metric considers the directly-follows relationships between resources during the execution of a case. An edge is added between two resources if such a relationship occurs.

Real-life scenarios may be more complicated. Firstly, it is difficult to collect events within the same case without encountering convergence/divergence issues (see the first section of the OCEL part). Secondly, the type of relationship may also be important: for example, the collaboration between two resources may be more effective when the executed activity is one the resources like rather than dislike.

The network analysis introduced here generalizes some existing social network analysis metrics, making them independent of the case notion and allowing the construction of a multigraph instead of a simple graph.

We assume events are linked by signals. An event emits a signal (contained in one attribute of the event) that is assumed to be received by other events (also containing this attribute) that follow the first event in the log. We assume there is an OUT attribute (of the event) that is identical to the IN attribute (of the other events).

When collecting this information, we can build the network analysis graph:

  • The source node of the relationship is determined by aggregating the node_column_source attribute.

  • The target node of the relationship is determined by aggregating the node_column_target attribute.

  • The type of edge is determined by aggregating the edge_column attribute.

  • The network analysis graph can be annotated with frequency or performance information.

The output is a multigraph. Two events EV1 and EV2 in the log are connected (independently of the case notion) based on having EV1.OUT_COLUMN = EV2.IN_COLUMN. Then, an aggregation is applied on the pair of events (NODE_COLUMN) to obtain the connected nodes. The edges between these nodes are aggregated based on some property of the source event (edge_column).

Parameters:
  • log – Event log, Pandas DataFrame, or EventStream.

  • out_column (str) – The source column of the link (default: the case identifier; events of the same case are linked).

  • in_column (str) – The target column of the link (default: the case identifier; events of the same case are linked).

  • node_column_source (str) – The attribute to be used for defining the source node (default: the resource of the log, “org:resource”).

  • node_column_target (str) – The attribute to be used for defining the target node (default: the resource of the log, “org:resource”).

  • edge_column (str) – The attribute to be used for defining the edge (default: the activity of the log, “concept:name”).

  • edge_reference (str) – Determines if the edge attribute should be picked from the source event. Values: “_out” => the source event; “_in” => the target event.

  • performance (bool) – Boolean value that enables performance calculation on the edges of the network analysis.

  • sorting_column (str) – The column to be used for sorting the log before performing the network analysis (default: “time:timestamp”).

  • timestamp_column (str) – The column to be used as timestamp for performance-related analysis (default: “time:timestamp”).

Return type:

Dict[Tuple[str, str], Dict[str, Any]]

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
net_ana = pm4py.discover_network_analysis(
    dataframe,
    out_column='case:concept:name',
    in_column='case:concept:name',
    node_column_source='org:resource',
    node_column_target='org:resource',
    edge_column='concept:name',
    edge_reference='_out',
    performance=False,
    sorting_column='time:timestamp',
    timestamp_column='time:timestamp'
)
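
The result maps each (source node, target node) pair to a dictionary of edge values and their associated metrics, so it can be inspected directly. A minimal sketch, using the example log from pm4py’s test data:

import pm4py

dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
net_ana = pm4py.discover_network_analysis(
    dataframe,
    out_column='case:concept:name',
    in_column='case:concept:name',
    node_column_source='org:resource',
    node_column_target='org:resource',
    edge_column='concept:name'
)
# keys are (source, target) node pairs; values map edge values to metrics
for (source, target), edges in net_ana.items():
    print(source, '->', target, edges)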

pm4py.privacy module#

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

pm4py.privacy.anonymize_differential_privacy(log: EventLog | DataFrame, epsilon: float = 1.0, k: int = 10, p: int = 20) DataFrame[source]#

Protects event logs with differential privacy. Differential privacy is a guarantee that bounds the impact the data of a single individual has on a query result.

Control-flow information is anonymized with SaCoFa. This algorithm inserts noise into the trace-variant counts through the step-wise construction of a prefix tree.

Contextual information, such as timestamps or resources, is anonymized with PRIPEL. This technique enriches a control-flow-anonymized event log with contextual information from the original log, while still achieving differential privacy. PRIPEL anonymizes each event’s timestamp and the other attributes that are stored as strings, integers, floats, or booleans.

Please install diffprivlib (https://diffprivlib.readthedocs.io/en/latest/; pip install diffprivlib==0.5.2) to run this algorithm.

SaCoFa is described in: S. A. Fahrenkrog-Petersen, M. Kabierski, F. Rösel, H. van der Aa and M. Weidlich, “SaCoFa: Semantics-aware Control-flow Anonymization for Process Mining,” 2021 3rd International Conference on Process Mining (ICPM), 2021, pp. 72-79. https://doi.org/10.48550/arXiv.2109.08501

PRIPEL is described in: Fahrenkrog-Petersen, S.A., van der Aa, H., Weidlich, M. (2020). PRIPEL: Privacy-Preserving Event Log Publishing Including Contextual Information. In: Fahland, D., Ghidini, C., Becker, J., Dumas, M. (eds) Business Process Management. BPM 2020. Lecture Notes in Computer Science, vol 12168. Springer, Cham. https://doi.org/10.1007/978-3-030-58666-9_7

Parameters:
  • log – event log / Pandas dataframe

  • epsilon (float) – the strength of the differential privacy guarantee. The smaller the value of epsilon, the stronger the privacy guarantee that is provided.

  • k (int) – the maximal length of considered traces in the prefix tree. We recommend choosing k such that roughly 80% of all traces in the original event log are covered.

  • p (int) – the pruning parameter, which denotes the minimum count a prefix must have in order not to be discarded. Pruning mitigates the otherwise exponential runtime of the algorithm.

Return type:

pd.DataFrame

import pm4py

event_log = pm4py.read_xes("running-example.xes")
anonymized_event_log = pm4py.anonymize_differential_privacy(event_log, epsilon=1.0, k=10, p=20)
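
Since smaller epsilon values provide stronger guarantees, typically at the cost of more distortion, the parameter can be tuned per use case. A minimal sketch, reusing the example log from above:

import pm4py

event_log = pm4py.read_xes("running-example.xes")
# smaller epsilon -> stronger privacy guarantee, usually more noise in the output
strongly_anonymized = pm4py.anonymize_differential_privacy(event_log, epsilon=0.5, k=10, p=20)
weakly_anonymized = pm4py.anonymize_differential_privacy(event_log, epsilon=2.0, k=10, p=20)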

pm4py.read module#

The pm4py.read module contains all functionality related to reading files and objects from disk.

pm4py.read.read_xes(file_path: str, variant: str | None = None, return_legacy_log_object: bool = False, encoding: str = 'utf-8', **kwargs) DataFrame | EventLog[source]#

Reads an event log stored in XES format (see xes-standard). Returns a table (pandas.DataFrame) view of the event log or an EventLog object.

Parameters:
  • file_path (str) – Path to the event log (.xes file) on disk.

  • variant – Variant of the importer to use. Options include: “iterparse” (traditional XML parser), “line_by_line” (text-based line-by-line importer), “chunk_regex” (chunk-of-bytes importer; the default), “iterparse20” (XES 2.0 importer), and “rustxes” (Rust-based importer).

  • return_legacy_log_object (bool) – Boolean indicating whether to return a legacy EventLog object (default: False).

  • encoding (str) – Encoding to be used (default: utf-8).

  • **kwargs – Additional parameters to pass to the importer.

Return type:

pandas.DataFrame or pm4py.objects.log.obj.EventLog

import pm4py

log = pm4py.read_xes("<path_to_xes_file>")
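
The importer variant and the type of the returned object can be controlled through the documented parameters. A minimal sketch (path placeholder as above):

import pm4py

# use the traditional XML parser and return a legacy EventLog object
log = pm4py.read_xes("<path_to_xes_file>", variant="iterparse", return_legacy_log_object=True)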
pm4py.read.read_pnml(file_path: str, auto_guess_final_marking: bool = False, encoding: str = 'utf-8') Tuple[PetriNet, Marking, Marking][source]#

Reads a Petri net object from a .pnml file. The returned value is a tuple containing:

  1. PetriNet object (PetriNet)

  2. Initial Marking (Marking)

  3. Final Marking (Marking)

Parameters:
  • file_path (str) – Path to the Petri net model (.pnml file) on disk.

  • auto_guess_final_marking (bool) – Boolean indicating whether to automatically guess the final marking (default: False).

  • encoding (str) – Encoding to be used (default: utf-8).

Return type:

Tuple[PetriNet, Marking, Marking]

import pm4py

net, im, fm = pm4py.read_pnml("<path_to_pnml_file>")
pm4py.read.read_ptml(file_path: str, encoding: str = 'utf-8') ProcessTree[source]#

Reads a process tree object from a .ptml file.

Parameters:
  • file_path (str) – Path to the process tree file on disk.

  • encoding (str) – Encoding to be used (default: utf-8).

Return type:

ProcessTree

import pm4py

process_tree = pm4py.read_ptml("<path_to_ptml_file>")
pm4py.read.read_dfg(file_path: str, encoding: str = 'utf-8')