pm4py.stats.split_by_process_variant

pm4py.stats.split_by_process_variant(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_column: str = '@@variant_column', index_in_trace_column: str = '@@index_in_trace') → Iterator[Tuple[Collection[str], DataFrame]]

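Splits an event log into one sub-dataframe per process variant, where a variant is the sequence of activities followed by a case. The result is an iterator of (variant, sub-dataframe) pairs; each sub-dataframe holds the events of every case that follows that variant.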

Parameters:
  • log – Event log (EventLog or pandas DataFrame).

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as the case identifier.

  • variant_column (str) – Name of the utility column that stores the variant (a tuple of activities) of each event's case.

  • index_in_trace_column (str) – Name of the utility column that stores the index of the event in the case.

Returns:

An iterator of tuples, each containing a variant and its corresponding sub-dataframe.

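import pandas as pd
import pm4py

# Load the CSV and normalise it to the column names pm4py expects.
dataframe = pd.read_csv('tests/input_data/receipt.csv')
dataframe = pm4py.format_dataframe(dataframe)

# Iterate lazily over (variant, sub-dataframe) pairs.
for variant, subdf in pm4py.split_by_process_variant(dataframe):
    print(variant)
    print(subdf)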
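
To make the notion of "one sub-dataframe per variant" concrete, the following is a minimal pandas-only sketch of the same idea. It is an illustration under stated assumptions, not pm4py's implementation: the helper name `split_by_variant_sketch` and the tiny in-memory log are hypothetical, and the sketch assumes that each case's events can be ordered by the timestamp column.

```python
import pandas as pd


def split_by_variant_sketch(df, activity_key="concept:name",
                            timestamp_key="time:timestamp",
                            case_id_key="case:concept:name"):
    """Yield (variant, sub-dataframe) pairs. Hypothetical helper, not pm4py code."""
    # Order the events by case and, within each case, by timestamp.
    ordered = df.sort_values([case_id_key, timestamp_key])
    # The variant of a case is the tuple of its activities in that order.
    case_variant = ordered.groupby(case_id_key, sort=False)[activity_key].apply(tuple)
    # Look up the variant of each event's case, then split the events on it.
    variant_per_event = ordered[case_id_key].map(case_variant)
    for variant, subdf in ordered.groupby(variant_per_event, sort=False):
        yield variant, subdf


# Hypothetical three-case log: c1 and c2 follow A then B, c3 follows B then A.
events = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3", "c3"],
    "concept:name": ["A", "B", "A", "B", "B", "A"],
    "time:timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00",
        "2024-01-02 09:00", "2024-01-02 10:00",
        "2024-01-03 09:00", "2024-01-03 10:00",
    ]),
})

for variant, subdf in split_by_variant_sketch(events):
    # Each sub-dataframe contains all events of the cases sharing this variant.
    print(variant, sorted(subdf["case:concept:name"].unique()))
```

Because the function is a generator, sub-dataframes are produced one at a time as the caller iterates, which mirrors the iterator-style return value documented above.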