pm4py.stats.get_variants_paths_duration
- pm4py.stats.get_variants_paths_duration(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_column: str = '@@variant_column', variant_count: str = '@@variant_count', index_in_trace_column: str = '@@index_in_trace', cumulative_occ_path_column: str = '@@cumulative_occ_path_column', times_agg: str = 'mean') → DataFrame
Builds, from an event log, a pandas DataFrame aggregated by variant and by position within each variant. Each row includes:
- The variant
- The position within the variant
- The source activity of the path
- The target activity of the path
- An aggregation of the times between the two activities (e.g., mean)
- The cumulative occurrences of the path within the case
- Return type:
DataFrame
- Parameters:
log – Event log (EventLog or pandas DataFrame).
activity_key (str) – Attribute to be used for the activity.
timestamp_key (str) – Attribute to be used for the timestamp.
case_id_key (str) – Attribute to be used as the case identifier.
variant_column (str) – Name of the utility column that stores the variant’s tuple.
variant_count (str) – Name of the utility column that stores the variant’s occurrence count.
index_in_trace_column (str) – Name of the utility column that stores the index of the event in the case.
cumulative_occ_path_column (str) – Name of the column that stores the cumulative occurrences of the path within the case.
times_agg (str) – Aggregation function to be used for time differences (e.g., “mean”, “median”).
- Returns:
A pandas DataFrame with the aggregated variant paths and durations.
import pandas as pd
import pm4py

dataframe = pd.read_csv('tests/input_data/receipt.csv')
dataframe = pm4py.format_dataframe(dataframe)
var_paths_durs = pm4py.get_variants_paths_duration(dataframe)
print(var_paths_durs)
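To make the row layout described above concrete, the following standard-library sketch computes the same kind of aggregation on a tiny hand-made log. It is an illustration only, not pm4py's implementation: the function name `variant_path_durations`, the `toy_log` data, the dictionary keys of the result, and the zero-based counting of positions and cumulative occurrences are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Aggregations supported by this sketch (hypothetical subset).
AGGREGATORS = {"mean": mean, "median": median}


def variant_path_durations(cases, times_agg="mean"):
    """Illustrative helper, not part of pm4py.

    cases: dict mapping a case id to a time-sorted list of
    (activity, timestamp) pairs.
    """
    agg = AGGREGATORS[times_agg]
    durations = defaultdict(list)
    for events in cases.values():
        # The variant is the ordered tuple of activities of the case.
        variant = tuple(activity for activity, _ in events)
        seen = defaultdict(int)  # occurrences of each path so far in this case
        for index in range(len(events) - 1):
            (source, t0), (target, t1) = events[index], events[index + 1]
            key = (variant, index, source, target, seen[(source, target)])
            seen[(source, target)] += 1
            durations[key].append((t1 - t0).total_seconds())
    # One row per (variant, position), with the aggregated duration in seconds.
    return [
        {
            "variant": variant,
            "index": index,
            "source": source,
            "target": target,
            "duration": agg(values),
            "cumulative_occ_path": occ,
        }
        for (variant, index, source, target, occ), values in sorted(durations.items())
    ]


# Hypothetical toy log: two cases share the variant A,B,A,B; one case has A,B.
toy_log = {
    "c1": [("A", datetime(2024, 1, 1, 8, 0)), ("B", datetime(2024, 1, 1, 8, 10)),
           ("A", datetime(2024, 1, 1, 8, 30)), ("B", datetime(2024, 1, 1, 8, 40))],
    "c2": [("A", datetime(2024, 1, 1, 9, 0)), ("B", datetime(2024, 1, 1, 9, 20)),
           ("A", datetime(2024, 1, 1, 9, 30)), ("B", datetime(2024, 1, 1, 10, 0))],
    "c3": [("A", datetime(2024, 1, 1, 8, 0)), ("B", datetime(2024, 1, 1, 8, 5))],
}

rows = variant_path_durations(toy_log)
for row in rows:
    print(row)
```

In the variant A,B,A,B the path A→B appears twice per case, so the row at the last position carries a cumulative occurrence one higher than the row at the first position. Passing `times_agg="median"` to the sketch swaps the aggregation in the same spirit as the `times_agg` parameter documented above.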