pm4py.stats.get_variants_paths_duration
- pm4py.stats.get_variants_paths_duration(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', variant_column: str = '@@variant_column', variant_count: str = '@@variant_count', index_in_trace_column: str = '@@index_in_trace', cumulative_occ_path_column: str = '@@cumulative_occ_path_column', times_agg: str = 'mean') → DataFrame
Returns a pandas DataFrame, computed from the log, that is aggregated by variant and by position inside the variant. Each row has the following columns:
- The variant
- The position (in the variant)
- The source activity (of the path)
- The target activity (of the path)
- An aggregation of the times between the two activities (for example, the mean over all the cases of the same variant)
- The cumulative occurrences of the path inside the case (for example, the first A->B would be associated with 0, and the second A->B would be associated with 1)
- Parameters:
log (EventLog | DataFrame) – event log
activity_key (str) – attribute to be used for the activity
timestamp_key (str) – attribute to be used for the timestamp
case_id_key (str) – attribute to be used as case identifier
variant_column (str) – name of the utility column that stores the variant’s tuple
variant_count (str) – name of the utility column that stores the variant’s number of occurrences
index_in_trace_column (str) – name of the utility column that stores the index of the event in the case
cumulative_occ_path_column (str) – name of the column that stores the cumulative occurrences of the path inside the case
times_agg (str) – aggregation to apply to the times (for example, mean or median)
- Return type:
pd.DataFrame
import pandas as pd
import pm4py

dataframe = pd.read_csv('tests/input_data/receipt.csv')
dataframe = pm4py.format_dataframe(dataframe)
var_paths_durs = pm4py.get_variants_paths_duration(dataframe)
print(var_paths_durs)
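
To make the row layout described above concrete, the following is a minimal standard-library sketch of the same kind of aggregation on a toy log. It is not pm4py's implementation. The toy events, the variable names, the use of the source event's index as the position, and seconds as the time unit are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical toy log: (case id, activity, timestamp), sorted by time within each case.
events = [
    ("c1", "A", datetime(2024, 1, 1, 9, 0)),
    ("c1", "B", datetime(2024, 1, 1, 9, 10)),
    ("c1", "A", datetime(2024, 1, 1, 9, 30)),
    ("c1", "B", datetime(2024, 1, 1, 9, 35)),
    ("c2", "A", datetime(2024, 1, 2, 9, 0)),
    ("c2", "B", datetime(2024, 1, 2, 9, 20)),
    ("c2", "A", datetime(2024, 1, 2, 9, 30)),
    ("c2", "B", datetime(2024, 1, 2, 9, 45)),
    ("c3", "A", datetime(2024, 1, 3, 9, 0)),
    ("c3", "B", datetime(2024, 1, 3, 9, 5)),
]

# Group the events into one trace per case.
traces = defaultdict(list)
for case_id, activity, timestamp in events:
    traces[case_id].append((activity, timestamp))

# Collect path durations keyed by
# (variant, position, source, target, cumulative occurrence of the path in the case).
durations = defaultdict(list)
for trace in traces.values():
    variant = tuple(activity for activity, _ in trace)
    seen = defaultdict(int)  # how often each path has occurred so far in this case
    for position in range(len(trace) - 1):
        (source, start), (target, end) = trace[position], trace[position + 1]
        occurrence = seen[(source, target)]
        seen[(source, target)] += 1
        key = (variant, position, source, target, occurrence)
        durations[key].append((end - start).total_seconds())

# Aggregate with the mean, mirroring the idea behind times_agg='mean'.
rows = [key + (mean(values),) for key, values in sorted(durations.items())]
for row in rows:
    print(row)
```

In this toy log, cases c1 and c2 share the variant A, B, A, B, so their durations are averaged position by position. The second A->B in that variant gets cumulative occurrence 1 and a mean of 600 seconds. Case c3 forms its own variant A, B with a single row.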