pm4py.analysis.cluster_log

pm4py.analysis.cluster_log(log: EventLog | EventStream | DataFrame, sklearn_clusterer=None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → Generator[EventLog, None, None]

Clusters the traces of the provided event log. A profile is extracted for each trace, and the profiles are grouped with a scikit-learn clusterer (by default, K-Means with two clusters). The log is then split into one sub-log per cluster.

Parameters:
  • log (EventLog | EventStream | pd.DataFrame) – The event log to cluster.

  • sklearn_clusterer – (Optional) The scikit-learn clusterer to use. If None (the default), KMeans(n_clusters=2, random_state=0, n_init="auto") is used.

  • activity_key (str) – The key used to identify activities in the log.

  • timestamp_key (str) – The key used to identify timestamps in the log.

  • case_id_key (str) – The key used to identify case IDs in the log.

Returns:

A generator that yields the clustered sub-logs, one per cluster, as pandas DataFrames.

Return type:

Generator[pd.DataFrame, None, None]

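import pm4py

# Load an event log; the path is a placeholder for any XES file
df = pm4py.read_xes("running-example.xes")

# Each iteration yields the sub-log of one cluster
for clust_log in pm4py.cluster_log(df):
    print(clust_log)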