pm4py.analysis.cluster_log
- pm4py.analysis.cluster_log(log: EventLog | EventStream | DataFrame, sklearn_clusterer=None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → Generator[EventLog, None, None]
Applies clustering to the provided event log: a profile (feature vector) is extracted for each trace, and the profiles are clustered with a scikit-learn clusterer (by default, k-means with two clusters).
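The profile-then-cluster idea described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not pm4py's implementation: it assumes each event is a `(case_id, activity)` pair, uses per-trace activity counts as the profile (pm4py's own profile extraction may use different features), and substitutes a small deterministic two-means loop (`TwoMeans`, a hypothetical helper) for scikit-learn's `KMeans` so that it runs with the standard library alone. The names `trace_profiles` and `cluster_traces` are likewise illustrative.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List, Optional, Sequence, Tuple

# Hypothetical minimal event record: (case_id, activity).
Event = Tuple[str, str]


def trace_profiles(events: Sequence[Event]) -> Tuple[List[str], List[List[float]]]:
    """Build one profile per case: how often each activity occurs in that trace."""
    counts: Dict[str, Counter] = {}
    for case_id, activity in events:
        counts.setdefault(case_id, Counter())[activity] += 1
    activities = sorted({activity for _, activity in events})
    case_ids = list(counts)  # cases in first-seen order
    vectors = [[float(counts[c][a]) for a in activities] for c in case_ids]
    return case_ids, vectors


def _sq_dist(p: List[float], q: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


class TwoMeans:
    """Hypothetical stand-in for KMeans(n_clusters=2); only fit_predict is provided."""

    def __init__(self, n_iter: int = 10) -> None:
        self.n_iter = n_iter

    def fit_predict(self, X: List[List[float]]) -> List[int]:
        # Deterministic seeding: the first point, then the point farthest from it.
        centers = [list(X[0]), list(max(X, key=lambda p: _sq_dist(p, X[0])))]
        labels = [0] * len(X)
        for _ in range(self.n_iter):
            # Assignment step: each profile goes to its nearest center (ties go to 0).
            labels = [0 if _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1]) else 1 for p in X]
            # Update step: move each non-empty cluster's center to the mean of its members.
            for k in (0, 1):
                members = [p for p, lab in zip(X, labels) if lab == k]
                if members:
                    centers[k] = [sum(col) / len(members) for col in zip(*members)]
        return labels


def cluster_traces(events: Sequence[Event], clusterer: Optional[Any] = None) -> Iterator[List[Event]]:
    """Yield one sub-log (list of events) per cluster label, in label order."""
    if clusterer is None:
        clusterer = TwoMeans()
    case_ids, vectors = trace_profiles(events)
    labels = clusterer.fit_predict(vectors)
    for label in sorted(set(labels)):
        members = {c for c, lab in zip(case_ids, labels) if lab == label}
        yield [e for e in events if e[0] in members]


if __name__ == "__main__":
    log = [("c1", "a"), ("c1", "b"), ("c2", "a"), ("c2", "b"),
           ("c3", "a"), ("c3", "c"), ("c3", "c"), ("c3", "c")]
    for sub_log in cluster_traces(log):
        print(sub_log)
```

Because `cluster_traces` only calls `fit_predict`, an object such as `sklearn.cluster.KMeans(n_clusters=2, random_state=0, n_init="auto")` could be passed as `clusterer` in place of the stand-in, mirroring how the documented `sklearn_clusterer` parameter swaps out the default.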
- Parameters:
log (EventLog | EventStream | DataFrame) – The event log to cluster.
sklearn_clusterer – (Optional) The scikit-learn clusterer to use. Defaults to KMeans(n_clusters=2, random_state=0, n_init="auto").
activity_key (str) – The key used to identify activities in the log.
timestamp_key (str) – The key used to identify timestamps in the log.
case_id_key (str) – The key used to identify case IDs in the log.
- Returns:
A generator that yields one event log per cluster, each as a pandas DataFrame.
- Return type:
Generator[pd.DataFrame, None, None]
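- Example:
    import pm4py

    # "example.xes" is a placeholder path; read_xes returns a pandas DataFrame
    df = pm4py.read_xes("example.xes")

    for clust_log in pm4py.cluster_log(df):
        print(clust_log)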