pm4py.ml.split_train_test
- pm4py.ml.split_train_test(log: EventLog | DataFrame, train_percentage: float = 0.8, case_id_key: str = 'case:concept:name') → Tuple[EventLog, EventLog] | Tuple[DataFrame, DataFrame]
Splits an event log into a training log and a test log for machine-learning purposes.
This function separates the provided log into two parts according to the given training percentage. Cases are kept whole: all events of a case go either to the training set or to the test set, never to both.
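The case-level split described above can be sketched in a few lines of pandas. This is a minimal sketch under stated assumptions: `split_cases_sketch` and its `seed` parameter are hypothetical and not part of pm4py, and pm4py's own way of assigning cases (for example, how it randomizes or rounds the number of training cases) may differ. It only illustrates why the distinct case identifiers, rather than the individual events, are shuffled and assigned.

```python
import random
from typing import Optional, Tuple

import pandas as pd


def split_cases_sketch(
    df: pd.DataFrame,
    train_percentage: float = 0.8,
    case_id_key: str = "case:concept:name",
    seed: Optional[int] = None,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical case-level train/test split (not pm4py's own code)."""
    if not 0.0 <= train_percentage <= 1.0:
        raise ValueError("train_percentage must be between 0.0 and 1.0")
    # Shuffle the distinct case identifiers, not the individual events.
    case_ids = list(df[case_id_key].unique())
    random.Random(seed).shuffle(case_ids)
    # The first share of the shuffled cases goes to training, the rest to test.
    n_train = round(len(case_ids) * train_percentage)
    train_cases = set(case_ids[:n_train])
    # One boolean per event, identical for all events of the same case,
    # so no case can end up in both parts.
    in_train = df[case_id_key].isin(train_cases)
    return df[in_train], df[~in_train]


# Toy event log with four cases (c1..c4) and eight events.
events = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c2", "c3", "c4", "c4"],
    "concept:name": ["a", "b", "a", "c", "d", "a", "b", "d"],
})
train_df, test_df = split_cases_sketch(events, train_percentage=0.75, seed=42)
print(sorted(train_df["case:concept:name"].unique()),
      sorted(test_df["case:concept:name"].unique()))
```

With four cases and `train_percentage=0.75`, three whole cases land in the training part and one in the test part; which ones depends on the seed.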
- Parameters:
log (EventLog | DataFrame) – The event log or pandas DataFrame to be split.
train_percentage (float) – Fraction of cases to be included in the training log (between 0.0 and 1.0); defaults to 0.8.
case_id_key (str) – Attribute to be used as the case identifier; defaults to 'case:concept:name'.
- Returns:
A tuple containing the training and test event logs or DataFrames.
- Return type:
Union[Tuple[EventLog, EventLog], Tuple[pd.DataFrame, pd.DataFrame]]
import pm4py

# `dataframe` is an event log stored as a pandas DataFrame (e.g. loaded with pm4py.read_xes)
train_df, test_df = pm4py.split_train_test(dataframe, train_percentage=0.75)
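Because the split is made per case, a quick sanity check after calling the function is that no case identifier appears in both outputs. The sketch below assumes a hypothetical helper, `cases_overlap`, which is not a pm4py function; it works on any two DataFrames that share the case-identifier column. The toy data deliberately leaks one case to show what a failed check looks like.

```python
import pandas as pd


def cases_overlap(train_df: pd.DataFrame, test_df: pd.DataFrame,
                  case_id_key: str = "case:concept:name") -> set:
    """Return the case IDs present in both splits (empty set if the split is clean)."""
    return set(train_df[case_id_key]) & set(test_df[case_id_key])


# Toy data: case "c2" has events on both sides, as an event-level split could produce.
train = pd.DataFrame({"case:concept:name": ["c1", "c1", "c2"],
                      "concept:name": ["a", "b", "a"]})
test = pd.DataFrame({"case:concept:name": ["c2", "c3"],
                     "concept:name": ["c", "a"]})
print(cases_overlap(train, test))  # {'c2'}
```

Applied to the `train_df` and `test_df` returned in the example above, the result is expected to be an empty set.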