pm4py.ml.extract_features_dataframe

pm4py.ml.extract_features_dataframe(log: EventLog | DataFrame, str_tr_attr: List[str] | None = None, num_tr_attr: List[str] | None = None, str_ev_attr: List[str] | None = None, num_ev_attr: List[str] | None = None, str_evsucc_attr: List[str] | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str | None = None, resource_key: str = 'org:resource', include_case_id: bool = False, **kwargs) → DataFrame

Extracts a dataframe containing features for each case in the provided log object.

This function processes the log to generate a set of features that can be used for machine learning tasks. Features can include both case-level and event-level attributes, with options for one-hot encoding.
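As a rough illustration of the case-level part, the sketch below builds a features table of that shape with plain pandas: one row per case, string case attributes one-hot encoded, and numeric case attributes copied as-is. It is not pm4py's implementation. The attributes `case:channel` and `case:amount`, the helper `case_level_features`, and the `attribute@value` column naming are hypothetical, and the one-hot treatment of string case attributes is an assumption.

```python
import pandas as pd

# Toy event log using pm4py's default column names; case attributes carry the "case:" prefix.
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c2"],
    "concept:name": ["A", "B", "A", "C", "B"],
    "case:channel": ["web", "web", "phone", "phone", "phone"],  # hypothetical string case attribute
    "case:amount": [100.0, 100.0, 250.0, 250.0, 250.0],          # hypothetical numeric case attribute
})


def case_level_features(df, case_col, str_attrs, num_attrs):
    """Hypothetical helper: one row per case from case-level attributes."""
    # Case attributes are constant within a case, so the first row of each case is enough.
    cases = df.groupby(case_col, sort=True).first()
    # String attributes: one 0/1 column per observed value, named "<attribute>@<value>".
    parts = [pd.get_dummies(cases[a], prefix=a, prefix_sep="@", dtype=int) for a in str_attrs]
    # Numeric attributes: copied unchanged.
    parts.append(cases[num_attrs])
    return pd.concat(parts, axis=1)


features = case_level_features(log, "case:concept:name", ["case:channel"], ["case:amount"])
print(features)
```

Each case ends up as a single row, which is the layout a tabular machine learning model expects.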

Parameters:
  • log – The event log or Pandas DataFrame from which to extract features.

  • str_tr_attr – (Optional) List of string attributes at the case level to extract as features.

  • num_tr_attr – (Optional) List of numeric attributes at the case level to extract as features.

  • str_ev_attr – (Optional) List of string attributes at the event level to extract as features (one-hot encoded).

  • num_ev_attr – (Optional) List of numeric attributes at the event level to extract as features (uses the last value per attribute in a case).

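  • str_evsucc_attr – (Optional) List of string attributes at the event level whose successions (pairs of values in consecutive events of a case, e.g. activity A directly followed by activity B) are extracted as features.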

  • activity_key (str) – Attribute to be used as the activity identifier.

  • timestamp_key (str) – Attribute to be used for timestamps.

  • case_id_key – (Optional) Attribute to be used as the case identifier. If not provided, the default is used.

  • resource_key (str) – Attribute to be used as the resource identifier.

  • include_case_id (bool) – Whether to include the case identifier column in the features table.

  • **kwargs – Additional keyword arguments to pass to the feature extraction algorithm.
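The event-level options described above can be approximated in plain pandas as follows. This is a minimal sketch under stated assumptions, not pm4py's code: events are assumed to be sorted by timestamp within each case, "one-hot encoded" is taken to mean "the value occurs at least once in the case", and the helper `event_level_features`, the `cost` attribute, and the `succ:` column prefix are hypothetical.

```python
import pandas as pd

CASE = "case:concept:name"

# Toy event log; events are assumed to be sorted by timestamp within each case.
log = pd.DataFrame({
    CASE: ["c1", "c1", "c1", "c2", "c2"],
    "concept:name": ["A", "B", "C", "A", "C"],
    "cost": [5.0, 7.0, 2.0, 4.0, 9.0],  # hypothetical numeric event attribute
})


def event_level_features(df, str_attrs, num_attrs, succ_attrs):
    """Hypothetical helper: one row per case built from event-level attributes."""
    grouped = df.groupby(CASE, sort=True)
    parts = []
    for a in str_attrs:
        # One-hot: 1 if the value occurs in at least one event of the case, else 0.
        onehot = pd.get_dummies(df[a], prefix=a, prefix_sep="@", dtype=int)
        parts.append(onehot.groupby(df[CASE]).max())
    for a in num_attrs:
        # Numeric attribute: keep the last non-missing value observed in the case.
        parts.append(grouped[a].last().to_frame(a))
    for a in succ_attrs:
        # Successions: pair each value with the value of the next event in the same case.
        nxt = grouped[a].shift(-1)
        mask = nxt.notna()  # the last event of a case has no successor
        pairs = df.loc[mask, a] + "#" + nxt[mask]
        onehot = pd.get_dummies(pairs, prefix="succ:" + a, prefix_sep="@", dtype=int)
        parts.append(onehot.groupby(df.loc[mask, CASE]).max())
    # Cases lacking a feature entirely (e.g. single-event cases) get 0.
    return pd.concat(parts, axis=1).fillna(0)


features = event_level_features(log, ["concept:name"], ["cost"], ["concept:name"])
print(features)
```

With these inputs, case `c1` gets the succession features `A#B` and `B#C`, while `c2` only gets `A#C`; its `cost` feature is 9.0 because that is the value of its last event.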

Returns:

A Pandas DataFrame containing the extracted features for each case.

Return type:

pd.DataFrame

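import pm4py

# Load an event log as a Pandas DataFrame ('running-example.xes' is a placeholder path)
dataframe = pm4py.read_xes('running-example.xes')

# Extract one row of features per case
features_df = pm4py.extract_features_dataframe(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)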