pm4py.ml.extract_features_dataframe
- pm4py.ml.extract_features_dataframe(log: EventLog | DataFrame, str_tr_attr: List[str] | None = None, num_tr_attr: List[str] | None = None, str_ev_attr: List[str] | None = None, num_ev_attr: List[str] | None = None, str_evsucc_attr: List[str] | None = None, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str | None = None, resource_key: str = 'org:resource', include_case_id: bool = False, **kwargs) → DataFrame
Extracts a dataframe containing features for each case in the provided log object.
The resulting features can be used for machine learning tasks. They can combine case-level and event-level attributes; string attributes at the event level are one-hot encoded.
- Parameters:
log – The event log or Pandas DataFrame from which to extract features.
str_tr_attr – (Optional) List of string attributes at the case level to extract as features.
num_tr_attr – (Optional) List of numeric attributes at the case level to extract as features.
str_ev_attr – (Optional) List of string attributes at the event level to extract as features (one-hot encoded).
num_ev_attr – (Optional) List of numeric attributes at the event level to extract as features (uses the last value per attribute in a case).
str_evsucc_attr – (Optional) List of string attributes at the event level for which successions of values (one value directly followed by another within a case) are extracted as features.
activity_key (str) – Attribute to be used as the activity identifier.
timestamp_key (str) – Attribute to be used for timestamps.
case_id_key – (Optional) Attribute to be used as the case identifier. If not provided, the default is used.
resource_key (str) – Attribute to be used as the resource identifier.
include_case_id (bool) – Whether to include the case identifier column in the features table.
**kwargs – Additional keyword arguments to pass to the feature extraction algorithm.
- Returns:
A Pandas DataFrame containing the extracted features for each case.
- Return type:
pd.DataFrame
import pm4py

features_df = pm4py.extract_features_dataframe(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
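The encodings described for str_ev_attr (one-hot) and num_ev_attr (last value per case) can be pictured with a small plain-Python sketch. This is not pm4py's implementation: the helper `sketch_features`, the `amount` attribute, and the `attr=value` column names are assumptions made for illustration, and the column names pm4py actually produces may differ.

```python
from collections import defaultdict

# Toy event log: one dict per event. The key names follow the defaults in the
# signature above; "amount" is a hypothetical numeric event attribute.
events = [
    {"case:concept:name": "c1", "concept:name": "register", "amount": 10.0},
    {"case:concept:name": "c1", "concept:name": "approve", "amount": 12.5},
    {"case:concept:name": "c2", "concept:name": "register", "amount": 7.0},
]


def sketch_features(events, str_ev_attr, num_ev_attr,
                    case_id_key="case:concept:name"):
    """Build one feature row per case (illustrative only, not pm4py's code)."""
    # Group events by case, preserving their order of appearance.
    cases = defaultdict(list)
    for event in events:
        cases[event[case_id_key]].append(event)

    # One-hot columns: one per (attribute, value) pair seen anywhere in the log.
    onehot_columns = sorted(
        {(attr, event[attr])
         for event in events
         for attr in str_ev_attr
         if attr in event}
    )

    rows = {}
    for case_id, case_events in cases.items():
        row = {}
        for attr, value in onehot_columns:
            # 1 if any event of the case carries this value, else 0.
            present = any(e.get(attr) == value for e in case_events)
            row[f"{attr}={value}"] = int(present)
        for attr in num_ev_attr:
            # Last recorded value of the numeric attribute within the case.
            values = [e[attr] for e in case_events if attr in e]
            row[attr] = values[-1] if values else None
        rows[case_id] = row
    return rows


features = sketch_features(
    events, str_ev_attr=["concept:name"], num_ev_attr=["amount"]
)
for case_id, row in features.items():
    print(case_id, row)
```

Each case ends up with the same set of columns, which is what makes the table usable as input to a machine learning model: a case that never contains a given activity gets a 0 in that column instead of a missing entry.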