pm4py.ml.extract_features_dataframe

pm4py.ml.extract_features_dataframe(log: EventLog | DataFrame, str_tr_attr=None, num_tr_attr=None, str_ev_attr=None, num_ev_attr=None, str_evsucc_attr=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key=None, resource_key='org:resource', include_case_id: bool = False, **kwargs) → DataFrame

Extracts a Pandas dataframe containing the features of each case of the provided log object, with one row per case.
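Conceptually, the function turns an event-level log (many rows per case) into a table with one row per case. The stdlib-only sketch below illustrates just that reshaping step under stated assumptions. It is not pm4py's implementation. The helper name `group_events_by_case` and the representation of events as plain dicts are hypothetical. The key names follow the XES-style names used in the example at the end of this page.

```python
from collections import defaultdict
from datetime import datetime


def group_events_by_case(events, case_id_key="case:concept:name", timestamp_key="time:timestamp"):
    """Hypothetical helper: group event dicts into {case_id: [events sorted by timestamp]}."""
    cases = defaultdict(list)
    for event in events:
        cases[event[case_id_key]].append(event)
    # Order each case chronologically so per-case features (e.g. a "last value") are well defined.
    return {case_id: sorted(evs, key=lambda e: e[timestamp_key]) for case_id, evs in cases.items()}


# Illustrative events; the activity names and timestamps are made up.
events = [
    {"case:concept:name": "c1", "concept:name": "register", "time:timestamp": datetime(2024, 1, 1, 10)},
    {"case:concept:name": "c2", "concept:name": "register", "time:timestamp": datetime(2024, 1, 1, 9)},
    {"case:concept:name": "c1", "concept:name": "check", "time:timestamp": datetime(2024, 1, 1, 8)},
]
cases = group_events_by_case(events)
for case_id, evs in cases.items():
    print(case_id, [e["concept:name"] for e in evs])
```

Each key of the resulting dict corresponds to one row of the features table. The feature columns are then computed from each case's ordered events.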

Parameters:
  • log – log object (event log / Pandas dataframe)

  • str_tr_attr – (optional) list of string attributes at the case level to extract as features

  • num_tr_attr – (optional) list of numeric attributes at the case level to extract as features

  • str_ev_attr – (optional) list of string attributes at the event level to extract as features (one-hot encoding)

  • num_ev_attr – (optional) list of numeric attributes at the event level to extract as features (the last value of each attribute in a case)

  • activity_key (str) – the attribute to be used as activity

  • timestamp_key (str) – the attribute to be used as timestamp

  • case_id_key – (optional) the attribute to be used as case identifier; if omitted, the default case identifier attribute is used

  • resource_key (str) – the attribute to be used as resource

  • include_case_id (bool) – if True, includes the case identifier column in the features table (default: False)
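The encodings listed for the event-level attributes can be sketched in plain Python. This is an illustrative approximation, not pm4py's code. The following are all assumptions of this sketch:

- the helper name `extract_case_features`
- the `attr@value` column naming
- reading "one-hot encoding" as a 0/1 presence indicator per distinct value
- filling 0.0 when a numeric attribute never occurs in a case

The sketch takes cases that are already grouped by case identifier and ordered by timestamp. It covers only `str_ev_attr` and `num_ev_attr`.

```python
def extract_case_features(cases, str_ev_attr=(), num_ev_attr=()):
    """Hypothetical helper: one feature dict per case from {case_id: [ordered event dicts]}."""
    # Collect each string attribute's distinct values over the whole log,
    # so that every case row ends up with the same set of columns.
    vocab = {}
    for attr in str_ev_attr:
        values = set()
        for evs in cases.values():
            values.update(e[attr] for e in evs if attr in e)
        vocab[attr] = sorted(values)

    rows = {}
    for case_id, evs in cases.items():
        row = {}
        for attr in str_ev_attr:
            seen = {e[attr] for e in evs if attr in e}
            for value in vocab[attr]:
                # Presence indicator: 1 if the value occurs at least once in the case, else 0.
                row[f"{attr}@{value}"] = 1 if value in seen else 0
        for attr in num_ev_attr:
            values = [e[attr] for e in evs if attr in e]
            # Last observed value in the case; 0.0 if never observed (assumption of this sketch).
            row[attr] = values[-1] if values else 0.0
        rows[case_id] = row
    return rows


# Illustrative cases; "amount" is a made-up numeric event attribute.
cases = {
    "c1": [{"concept:name": "register", "amount": 100.0}, {"concept:name": "check", "amount": 250.0}],
    "c2": [{"concept:name": "register", "amount": 80.0}],
}
features = extract_case_features(cases, str_ev_attr=["concept:name"], num_ev_attr=["amount"])
for case_id, row in features.items():
    print(case_id, row)
```

Building the value vocabulary over the whole log before encoding any case keeps the columns identical across rows. A tabular feature matrix needs that property before it can be passed to a classifier.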

Return type:

pd.DataFrame

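import pm4py

# Load an event log as a pandas DataFrame (the file path is a placeholder)
dataframe = pm4py.read_xes('running-example.xes')
features_df = pm4py.extract_features_dataframe(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')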