pm4py.ml.extract_features_dataframe
- pm4py.ml.extract_features_dataframe(log: EventLog | DataFrame, str_tr_attr=None, num_tr_attr=None, str_ev_attr=None, num_ev_attr=None, str_evsucc_attr=None, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key=None, resource_key='org:resource', include_case_id: bool = False, **kwargs) -> DataFrame
Extracts a dataframe containing the features of each case of the provided log object.
- Parameters:
log – log object (event log / Pandas dataframe)
str_tr_attr – (if provided) string attributes at the case level which should be extracted as features
num_tr_attr – (if provided) numeric attributes at the case level which should be extracted as features
str_ev_attr – (if provided) string attributes at the event level which should be extracted as features (one-hot encoding)
num_ev_attr – (if provided) numeric attributes at the event level which should be extracted as features (last value per attribute in a case)
activity_key (str) – the attribute to be used as activity
timestamp_key (str) – the attribute to be used as timestamp
case_id_key – (if provided, otherwise default) the attribute to be used as case identifier
resource_key (str) – the attribute to be used as resource
include_case_id (bool) – includes the case identifier column in the features table
- Return type:
pd.DataFrame
import pm4py

features_df = pm4py.extract_features_dataframe(dataframe, activity_key='concept:name', case_id_key='case:concept:name', timestamp_key='time:timestamp')
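The parameter descriptions mention two encodings for event-level attributes: one-hot encoding for string attributes and "last value per attribute in a case" for numeric ones. The following is a minimal, standard-library sketch of what those two encodings could look like on a toy log. It is not pm4py's implementation: the `sketch_features` helper, the `case` and `amount` keys, and the `attribute=value` column names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy event log: one dict per event (not pm4py's data model).
events = [
    {"case": "c1", "concept:name": "register", "org:resource": "ann", "amount": 10.0},
    {"case": "c1", "concept:name": "approve", "org:resource": "bob", "amount": 12.5},
    {"case": "c2", "concept:name": "register", "org:resource": "ann", "amount": 7.0},
]


def sketch_features(events, str_ev_attr, num_ev_attr, case_key="case"):
    """Build one feature row per case from a list of event dicts (illustrative only)."""
    # Group events by case, preserving their order of appearance.
    by_case = defaultdict(list)
    for ev in events:
        by_case[ev[case_key]].append(ev)

    # Every (attribute, value) pair seen in the log becomes one one-hot column.
    columns = sorted({(a, ev[a]) for ev in events for a in str_ev_attr if a in ev})

    rows = {}
    for case_id, case_events in by_case.items():
        row = {}
        # One-hot encoding: 1 if the value occurs anywhere in the case, else 0.
        for attr, value in columns:
            row[f"{attr}={value}"] = int(any(ev.get(attr) == value for ev in case_events))
        # Numeric attributes: keep the last value observed in the case.
        for attr in num_ev_attr:
            values = [ev[attr] for ev in case_events if attr in ev]
            row[attr] = values[-1] if values else None
        rows[case_id] = row
    return rows


features = sketch_features(events, ["concept:name", "org:resource"], ["amount"])
for case_id, row in features.items():
    print(case_id, row)
```

Each case ends up with the same set of columns, which is what makes the result usable as a feature table. The column names that pm4py actually generates may differ from the ones used in this sketch.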