Feature selection operations allow representing the event log in a tabular format. This is crucial for tasks such as prediction and anomaly detection.
In PM4Py, we offer methods to perform automatic feature selection. As an example, let's import the receipt log and apply automatic feature selection. First, we import the receipt log:
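A minimal sketch (the path to the receipt log is illustrative; adjust it to your local copy):

```python
import pm4py

# read the receipt event log; in recent PM4Py versions, read_xes returns a
# Pandas dataframe, so we explicitly convert it to an EventLog object
log = pm4py.read_xes("receipt.xes")
log = pm4py.convert_to_event_log(log)
```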
Then, let's perform automatic feature selection:
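For example, using the log_to_features algorithm without specifying any attribute:

```python
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

# automatic selection of the (string/numeric) trace and event attributes to encode
data, feature_names = log_to_features.apply(log)
```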
Printing the value of feature_names, we observe that the following attributes were selected:

- The channel attribute at the trace level (with values: Desk, Intern, Internet, Post, e-mail).
- The department attribute at the trace level (with values: Customer contact, Experts, General).
- The group attribute at the event level (with values: EMPTY, Group 1, Group 12, Group 13, Group 14, Group 15, Group 2, Group 3, Group 4, Group 7).

No numeric attribute is selected. The printed feature_names are represented as:
```
[trace:channel@Desk, trace:channel@Intern, trace:channel@Internet, trace:channel@Post,
 trace:channel@e-mail, trace:department@Customer contact, trace:department@Experts,
 trace:department@General, event:org:group@EMPTY, event:org:group@Group 1,
 event:org:group@Group 12, event:org:group@Group 13, event:org:group@Group 14,
 event:org:group@Group 15, event:org:group@Group 2, event:org:group@Group 3,
 event:org:group@Group 4, event:org:group@Group 7]
```
As shown, different features correspond to different attribute values. This technique is called one-hot encoding: a case is assigned a value of 0 if it does not contain an event with the given attribute value, and 1 if it contains at least one such event.
Representing the features as a dataframe:
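Continuing from the snippets above, a minimal sketch:

```python
import pandas as pd

# one row per case, one column per extracted feature
df = pd.DataFrame(data, columns=feature_names)
print(df)
```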
We can observe the features assigned to each individual case.
Manual feature selection allows users to specify which attributes should be included. These may include, for example:

- The activities that are performed (attribute concept:name).
- The resources that perform the activities (attribute org:resource).

To perform manual feature selection, we use the method log_to_features.apply. The following types of features can be considered:
| Parameter | Description |
|---|---|
| str_ev_attr | String attributes at the event level, one-hot encoded to assume values of 0 or 1. |
| str_tr_attr | String attributes at the trace level, one-hot encoded to assume values of 0 or 1. |
| num_ev_attr | Numeric attributes at the event level, encoded by taking the last value observed among the trace's events. |
| num_tr_attr | Numeric attributes at the trace level, encoded by including their numeric value. |
| str_evsucc_attr | Successions of string attribute values at the event level: for instance, given a trace [A, B, C], features will include not only A, B, and C individually, but also directly-follows pairs (A, B) and (B, C). |
For example, consider a feature selection where we are interested in the activities, the resources, and the successions of activities within each case (see the sketch below). In this case, the number of features becomes significantly larger.
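A sketch of such a selection (the chosen attributes, concept:name and org:resource, are illustrative):

```python
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

# activities and resources as one-hot encoded features, plus the
# directly-follows successions of activities (str_evsucc_attr)
data, feature_names = log_to_features.apply(
    log,
    parameters={
        "str_ev_attr": ["concept:name", "org:resource"],
        "str_tr_attr": [],
        "num_ev_attr": [],
        "num_tr_attr": [],
        "str_evsucc_attr": ["concept:name"]
    }
)
print(len(feature_names))
```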
Other important features include the cycle time and the lead time associated with a case. In this context, we may assume one of the following: the log is a lifecycle log, where each event is instantaneous and start/complete transitions mark the boundaries of the activities; or the log is an interval log, where each event carries both a start and an end timestamp.
Lead and cycle times can be calculated directly from interval logs. If we have a lifecycle log, we first need to convert it using:
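A minimal sketch:

```python
from pm4py.objects.log.util import interval_lifecycle

# convert a lifecycle log (start/complete events) into an interval log
log = interval_lifecycle.to_interval(log)
```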
After conversion, features such as lead and cycle times can be added using the following instructions:
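A sketch, assuming the attribute names start_timestamp and time:timestamp for the start and completion times:

```python
from pm4py.objects.log.util import interval_lifecycle
from pm4py.util import constants

# enrich every event with the (approximate) partial lead/cycle time features
log = interval_lifecycle.assign_lead_cycle_time(
    log,
    parameters={
        constants.PARAMETER_CONSTANT_START_TIMESTAMP_KEY: "start_timestamp",
        constants.PARAMETER_CONSTANT_TIMESTAMP_KEY: "time:timestamp"
    }
)
```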
Once the start timestamp attribute (e.g., start_timestamp
) and the timestamp attribute (e.g., time:timestamp
) are provided, the following features are returned:
- @@approx_bh_partial_cycle_time: incremental cycle time associated with the event (the cycle time of the last event is the total cycle time of the instance).
- @@approx_bh_partial_lead_time: incremental lead time associated with the event.
- @@approx_bh_overall_wasted_time: difference between the partial lead time and the partial cycle time.
- @@approx_bh_this_wasted_time: wasted time specifically related to the activity described by the 'interval' event.
- @@approx_bh_ratio_cycle_lead_time: measures the incremental flow rate (ranging from 0 to 1).

Since these are all numerical attributes, we can further refine the feature extraction by applying:
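For instance, by selecting the enriched attributes as numeric event-level features:

```python
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

# use the enriched (numeric) attributes as event-level features
data, feature_names = log_to_features.apply(
    log,
    parameters={
        "str_ev_attr": [],
        "str_tr_attr": [],
        "num_tr_attr": [],
        "num_ev_attr": ["@@approx_bh_partial_cycle_time",
                        "@@approx_bh_partial_lead_time",
                        "@@approx_bh_overall_wasted_time",
                        "@@approx_bh_this_wasted_time",
                        "@@approx_bh_ratio_cycle_lead_time"]
    }
)
```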
Additionally, we offer the calculation of further intra- and inter-case features, which can be enabled by setting boolean parameters in the log_to_features.apply method, including:

- ENABLE_CASE_DURATION: adds the case duration as an additional feature.
- ENABLE_TIMES_FROM_FIRST_OCCURRENCE: adds times measured from the first occurrence of an activity within the case.
- ENABLE_TIMES_FROM_LAST_OCCURRENCE: adds times measured from the last occurrence of an activity within the case.
- ENABLE_DIRECT_PATHS_TIMES_LAST_OCC: adds the duration of the last occurrence of a directed (i, i+1) path as a feature.
- ENABLE_INDIRECT_PATHS_TIMES_LAST_OCC: adds the duration of the last occurrence of an indirect (i, j) path as a feature.
- ENABLE_WORK_IN_PROGRESS: adds the number of concurrent cases (work in progress) as a feature.
- ENABLE_RESOURCE_WORKLOAD: adds the workload of the resources as a feature.

Techniques such as clustering, prediction, and anomaly detection can suffer when the dataset has too many features. Therefore, dimensionality reduction techniques (like PCA) help manage the complexity of the data. Starting from a Pandas dataframe generated from the extracted features:
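A minimal sketch building the dataframe that the next steps operate on (the intra/inter-case flags listed above are passed, when needed, as boolean entries of the same parameters dictionary; they are disabled by default):

```python
import pandas as pd
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

data, feature_names = log_to_features.apply(log)
df = pd.DataFrame(data, columns=feature_names)
```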
It is possible to reduce the number of features using PCA. For example, we can create a PCA model with 5 components and apply it to the dataframe:
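A sketch using scikit-learn's PCA:

```python
from sklearn.decomposition import PCA
import pandas as pd

pca = PCA(n_components=5)
df_pca = pd.DataFrame(pca.fit_transform(df))
```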
In this way, more than 400 columns are reduced to 5 principal components that capture most of the data variance.
In this section, we focus on calculating an anomaly score for each case. This score is based on the extracted features and works best when combined with dimensionality reduction (such as PCA). We can apply a method called IsolationForest
to the dataframe, which adds a column of scores: cases with a score ≤ 0 are considered anomalous, while those with a score > 0 are not.
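A sketch using scikit-learn's IsolationForest on the PCA-reduced dataframe (the column name scores is illustrative):

```python
from sklearn.ensemble import IsolationForest

model = IsolationForest()
model.fit(df_pca)
# decision_function returns negative values for anomalous cases
df_pca["scores"] = model.decision_function(df_pca)
```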
To identify the most anomalous cases, we can sort the dataframe after inserting an index. The resulting output highlights the most anomalous cases:
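For instance (continuing from the previous snippet):

```python
# keep track of the position of each case, then rank by ascending score
df_pca["@@index"] = df_pca.index
print(df_pca.sort_values("scores").head())
```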
We might be interested in observing how features evolve over time to detect positions in the event log that show behavior different from the mainstream. PM4Py provides a method to graph feature evolution over time. Here is an example:
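A sketch, assuming the locally_linear_embedding utility shipped alongside the feature extraction (available in recent PM4Py 2.x releases; names may differ across versions):

```python
from pm4py.algo.transformation.log_to_features.util import locally_linear_embedding
from pm4py.visualization.graphs import visualizer as graphs_visualizer

# x: points in time, y: intensity of the deviation from the mainstream behavior
x, y = locally_linear_embedding.apply(log)

gviz = graphs_visualizer.apply(x, y, variant=graphs_visualizer.Variants.DATES)
graphs_visualizer.view(gviz)
```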
Some machine learning methods (e.g., LSTM-based deep learning) require features at the event level, instead of aggregating features at the case level. In these methods, each event is represented as a numerical row containing features related to that event. We can perform a default event-based feature extraction as follows:
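A minimal sketch, using the EVENT_BASED variant of the feature extraction:

```python
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

# one row per event instead of one row per case
data, feature_names = log_to_features.apply(log, variant=log_to_features.Variants.EVENT_BASED)
```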
Alternatively, it is possible to manually specify the features to be extracted. The parameters str_ev_attr
and num_ev_attr
correspond to those described in previous sections:
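For example (the chosen attributes are illustrative and may not exist in every log):

```python
data, feature_names = log_to_features.apply(
    log,
    variant=log_to_features.Variants.EVENT_BASED,
    parameters={"str_ev_attr": ["concept:name", "org:resource"],
                "num_ev_attr": []}
)
```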
Decision trees are tools that help understand the conditions leading to a particular outcome. In this section, several examples related to the construction of decision trees are provided. The ideas behind building decision trees are discussed in the scientific paper: de Leoni, Massimiliano, Wil MP van der Aalst, and Marcus Dees. "A General Process Mining Framework for Correlating, Predicting, and Clustering Dynamic Behavior Based on Event Logs."
The general procedure is as follows: a feature-based representation of the log is obtained, a target class is assigned to each case, and a decision tree is built on top of this representation and visualized.
A process instance may potentially finish with different activities, signaling different outcomes. A decision tree can help understand the reasons behind each outcome. First, a log is loaded, and then a feature-based representation of the log is created.
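A sketch (the file name and the selected attributes are illustrative):

```python
import pm4py
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

log = pm4py.read_xes("roadtraffic100traces.xes")
log = pm4py.convert_to_event_log(log)

data, feature_names = log_to_features.apply(
    log,
    parameters={"str_tr_attr": [], "str_ev_attr": ["concept:name"],
                "num_tr_attr": [], "num_ev_attr": ["amount"]}
)
```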
Alternatively, an automatic feature representation (automatic attribute selection) can be obtained:
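For instance:

```python
data, feature_names = log_to_features.apply(log)
```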
(Optional) The extracted features can be represented as a Pandas DataFrame:
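For instance:

```python
import pandas as pd

dataframe = pd.DataFrame(data, columns=feature_names)
```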
(Optional) The DataFrame can then be exported as a CSV file:
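For instance (the file name is illustrative):

```python
dataframe.to_csv("features.csv", index=False)
```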
Next, the target classes are defined: each endpoint activity of the process instance is assigned to a different class.
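One way to do this is the get_class_representation utility (assuming it is available in your PM4Py version), which derives the classes from the values of concept:name:

```python
from pm4py.objects.log.util import get_class_representation

# target: the class assigned to each case; classes: the corresponding class names
target, classes = get_class_representation.get_class_representation_by_str_ev_attr_value_value(log, "concept:name")
```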
The decision tree is then built and visualized:
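A sketch using scikit-learn's decision tree together with PM4Py's decision tree visualizer (the maximum depth is illustrative):

```python
from sklearn import tree
from pm4py.visualization.decisiontree import visualizer as dectree_visualizer

clf = tree.DecisionTreeClassifier(max_depth=7)
clf.fit(data, target)

gviz = dectree_visualizer.apply(clf, feature_names, classes)
dectree_visualizer.view(gviz)
```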
A decision tree regarding the duration of a case helps understand the factors behind a high case duration (i.e., durations above a given threshold). First, a log is loaded, and a feature-based representation is created.
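A sketch (the file name and the selected attributes are illustrative):

```python
import pm4py
from pm4py.algo.transformation.log_to_features import algorithm as log_to_features

log = pm4py.read_xes("receipt.xes")
log = pm4py.convert_to_event_log(log)

data, feature_names = log_to_features.apply(
    log,
    parameters={"str_tr_attr": ["channel", "department"], "str_ev_attr": ["org:group"],
                "num_tr_attr": [], "num_ev_attr": []}
)
```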
Alternatively, an automatic feature representation can be generated:
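For instance:

```python
data, feature_names = log_to_features.apply(log)
```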
Then, the target classes are formed:
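One way is the duration-based helper of get_class_representation (assuming it is available in your PM4Py version); the threshold below, 200 days expressed in seconds, is illustrative:

```python
from pm4py.objects.log.util import get_class_representation

# cases longer than the threshold end up in one class, the remaining cases in another
target, classes = get_class_representation.get_class_representation_by_trace_duration(log, 2 * 8640000)
```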
The decision tree is then built and visualized:
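Same procedure as in the previous example:

```python
from sklearn import tree
from pm4py.visualization.decisiontree import visualizer as dectree_visualizer

clf = tree.DecisionTreeClassifier(max_depth=7)
clf.fit(data, target)
gviz = dectree_visualizer.apply(clf, feature_names, classes)
dectree_visualizer.view(gviz)
```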
Decision mining enables the following: given an event log, a process model (an accepting Petri net), and a decision point of the model, it retrieves the features of the cases that take the different paths. This allows, for example, building a decision tree to explain the choices made.
First, import an XES log:
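For instance (the file name is illustrative):

```python
import pm4py

log = pm4py.read_xes("running-example.xes")
log = pm4py.convert_to_event_log(log)
```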
Next, calculate a model using the Inductive Miner:
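For instance:

```python
net, im, fm = pm4py.discover_petri_net_inductive(log)
```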
To visualize the model:
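For instance:

```python
pm4py.view_petri_net(net, im, fm)
```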
For this example, we select decision point p_10
, where a choice is made between the activities examine casually and examine thoroughly. Once we have a log, a model, and a decision point, the decision mining algorithm can be executed:
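A sketch (the name of the decision point depends on the discovered model):

```python
from pm4py.algo.decision_mining import algorithm as decision_mining

X, y, class_names = decision_mining.apply(log, net, im, fm, decision_point="p_10")
```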
The outputs of the apply
method are:
- X: a Pandas DataFrame containing the features associated with each case leading to the decision.
- y: a Pandas Series containing the class (output) of each decision (e.g., 0 or 1).
- class_names: the names of the possible decision outcomes (e.g., examine casually and examine thoroughly).

These outputs can be used with any classification or comparison technique. In particular, decision trees are a useful choice. We provide a function to automatically discover decision trees from decision mining results:
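A sketch, assuming the get_decision_tree helper of the decision mining module (available in recent PM4Py 2.x releases):

```python
clf, feature_names, classes = decision_mining.get_decision_tree(log, net, im, fm, decision_point="p_10")
```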
To visualize the resulting decision tree:
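For instance:

```python
from pm4py.visualization.decisiontree import visualizer as dectree_visualizer

gviz = dectree_visualizer.apply(clf, feature_names, classes)
dectree_visualizer.view(gviz)
```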
While the feature extraction described above is generic, it might not be optimal (performance-wise) when working directly with Pandas DataFrames. We also offer the option to extract a feature table by providing a dataframe and the list of attributes (columns) to encode. The output is another DataFrame with one row per case, containing the case identifier, one column per value of each selected string attribute (one-hot encoded), and one column per selected numeric attribute.
Here is an example that keeps concept:name
(activity) and amount
(cost) as features:
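A sketch, assuming the simplified-interface helper pm4py.extract_features_dataframe (available in recent PM4Py 2.x releases) and an illustrative log that contains the amount attribute; the exact naming of the resulting columns may vary across versions:

```python
import pm4py

# read the log directly as a Pandas dataframe (the file name is illustrative)
dataframe = pm4py.read_xes("roadtraffic100traces.xes")

feature_table = pm4py.extract_features_dataframe(
    dataframe,
    str_ev_attr=["concept:name"],
    num_ev_attr=["amount"]
)
print(feature_table.columns)
```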
The resulting feature table will contain columns such as:
['case:concept:name', 'concept:name_CreateFine', 'concept:name_SendFine', 'concept:name_InsertFineNotification', 'concept:name_Addpenalty', 'concept:name_SendforCreditCollection', 'concept:name_Payment', 'concept:name_InsertDateAppealtoPrefecture', 'concept:name_SendAppealtoPrefecture', 'concept:name_ReceiveResultAppealfromPrefecture', 'concept:name_NotifyResultAppealtoOffender', 'amount']
Given a Petri net discovered by a classical process mining algorithm (e.g., Alpha Miner or Inductive Miner), we can enhance it into a Data Petri Net by applying decision mining at every decision point, and transforming the resulting decision trees into guards (boolean conditions).
An example:
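A sketch, assuming the create_data_petri_nets_with_decisions function of the decision mining module (available in recent PM4Py 2.x releases); the file name is illustrative:

```python
import pm4py
from pm4py.algo.decision_mining import algorithm as decision_mining

log = pm4py.read_xes("running-example.xes")
log = pm4py.convert_to_event_log(log)

net, im, fm = pm4py.discover_petri_net_inductive(log)

# apply decision mining at every decision point and turn the trees into guards
net, im, fm = decision_mining.create_data_petri_nets_with_decisions(log, net, im, fm)
```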
The guards discovered for each transition can be printed. They are expressed as boolean conditions and interpreted by the execution engine:
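A sketch, assuming the guards are stored in the properties dictionary of the transitions under the "guard" key:

```python
for t in net.transitions:
    if "guard" in t.properties:
        print(t)
        print(t.properties["guard"])
```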