pm4py.algo.concept_drift.variants package#
PM4Py – A Process Mining Library for Python
Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.
Website: https://processintelligence.solutions Contact: info@processintelligence.solutions
Submodules#
pm4py.algo.concept_drift.variants.bose module#
- class pm4py.algo.concept_drift.variants.bose.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: Enum
- SUB_LOG_SIZE = 'sub_log_size'#
- WINDOW_SIZE = 'window_size'#
- NUM_PERMUTATIONS = 'num_permutations'#
- THRESH_P_VALUE = 'thresh_p_value'#
- MAX_NO_CHANGE_POINTS = 'max_no_change_points'#
- ACTIVITY_KEY = 'pm4py:param:activity_key'#
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
- CASE_ID_KEY = 'pm4py:param:case_id_key'#
- pm4py.algo.concept_drift.variants.bose.apply(log: EventLog | DataFrame, parameters: Dict[Any, Any] | None = None) → Tuple[List[DataFrame], List[int], List[float]][source]#
Apply concept drift detection to an event log, based on the approach described in:
Bose, RP Jagadeesh Chandra, et al. “Handling concept drift in process mining.” Advanced Information Systems Engineering: 23rd International Conference, CAiSE 2011, London, UK, June 20-24, 2011. Proceedings 23. Springer Berlin Heidelberg, 2011.
This method detects sudden changes (concept drifts) in a process by analyzing an event log over time. It splits the log into sub-logs, extracts global features (e.g., Relation Type Count), and applies statistical tests (permutation tests) over sliding windows to identify change points where the process behavior significantly differs.
Parameters#
- log : Union[EventLog, pd.DataFrame]
The input event log, which can be either a PM4Py EventLog object or a Pandas DataFrame. The log contains traces, where each trace is a sequence of events representing a process instance.
- parameters : Optional[Dict[Any, Any]], default=None
Configuration parameters for the algorithm. If None, default values are used. Possible keys include:
- Parameters.SUB_LOG_SIZE : int, default=50
Number of traces per sub-log.
- Parameters.WINDOW_SIZE : int, default=8
Number of sub-logs in each window for statistical comparison.
- Parameters.NUM_PERMUTATIONS : int, default=100
Number of permutations for the permutation test.
- Parameters.THRESH_P_VALUE : float, default=0.5
Threshold for p-values to consider a change point significant (lower values indicate stronger evidence of drift).
- Parameters.MAX_NO_CHANGE_POINTS : int, default=5
Maximum number of change points to detect.
- Parameters.ACTIVITY_KEY : str, default='concept:name'
Key to identify the activity attribute in the event log.
- Parameters.TIMESTAMP_KEY : str, default='time:timestamp'
Key to identify the timestamp attribute in the event log.
- Parameters.CASE_ID_KEY : str, default='case:concept:name'
Key to identify the case ID attribute in the event log.
Returns#
- returned_sublogs : List[EventLog]
A list of sub-logs, where each sub-log is an EventLog object representing the cumulative segment of the original event log from the start up to each detected change point (and the final sub-log up to the end). Note: Due to a potential implementation issue, these sub-logs are not segments between change points but rather cumulative logs up to each change point.
- change_timestamps : List[float]
A list of timestamps where concept drifts are detected. Each timestamp corresponds to the start time of the first trace in the sub-log where a change point occurs, based on case start timestamps.
- p_values : List[float]
A list of p-values associated with each detected change point, indicating the statistical significance of the drift (lower values suggest stronger evidence of a change).
Notes#
The method uses a permutation test to compare feature vectors (e.g., Relation Type Count) extracted from sub-logs within sliding windows. Change points are identified where the p-value falls below the threshold.
- pm4py.algo.concept_drift.variants.bose.extract_unique_activities(event_log)[source]#
Extract unique activities from the event log.
- pm4py.algo.concept_drift.variants.bose.split_into_sub_logs(event_log, sub_log_size=50, keep_leftover=True)[source]#
Split the event log (list of traces) into sub-logs of size sub_log_size. Optionally keep leftover traces as a final smaller sub-log.
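The chunking described above can be sketched as follows. This is a minimal illustration, not the library's code: the name `split_into_sub_logs_sketch` is hypothetical, and the log is assumed to be a plain list of traces.

```python
from typing import List, Sequence


def split_into_sub_logs_sketch(
    traces: Sequence, sub_log_size: int = 50, keep_leftover: bool = True
) -> List[list]:
    """Cut a list of traces into consecutive chunks of sub_log_size traces."""
    if sub_log_size <= 0:
        raise ValueError("sub_log_size must be positive")
    sub_logs = [
        list(traces[i:i + sub_log_size])
        for i in range(0, len(traces), sub_log_size)
    ]
    # The last chunk may be shorter than sub_log_size; drop it on request.
    if sub_logs and len(sub_logs[-1]) < sub_log_size and not keep_leftover:
        sub_logs.pop()
    return sub_logs


traces = [["a", "b"], ["a", "c"], ["a", "b"], ["b", "c"], ["a"]]
chunks_keep = split_into_sub_logs_sketch(traces, sub_log_size=2)
chunks_drop = split_into_sub_logs_sketch(traces, sub_log_size=2, keep_leftover=False)
print([len(c) for c in chunks_keep])  # [2, 2, 1]
print([len(c) for c in chunks_drop])  # [2, 2]
```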
- pm4py.algo.concept_drift.variants.bose.compute_follows_relation(trace, Sigma)[source]#
Compute the ‘eventually follows’ relation for a single trace.
GLOBAL FOLLOWS (original):
- For each activity 'a' encountered so far, mark that 'a' is followed by the current activity.
DIRECT FOLLOWS (commented out):
- For each consecutive pair (a, b), add (a, b).
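To make the difference between the two relations concrete, here is a small sketch of both. The function names are hypothetical, a trace is assumed to be a list of activity labels, and the `Sigma` argument of the documented function is left out because its exact role is not described here.

```python
from typing import List, Set, Tuple


def eventually_follows_sketch(trace: List[str]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) such that b occurs somewhere after an occurrence of a."""
    seen: Set[str] = set()
    relation: Set[Tuple[str, str]] = set()
    for current in trace:
        # Every activity encountered so far is followed by the current one.
        for earlier in seen:
            relation.add((earlier, current))
        seen.add(current)
    return relation


def directly_follows_sketch(trace: List[str]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) such that b occurs immediately after a."""
    return set(zip(trace, trace[1:]))


trace = ["a", "b", "c"]
eventual = eventually_follows_sketch(trace)
direct = directly_follows_sketch(trace)
print(sorted(eventual))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(direct))    # [('a', 'b'), ('b', 'c')]
```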
- pm4py.algo.concept_drift.variants.bose.extract_global_features(sub_log, Sigma)[source]#
Extract the Relation Type Count (RC) feature vector for a sub-log. For each activity b in Sigma, we compute:
ca = # of activities a where b ALWAYS follows a (in all traces that contain ‘a’)
cs = # of activities a where b SOMETIMES follows a
cn = # of activities a where b NEVER follows a
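The three counts can be illustrated with a short sketch. It rests on assumptions that the description above does not settle: "follows" means eventually follows, "always" requires the pair to hold in every trace that contains `a`, and an activity `a` that appears in no trace of the sub-log is counted under "never". All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def follows_pairs(trace: List[str]) -> Set[Tuple[str, str]]:
    """Eventually-follows pairs of one trace."""
    seen: Set[str] = set()
    pairs: Set[Tuple[str, str]] = set()
    for current in trace:
        pairs.update((earlier, current) for earlier in seen)
        seen.add(current)
    return pairs


def relation_type_count_sketch(
    sub_log: List[List[str]], sigma: List[str]
) -> Dict[str, Tuple[int, int, int]]:
    """Map each activity b to (ca, cs, cn): always / sometimes / never counts."""
    per_trace = [(set(t), follows_pairs(t)) for t in sub_log]
    features: Dict[str, Tuple[int, int, int]] = {}
    for b in sigma:
        always = sometimes = never = 0
        for a in sigma:
            # Only traces that contain a are relevant for the pair (a, b).
            containing = [pairs for acts, pairs in per_trace if a in acts]
            hits = sum((a, b) in pairs for pairs in containing)
            if containing and hits == len(containing):
                always += 1
            elif hits > 0:
                sometimes += 1
            else:
                never += 1  # assumption: also covers a absent from the sub-log
        features[b] = (always, sometimes, never)
    return features


sub_log = [["a", "b", "c"], ["a", "c"]]
sigma = ["a", "b", "c"]
rc = relation_type_count_sketch(sub_log, sigma)
print(rc)  # {'a': (0, 0, 3), 'b': (0, 1, 2), 'c': (2, 0, 1)}
```

Concatenating the triples in a fixed activity order would give one feature vector per sub-log, which is the kind of input the permutation test compares.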
- pm4py.algo.concept_drift.variants.bose.permutation_test(P1, P2, num_permutations=100)[source]#
Perform a permutation test to compare the Euclidean distance between means of two sets of feature vectors P1 and P2.
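A permutation test of this kind can be sketched with the standard library alone. The names, the `seed` argument, and the p-value formula (share of permutations whose distance is at least the observed one) are assumptions; the library function may compute its p-value differently.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty set of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_distance(p1: Sequence[Vector], p2: Sequence[Vector]) -> float:
    """Euclidean distance between the mean vectors of two populations."""
    return math.dist(mean_vector(p1), mean_vector(p2))


def permutation_test_sketch(
    p1: Sequence[Vector],
    p2: Sequence[Vector],
    num_permutations: int = 100,
    seed: Optional[int] = None,
) -> float:
    rng = random.Random(seed)
    observed = mean_distance(p1, p2)
    pooled = list(p1) + list(p2)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        # Reassign vectors to the two groups at random, keeping group sizes.
        rng.shuffle(pooled)
        if mean_distance(pooled[:len(p1)], pooled[len(p1):]) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_permutations


same = [[1.0, 2.0]] * 4
low = [[0.0, 0.0]] * 4
high = [[10.0, 10.0]] * 4
p_same = permutation_test_sketch(same, same, num_permutations=200, seed=0)
p_diff = permutation_test_sketch(low, high, num_permutations=200, seed=0)
print(p_same)  # 1.0, identical populations never look unusual
print(p_diff)  # small, well-separated populations rarely arise by chance
```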
- pm4py.algo.concept_drift.variants.bose.detect_concept_drift(event_log, sub_log_size=50, window_size=8, num_permutations=100, thresh_p_value=0.5, max_no_change_points=5)[source]#
Detect concept drift in an event log using a permutation test over consecutive windows of sub-logs.
Parameters:
- event_log: List of lists, where each inner list is a trace of activities.
- sub_log_size: Number of traces per sub-log (default: 50).
- window_size: Number of sub-logs in each window for comparison (default: 8).
- num_permutations: Number of permutations for the statistical test (default: 100).
- thresh_p_value: Threshold for the p-value (default: 0.5).
- max_no_change_points: Maximum number of change points detected (default: 5).
Returns:
- sub_logs: List of sub-logs, each with (up to) sub_log_size traces.
- change_points: List of sub-log indices and p-values where drift is detected.
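One plausible reading of "consecutive windows" is sketched below: at each candidate position, the `window_size` feature vectors before it are compared with the `window_size` vectors after it. The function name, the injected `p_value_fn` callable, the toy p-value used in the demo, and the rule of keeping the lowest p-values when there are too many candidates are all assumptions made for illustration, not the library's behavior.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def detect_change_points_sketch(
    features: Sequence[Vector],
    p_value_fn: Callable[[Sequence[Vector], Sequence[Vector]], float],
    window_size: int = 8,
    thresh_p_value: float = 0.5,
    max_no_change_points: int = 5,
) -> List[Tuple[int, float]]:
    """Return (sub-log index, p-value) pairs where two adjacent windows differ."""
    candidates: List[Tuple[int, float]] = []
    for i in range(window_size, len(features) - window_size + 1):
        left = features[i - window_size:i]
        right = features[i:i + window_size]
        p = p_value_fn(left, right)
        if p < thresh_p_value:
            candidates.append((i, p))
    # Assumption: keep the strongest candidates, then report them in log order.
    candidates.sort(key=lambda c: c[1])
    return sorted(candidates[:max_no_change_points])


def toy_p_value(left: Sequence[Vector], right: Sequence[Vector]) -> float:
    """Hypothetical stand-in for a real test: shrinks as window means diverge."""
    mean_left = sum(v[0] for v in left) / len(left)
    mean_right = sum(v[0] for v in right) / len(right)
    return 1.0 / (1.0 + abs(mean_left - mean_right))


# Four sub-logs with feature 0.0, then four with feature 1.0.
features = [[0.0]] * 4 + [[1.0]] * 4
change_points = detect_change_points_sketch(
    features, toy_p_value, window_size=2, thresh_p_value=0.6
)
print(change_points)  # [(4, 0.5)]
```

Passing the test as a callable keeps the windowing logic separate from the statistics, so a permutation test over real feature vectors can be swapped in for `toy_p_value`.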