pm4py.algo.concept_drift.variants package#

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

Submodules#

pm4py.algo.concept_drift.variants.bose module#

class pm4py.algo.concept_drift.variants.bose.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

SUB_LOG_SIZE = 'sub_log_size'#
WINDOW_SIZE = 'window_size'#
NUM_PERMUTATIONS = 'num_permutations'#
THRESH_P_VALUE = 'thresh_p_value'#
MAX_NO_CHANGE_POINTS = 'max_no_change_points'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
CASE_ID_KEY = 'pm4py:param:case_id_key'#
pm4py.algo.concept_drift.variants.bose.apply(log: EventLog | DataFrame, parameters: Dict[Any, Any] | None = None) → Tuple[List[DataFrame], List[int], List[float]][source]#

Apply concept drift detection to an event log, based on the approach described in:

Bose, RP Jagadeesh Chandra, et al. “Handling concept drift in process mining.” Advanced Information Systems Engineering: 23rd International Conference, CAiSE 2011, London, UK, June 20-24, 2011. Proceedings 23. Springer Berlin Heidelberg, 2011.

This method detects sudden changes (concept drifts) in a process by analyzing an event log over time. It splits the log into sub-logs, extracts global features (e.g., Relation Type Count), and applies statistical tests (permutation tests) over sliding windows to identify change points where the process behavior significantly differs.

Parameters#

log : Union[EventLog, pd.DataFrame]

The input event log, which can be either a PM4Py EventLog object or a Pandas DataFrame. The log contains traces, where each trace is a sequence of events representing a process instance.

parameters : Optional[Dict[Any, Any]], default=None

Configuration parameters for the algorithm. If None, default values are used. Possible keys include:

  • Parameters.SUB_LOG_SIZE : int, default=50

    Number of traces per sub-log.

  • Parameters.WINDOW_SIZE : int, default=8

    Number of sub-logs in each window for statistical comparison.

  • Parameters.NUM_PERMUTATIONS : int, default=100

    Number of permutations for the permutation test.

  • Parameters.THRESH_P_VALUE : float, default=0.5

    Threshold for p-values to consider a change point significant (lower values indicate stronger evidence of drift).

  • Parameters.MAX_NO_CHANGE_POINTS : int, default=5

    Maximum number of change points to detect.

  • Parameters.ACTIVITY_KEY : str, default='concept:name'

    Key to identify the activity attribute in the event log.

  • Parameters.TIMESTAMP_KEY : str, default='time:timestamp'

    Key to identify the timestamp attribute in the event log.

  • Parameters.CASE_ID_KEY : str, default='case:concept:name'

    Key to identify the case ID attribute in the event log.

Returns#

returned_sublogs : List[EventLog]

A list of sub-logs, where each sub-log is an EventLog object containing the cumulative segment of the original event log from its start up to a detected change point; the final sub-log runs to the end of the log. Note: due to a potential implementation issue, these sub-logs are cumulative rather than the segments between consecutive change points.

change_timestamps : List[float]

A list of timestamps where concept drifts are detected. Each timestamp corresponds to the start time of the first trace in the sub-log where a change point occurs, based on case start timestamps.

p_values : List[float]

A list of p-values associated with each detected change point, indicating the statistical significance of the drift (lower values suggest stronger evidence of a change).

Notes#

  • The method uses a permutation test to compare feature vectors (e.g., Relation Type Count) extracted from sub-logs within sliding windows. Change points are identified where the p-value falls below the threshold.
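A call to apply with non-default settings might look like the sketch below. It uses only the names documented above (bose.apply and the members of bose.Parameters); DOCUMENTED_DEFAULTS, merge_settings and detect_drift are hypothetical helpers introduced for illustration and are not part of PM4Py. The sketch assumes the parameters dictionary is keyed by the Parameters enum members.

```python
from typing import Any, Dict

# Documented defaults, keyed by the member names of bose.Parameters.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "SUB_LOG_SIZE": 50,
    "WINDOW_SIZE": 8,
    "NUM_PERMUTATIONS": 100,
    "THRESH_P_VALUE": 0.5,
    "MAX_NO_CHANGE_POINTS": 5,
}


def merge_settings(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: overlay keyword overrides on the documented defaults."""
    settings = dict(DOCUMENTED_DEFAULTS)
    for name, value in overrides.items():
        key = name.upper()
        if key not in settings:
            raise KeyError(f"unknown setting: {name}")
        settings[key] = value
    return settings


def detect_drift(log: Any, **overrides: Any):
    """Hypothetical wrapper around bose.apply; requires pm4py to be installed."""
    # Imported lazily so that merge_settings stays usable without pm4py.
    from pm4py.algo.concept_drift.variants import bose

    parameters = {
        bose.Parameters[name]: value
        for name, value in merge_settings(**overrides).items()
    }
    # Documented return value: (sub-logs, change timestamps, p-values).
    return bose.apply(log, parameters=parameters)
```

With an event log already loaded as log, a stricter run could then be requested as detect_drift(log, window_size=4, thresh_p_value=0.05).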

pm4py.algo.concept_drift.variants.bose.extract_unique_activities(event_log)[source]#

Extract unique activities from the event log.

pm4py.algo.concept_drift.variants.bose.split_into_sub_logs(event_log, sub_log_size=50, keep_leftover=True)[source]#

Split the event log (list of traces) into sub-logs of size sub_log_size. Optionally keep leftover traces as a final smaller sub-log.
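The splitting step can be illustrated with a minimal sketch. It is not the library's implementation; split_into_sub_logs_sketch is a hypothetical name, and the handling of a short final chunk is an assumption based on the keep_leftover description above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_sub_logs_sketch(
    event_log: Sequence[T], sub_log_size: int = 50, keep_leftover: bool = True
) -> List[List[T]]:
    """Chunk a list of traces into consecutive sub-logs of sub_log_size traces."""
    if sub_log_size <= 0:
        raise ValueError("sub_log_size must be positive")
    sub_logs = [
        list(event_log[start:start + sub_log_size])
        for start in range(0, len(event_log), sub_log_size)
    ]
    # Assumption: with keep_leftover=False a short final chunk is discarded.
    if sub_logs and len(sub_logs[-1]) < sub_log_size and not keep_leftover:
        sub_logs.pop()
    return sub_logs


traces = [["a", "b"] for _ in range(7)]
sizes = [len(s) for s in split_into_sub_logs_sketch(traces, sub_log_size=3)]
print(sizes)  # [3, 3, 1]
```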

pm4py.algo.concept_drift.variants.bose.compute_follows_relation(trace, Sigma)[source]#

Compute the ‘eventually follows’ relation for a single trace.

  • GLOBAL FOLLOWS (original): for each activity ‘a’ encountered so far, mark that ‘a’ is followed by the current activity.

  • DIRECT FOLLOWS (commented out): for each consecutive pair (a, b), add (a, b).
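The difference between the two relations can be sketched as follows. Both functions are hypothetical illustrations rather than the library's code; restricting the pairs to activities in sigma is an assumption about how the Sigma argument is used.

```python
from typing import Iterable, Sequence, Set, Tuple

Pair = Tuple[str, str]


def eventually_follows(trace: Sequence[str], sigma: Iterable[str]) -> Set[Pair]:
    """Pairs (a, b) where b occurs somewhere after a in the trace."""
    alphabet = set(sigma)
    seen: Set[str] = set()
    relation: Set[Pair] = set()
    for activity in trace:
        if activity not in alphabet:
            continue  # assumption: activities outside sigma are ignored
        for earlier in seen:
            relation.add((earlier, activity))
        seen.add(activity)
    return relation


def directly_follows(trace: Sequence[str], sigma: Iterable[str]) -> Set[Pair]:
    """Pairs (a, b) where b occurs immediately after a in the trace."""
    alphabet = set(sigma)
    return {
        (a, b)
        for a, b in zip(trace, trace[1:])
        if a in alphabet and b in alphabet
    }


trace = ["a", "b", "c"]
print(sorted(eventually_follows(trace, "abc")))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(directly_follows(trace, "abc")))    # [('a', 'b'), ('b', 'c')]
```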

pm4py.algo.concept_drift.variants.bose.extract_global_features(sub_log, Sigma)[source]#

Extract the Relation Type Count (RC) feature vector for a sub-log. For each activity b in Sigma, we compute:

  • ca = # of activities a where b ALWAYS follows a (in all traces that contain ‘a’)

  • cs = # of activities a where b SOMETIMES follows a

  • cn = # of activities a where b NEVER follows a
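A small sketch of the three counts, under stated assumptions: the feature vector is laid out as (ca, cs, cn) per activity in the order of sigma, and an activity a that occurs in no trace of the sub-log counts towards cn. The function names are hypothetical and the code is not taken from the library.

```python
from typing import List, Sequence, Set, Tuple


def follows_pairs(trace: Sequence[str]) -> Set[Tuple[str, str]]:
    """Eventually-follows pairs (a, b) of a single trace."""
    seen: Set[str] = set()
    pairs: Set[Tuple[str, str]] = set()
    for activity in trace:
        pairs.update((earlier, activity) for earlier in seen)
        seen.add(activity)
    return pairs


def relation_type_count(sub_log: Sequence[Sequence[str]], sigma: Sequence[str]) -> List[int]:
    """Concatenate (ca, cs, cn) for every activity b in sigma."""
    relations = [follows_pairs(trace) for trace in sub_log]
    features: List[int] = []
    for b in sigma:
        always = sometimes = never = 0
        for a in sigma:
            # Only traces that contain a are relevant for the pair (a, b).
            containing = [rel for trace, rel in zip(sub_log, relations) if a in trace]
            hits = sum((a, b) in rel for rel in containing)
            if hits == 0:
                never += 1  # also covers an activity a absent from the sub-log
            elif hits == len(containing):
                always += 1
            else:
                sometimes += 1
        features.extend([always, sometimes, never])
    return features


sub_log = [["a", "b", "c"], ["a", "c"]]
rc = relation_type_count(sub_log, ["a", "b", "c"])
print(rc)  # [0, 0, 3, 0, 1, 2, 2, 0, 1]
```

In this example c always follows both a and b, b follows a in only one of the two traces containing a, and nothing ever follows into a.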

pm4py.algo.concept_drift.variants.bose.permutation_test(P1, P2, num_permutations=100)[source]#

Perform a permutation test to compare the Euclidean distance between means of two sets of feature vectors P1 and P2.
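A generic permutation test on the distance between group means can be sketched with the standard library alone. This is an illustration of the technique, not the library's code; the seed argument and the "greater than or equal" convention for counting extreme permutations are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance_between_means(p1: Sequence[Vector], p2: Sequence[Vector]) -> float:
    """Euclidean distance between the mean vectors of two groups."""
    return math.dist(mean_vector(p1), mean_vector(p2))


def permutation_test_sketch(
    p1: Sequence[Vector],
    p2: Sequence[Vector],
    num_permutations: int = 100,
    seed: Optional[int] = None,
) -> float:
    """Share of random regroupings whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = distance_between_means(p1, p2)
    pooled = list(p1) + list(p2)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # reassign vectors to the two groups at random
        permuted = distance_between_means(pooled[:len(p1)], pooled[len(p1):])
        if permuted >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_permutations


same = permutation_test_sketch([[1.0, 0.0]] * 4, [[1.0, 0.0]] * 4, seed=0)
apart = permutation_test_sketch([[0.0, 0.0]] * 8, [[10.0, 10.0]] * 8, num_permutations=200, seed=0)
```

Identical groups give a p-value of 1.0 because every regrouping matches the observed distance of zero, while well-separated groups give a p-value close to zero.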

pm4py.algo.concept_drift.variants.bose.detect_concept_drift(event_log, sub_log_size=50, window_size=8, num_permutations=100, thresh_p_value=0.5, max_no_change_points=5)[source]#

Detect concept drift in an event log using a permutation test over consecutive windows of sub-logs.

Parameters:

  • event_log: List of lists, where each inner list is a trace of activities.

  • sub_log_size: Number of traces per sub-log (default: 50).

  • window_size: Number of sub-logs in each window for comparison (default: 8).

  • num_permutations: Number of permutations for the statistical test (default: 100).

  • thresh_p_value: Threshold for the p-value (default: 0.5).

  • max_no_change_points: Maximum number of change points to detect (default: 5).

Returns:

  • sub_logs: List of sub-logs, each with (up to) sub_log_size traces.

  • change_points: List of sub-log indices and p-values where drift is detected.
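The window scan can be sketched independently of the feature extraction and the statistical test. In the sketch below, scan_windows is a hypothetical function; the selection rule (keep the boundaries with the lowest p-values below the threshold, up to max_no_change_points) is an assumption and may differ from the library. mean_gap_score is a deterministic stand-in for a real p-value, used only to keep the example reproducible.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")


def scan_windows(
    features: Sequence[F],
    window_size: int,
    p_value: Callable[[Sequence[F], Sequence[F]], float],
    thresh_p_value: float,
    max_no_change_points: int,
) -> List[Tuple[int, float]]:
    """Compare the window before each sub-log boundary with the window after it."""
    candidates: List[Tuple[int, float]] = []
    for i in range(window_size, len(features) - window_size + 1):
        left = features[i - window_size:i]
        right = features[i:i + window_size]
        p = p_value(left, right)
        if p < thresh_p_value:
            candidates.append((i, p))
    # Assumption: keep the strongest candidates, then report them in log order.
    strongest = sorted(candidates, key=lambda item: item[1])[:max_no_change_points]
    return sorted(strongest)


def mean_gap_score(left: Sequence[float], right: Sequence[float]) -> float:
    """Stand-in for a p-value: shrinks towards 0 as the window means move apart."""
    gap = abs(sum(left) / len(left) - sum(right) / len(right))
    return 1.0 / (1.0 + gap)


# One scalar feature per sub-log; the behaviour changes after the sixth sub-log.
features = [0.0] * 6 + [10.0] * 6
change_points = scan_windows(features, 3, mean_gap_score, 0.1, 5)
print([index for index, _ in change_points])  # [6]
```

In practice the p_value argument would be a permutation test over feature vectors rather than this stand-in.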