pm4py.statistics.chaotic_activities.variants.niek_sidorova module

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

class pm4py.statistics.chaotic_activities.variants.niek_sidorova.Parameters(*values)

Bases: Enum

ACTIVITY_KEY = 'pm4py:param:activity_key'
ALPHA = 'alpha'
pm4py.statistics.chaotic_activities.variants.niek_sidorova.apply(log: DataFrame | EventLog, parameters: Dict[Any, Any] | None = None) → List[Dict[str, Any]]

Compute the information-theoretic metrics used to detect chaotic activities in an event log, as defined in:

Tax, Niek, Natalia Sidorova, and Wil MP van der Aalst. “Discovering more precise process models from event logs by filtering out chaotic activities.” Journal of Intelligent Information Systems 52.1 (2019): 107-139.

The result maps each activity to:

  • freq – absolute frequency #(a,L)

  • entropy – H(a,L) (direct entropy)

  • entropy_smooth – Hₛ(a,L) (Laplace‑smoothed entropy)

  • entropy_gain – ΔH (drop in total log‑entropy if a is removed)

  • chaotic_score – simple aggregate, the mean of the two preceding metrics: (entropy_smooth + entropy_gain) / 2

Parameters:
  • log – Event log or Pandas dataframe

  • parameters – Variant-specific parameters, including:

    • Parameters.ALPHA: Laplace/Lidstone smoothing parameter α. None reproduces the raw entropy H(a,L); a typical choice following the paper is α = 1/|A|.

    • Parameters.ACTIVITY_KEY: the attribute to be used as activity. Default: “concept:name”

Returns:

List of dictionaries, one per activity, sorted in descending order of chaotic score.

Return type:

List[Dict[str, Any]]
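
The chaotic_score aggregate and the descending sort described above can be illustrated without the library. This is a minimal sketch, not the PM4Py implementation: the helper name chaotic_score_sketch, the activity names, and the metric values are all made up for illustration, and the metrics are keyed by activity as in the return value of chaotic_metrics.

```python
def chaotic_score_sketch(entropy_smooth, entropy_gain):
    """Mean of the two metrics, as stated in the metric list above."""
    return (entropy_smooth + entropy_gain) / 2


# Illustrative, made-up metric values keyed by (hypothetical) activity name.
metrics = {
    "register request": {"entropy_smooth": 1.2, "entropy_gain": 0.4},
    "send reminder": {"entropy_smooth": 3.0, "entropy_gain": 1.0},
    "close case": {"entropy_smooth": 0.8, "entropy_gain": 0.2},
}

# Most chaotic activity first, mirroring the documented ordering.
ranked = sorted(
    metrics,
    key=lambda activity: chaotic_score_sketch(**metrics[activity]),
    reverse=True,
)
```

With these values, ranked starts with "send reminder", the activity with the highest aggregate score.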

pm4py.statistics.chaotic_activities.variants.niek_sidorova.chaotic_metrics(traces, alpha=None)
Parameters:
  • traces (list[list[str]]) – The event log where each inner list is a trace (ordered events).

  • alpha (float | None) – Laplace/Lidstone smoothing parameter α. None reproduces the raw entropy H(a,L); a typical choice following the paper is α = 1/|A|.

Return type:

dict[str, dict] (activity → metrics)
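
To make the roles of traces and alpha concrete, here is a rough sketch of a per-activity entropy. It assumes a reading of the cited paper in which H(a,L) is the sum of the entropies of the directly-follows and directly-precedes distributions of a, with artificial start and end markers, and in which smoothing adds α to every possible neighbour count. The names activity_entropy and entropy, and the marker strings, are hypothetical; the library's internals may differ.

```python
from collections import Counter
from math import log2

START, END = "<start>", "<end>"  # hypothetical artificial trace markers


def entropy(counts, outcomes, alpha=None):
    """Shannon entropy of a count distribution, optionally Lidstone-smoothed."""
    total = sum(counts.values())
    if alpha is None:
        probs = [c / total for c in counts.values()]
    else:
        denom = total + alpha * len(outcomes)
        if denom == 0:
            return 0.0
        # Every possible neighbour gets alpha added to its observed count.
        probs = [(counts[o] + alpha) / denom for o in outcomes]
    return -sum(p * log2(p) for p in probs if p > 0)


def activity_entropy(traces, activity, alpha=None):
    """Entropy of what directly follows plus what directly precedes `activity`."""
    alphabet = sorted({event for trace in traces for event in trace})
    follows, precedes = Counter(), Counter()
    for trace in traces:
        padded = [START, *trace, END]
        for prev, curr, nxt in zip(padded, padded[1:], padded[2:]):
            if curr == activity:
                precedes[prev] += 1
                follows[nxt] += 1
    return (entropy(follows, alphabet + [END], alpha)
            + entropy(precedes, alphabet + [START], alpha))


traces = [["a", "x", "b"], ["a", "b", "x"], ["x", "a", "b"]]
raw = activity_entropy(traces, "x")                  # alpha=None: no smoothing
smoothed = activity_entropy(traces, "x", alpha=1 / 3)  # alpha = 1/|A| with |A| = 3
```

In this toy log, "x" appears in a different position in every trace, so its neighbour distributions are uniform and its raw entropy is higher than that of "a" or "b".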

pm4py.statistics.chaotic_activities.variants.niek_sidorova.total_entropy(traces, alpha=None)

Return the total entropy of the log: Σₐ H(a,L) when alpha is None, otherwise the smoothed total Σₐ Hₛ(a,L).
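
The relationship between the total entropy and the entropy_gain metric (the drop in total entropy when an activity is removed) can be sketched as follows. This covers only the unsmoothed case and assumes the same reading of the paper as the per-activity definition: directly-follows plus directly-precedes entropy, with artificial start and end markers. The function names and markers are hypothetical and this is not the library's code.

```python
from collections import Counter, defaultdict
from math import log2

START, END = "<start>", "<end>"  # hypothetical artificial trace markers


def total_entropy_sketch(traces):
    """Sum of the raw per-activity entropies over all activities in the log."""
    follows = defaultdict(Counter)   # activity -> counts of direct successors
    precedes = defaultdict(Counter)  # activity -> counts of direct predecessors
    for trace in traces:
        padded = [START, *trace, END]
        for prev, curr in zip(padded, padded[1:]):
            if prev != START:
                follows[prev][curr] += 1
            if curr != END:
                precedes[curr][prev] += 1

    def shannon(counts):
        n = sum(counts.values())
        return -sum(c / n * log2(c / n) for c in counts.values())

    return sum(shannon(follows[a]) + shannon(precedes[a]) for a in follows)


def entropy_gain_sketch(traces, activity):
    """Drop in total entropy after filtering `activity` out of every trace."""
    filtered = [[e for e in trace if e != activity] for trace in traces]
    return total_entropy_sketch(traces) - total_entropy_sketch(filtered)


traces = [["a", "x", "b"], ["a", "b", "x"], ["x", "a", "b"]]
gain_x = entropy_gain_sketch(traces, "x")
gain_a = entropy_gain_sketch(traces, "a")
```

Removing "x" leaves three identical traces with zero total entropy, so its gain is larger than the gain from removing "a".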