pm4py.discovery.discover_log_skeleton

pm4py.discovery.discover_log_skeleton(log: EventLog | DataFrame, noise_threshold: float = 0.0, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → Dict[str, Any]

Discovers a Log Skeleton from an event log.

A Log Skeleton is a declarative model consisting of six types of constraints:

  • directly_follows: Specifies that certain activities must be directly followed by certain other activities. Example: ‘A should be directly followed by B’ and ‘B should be directly followed by C’.

  • always_before: Specifies that some activities may only be executed if certain other activities have been executed earlier in the case. Example: ‘C should always be preceded by A’.

  • always_after: Specifies that certain activities should always trigger the execution of some other activities later in the case. Example: ‘A should always be followed by C’.

  • equivalence: Specifies that a given pair of activities should occur the same number of times within a case. Example: ‘B and C should always occur the same number of times’.

  • never_together: Specifies that a given pair of activities should never occur together in a case. Example: ‘There should be no case containing both C and D’.

  • activ_occurrences: Specifies the allowed numbers of occurrences per activity. Example: ‘Activity A can occur 1 or 2 times, and Activity B can occur 1 to 4 times’.
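To make the six constraint types concrete, here is a minimal, noise-free sketch in plain Python that derives them from a list of traces. It is not pm4py's implementation: simple_log_skeleton is a hypothetical helper, each trace is assumed to be a list of activity names, every relation is a simplified reading of the definitions above, and the dictionary keys simply mirror the constraint names listed above (they may differ from the keys pm4py actually returns). A pair (a, b) reads left to right, so ("C", "A") in always_before means "C is always preceded by A".

```python
from collections import Counter
from itertools import permutations
from typing import Any, Callable, Dict, List

Trace = List[str]


def simple_log_skeleton(traces: List[Trace]) -> Dict[str, Any]:
    """Hypothetical, noise-free sketch: a constraint is kept only if it holds in every trace."""
    activities = sorted({a for t in traces for a in t})
    counts = [Counter(t) for t in traces]        # per-trace occurrence counts
    pairs = list(permutations(activities, 2))    # ordered pairs (a, b) with a != b

    def at_every_occurrence(a: str, b: str, check: Callable[[Trace, int, str], bool]) -> bool:
        # True if check(trace, i, b) holds at every position i where activity a occurs
        return all(check(t, i, b) for t in traces for i, x in enumerate(t) if x == a)

    def next_is(t: Trace, i: int, b: str) -> bool:
        return i + 1 < len(t) and t[i + 1] == b

    def earlier(t: Trace, i: int, b: str) -> bool:
        return b in t[:i]

    def later(t: Trace, i: int, b: str) -> bool:
        return b in t[i + 1:]

    return {
        "directly_follows": {(a, b) for a, b in pairs if at_every_occurrence(a, b, next_is)},
        "always_before": {(a, b) for a, b in pairs if at_every_occurrence(a, b, earlier)},
        "always_after": {(a, b) for a, b in pairs if at_every_occurrence(a, b, later)},
        "equivalence": {(a, b) for a, b in pairs if all(c[a] == c[b] for c in counts)},
        "never_together": {(a, b) for a, b in pairs
                           if all(not (c[a] and c[b]) for c in counts)},
        # Counter returns 0 for absent activities, so 0 is recorded when a case lacks a
        "activ_occurrences": {a: {c[a] for c in counts} for a in activities},
    }


example_log = [
    ["A", "B", "C"],
    ["A", "B", "C", "B", "C"],
    ["A", "D"],
]
skeleton = simple_log_skeleton(example_log)
print(sorted(skeleton["always_before"]))  # [('B', 'A'), ('C', 'A'), ('C', 'B'), ('D', 'A')]
```

Because no noise is tolerated, a single counterexample removes a constraint: ("A", "B") is missing from directly_follows only because the third case continues with D.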

Reference paper: Verbeek, H. M. W., and R. Medeiros de Carvalho. “Log skeletons: A classification approach to process discovery.” arXiv preprint arXiv:1806.08247 (2018).

Parameters:
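  • log (EventLog | DataFrame) – Event log object or pandas DataFrame.

  • noise_threshold (float) – Noise threshold controlling how strict the discovered constraints are (default: 0.0). With 0.0, a constraint is only kept if no behavior in the log violates it; higher values allow infrequent violating behavior to be treated as noise and ignored.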

  • activity_key (str) – Attribute to be used for the activity (default: “concept:name”).

  • timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).

  • case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).

Returns:

A dictionary representing the Log Skeleton, with one entry per constraint type.

Return type:

Dict[str, Any]
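To illustrate how a threshold such as noise_threshold can relax a constraint, the sketch below applies an assumed proportional rule to directly_follows only: a pair (a, b) is kept if b directly follows at least a (1 - noise_threshold) share of a's occurrences. The function name directly_follows_with_noise is hypothetical, and the exact rule pm4py applies to each constraint type is not stated on this page, so treat this as an illustration of the idea rather than pm4py's behavior.

```python
from collections import Counter
from typing import List, Set, Tuple


def directly_follows_with_noise(traces: List[List[str]],
                                noise_threshold: float = 0.0) -> Set[Tuple[str, str]]:
    """Keep (a, b) if b directly follows at least (1 - noise_threshold) of a's occurrences."""
    occurrences: Counter = Counter()  # total occurrences of each activity
    followed_by: Counter = Counter()  # how often a is immediately followed by b
    for t in traces:
        occurrences.update(t)
        followed_by.update(zip(t, t[1:]))
    return {(a, b) for (a, b), n in followed_by.items()
            if n >= occurrences[a] * (1.0 - noise_threshold)}


# Nine cases go A -> B, one "noisy" case goes A -> C
traces = [["A", "B"]] * 9 + [["A", "C"]]
print(directly_follows_with_noise(traces, 0.0))  # set()
print(directly_follows_with_noise(traces, 0.2))  # {('A', 'B')}
```

With a threshold of 0.0 the single A -> C case is enough to discard A -> B; with 0.2 that case is treated as noise and the constraint survives.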

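import pm4py

# Load an event log as a pandas DataFrame ("example.xes" is a placeholder path)
dataframe = pm4py.read_xes("example.xes")

# Discover a Log Skeleton with a noise threshold of 0.1
log_skeleton = pm4py.discover_log_skeleton(
    dataframe,
    noise_threshold=0.1,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

# The result has one entry per constraint type
print(list(log_skeleton.keys()))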
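A discovered Log Skeleton can be read as a set of rules that every case should satisfy. As a hedged illustration, the hypothetical helper violations below checks a single trace against a hand-written, Log-Skeleton-shaped dictionary built from the examples in the constraint list above. The layout it assumes (sets of (a, b) activity pairs for the five relational constraints and a set of allowed counts per activity for activ_occurrences) and its key names are assumptions for illustration; they may not match the exact structure pm4py returns, and this is not pm4py's own conformance checker.

```python
from collections import Counter
from typing import Any, Dict, List


def violations(trace: List[str], skeleton: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: list the constraints of `skeleton` that one trace violates."""
    counts = Counter(trace)  # occurrences per activity; absent activities count as 0
    found = set()            # a set, so a constraint broken several times is reported once

    for a, b in skeleton.get("equivalence", set()):
        if counts[a] != counts[b]:
            found.add(f"equivalence({a}, {b})")
    for a, b in skeleton.get("never_together", set()):
        if counts[a] and counts[b]:
            found.add(f"never_together({a}, {b})")
    for i, x in enumerate(trace):
        for a, b in skeleton.get("always_after", set()):
            if x == a and b not in trace[i + 1:]:
                found.add(f"always_after({a}, {b})")
        for a, b in skeleton.get("always_before", set()):
            if x == a and b not in trace[:i]:
                found.add(f"always_before({a}, {b})")
        for a, b in skeleton.get("directly_follows", set()):
            if x == a and (i + 1 == len(trace) or trace[i + 1] != b):
                found.add(f"directly_follows({a}, {b})")
    for act, allowed in skeleton.get("activ_occurrences", {}).items():
        if counts[act] not in allowed:
            found.add(f"activ_occurrences({act})")
    return sorted(found)


# Hand-written skeleton mirroring the examples in the constraint list
skeleton = {
    "directly_follows": {("A", "B"), ("B", "C")},
    "always_before": {("C", "A")},
    "always_after": {("A", "C")},
    "equivalence": {("B", "C")},
    "never_together": {("C", "D")},
    "activ_occurrences": {"A": {1, 2}, "B": {1, 2, 3, 4}},
}
print(violations(["A", "B", "C"], skeleton))  # []
print(violations(["A", "C", "D"], skeleton))
# ['activ_occurrences(B)', 'directly_follows(A, B)', 'equivalence(B, C)', 'never_together(C, D)']
```

Returning named violations rather than a yes/no answer mirrors the classification view in the reference paper: a case that breaks no constraint is accepted, and the listed constraints explain any rejection.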