pm4py.stats.get_minimum_self_distances

pm4py.stats.get_minimum_self_distances(log: EventLog | DataFrame, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → Dict[str, int]

Computes the minimum self-distance for each activity observed in an event log.

The self-distance of an activity a in a trace is the number of events between two occurrences of a:
  • In a trace <a>, it is infinity.

  • In a trace <a, a>, it is 0.

  • In a trace <a, b, a>, it is 1.

  • And so on.

The minimum self-distance for an activity is the smallest self-distance observed across all traces.
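The definition above can be sketched in plain Python. This is only an illustration of the definition, not pm4py's implementation: the helper name `minimum_self_distances` and the list-of-lists trace format are assumptions made for the example. Only consecutive occurrences need to be compared, since any other pair of occurrences lies further apart. Activities that never repeat (self-distance infinity) are simply left out of the result, which is a choice of this sketch.

```python
from typing import Dict, Iterable, Sequence


def minimum_self_distances(traces: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Hypothetical helper: fewest events seen between two occurrences of each activity."""
    msd: Dict[str, int] = {}
    for trace in traces:
        last_seen: Dict[str, int] = {}  # activity -> index of its latest occurrence
        for index, activity in enumerate(trace):
            if activity in last_seen:
                # Number of events strictly between the previous and current occurrence.
                distance = index - last_seen[activity] - 1
                if activity not in msd or distance < msd[activity]:
                    msd[activity] = distance
            last_seen[activity] = index
    return msd


traces = [
    ["a"],                 # a does not repeat in this trace
    ["a", "b", "a"],       # one event between the two a's
    ["b", "c", "c", "b"],  # c repeats immediately; two events between the b's
]
result = minimum_self_distances(traces)
print(result)
```

Here `a` maps to 1, `b` to 2, and `c` to 0.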

Parameters:
  • log – Event log (EventLog or pandas DataFrame).

  • activity_key (str) – Attribute to be used for the activity.

  • timestamp_key (str) – Attribute to be used for the timestamp.

  • case_id_key (str) – Attribute to be used as the case identifier.
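To illustrate what the three keys refer to, the sketch below groups flat event records into traces: events are grouped by the case identifier, ordered by timestamp, and reduced to their activity labels. The `to_traces` helper and the list-of-dicts input are hypothetical and used only for illustration; how pm4py handles these attributes internally is not described here.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical event records using the default attribute names.
events: List[Dict[str, Any]] = [
    {"case:concept:name": "c1", "concept:name": "a", "time:timestamp": datetime(2024, 1, 1, 9, 0)},
    {"case:concept:name": "c2", "concept:name": "b", "time:timestamp": datetime(2024, 1, 1, 9, 5)},
    {"case:concept:name": "c1", "concept:name": "a", "time:timestamp": datetime(2024, 1, 1, 9, 20)},
    {"case:concept:name": "c1", "concept:name": "b", "time:timestamp": datetime(2024, 1, 1, 9, 10)},
]


def to_traces(
    events: List[Dict[str, Any]],
    activity_key: str = "concept:name",
    timestamp_key: str = "time:timestamp",
    case_id_key: str = "case:concept:name",
) -> Dict[str, List[str]]:
    """Hypothetical helper: one activity sequence per case, ordered by timestamp."""
    by_case: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        by_case[event[case_id_key]].append(event)
    return {
        case_id: [e[activity_key] for e in sorted(case_events, key=lambda e: e[timestamp_key])]
        for case_id, case_events in by_case.items()
    }


case_traces = to_traces(events)
print(case_traces)
```

Case `c1` becomes the trace <a, b, a> even though its events arrive out of order, which is why the timestamp attribute matters for the self-distance.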

Returns:

A dictionary mapping each activity to its minimum self-distance.

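import pm4py

# `dataframe` is assumed to be an event log already loaded as a pandas DataFrame
# that contains the three columns named below.
msd = pm4py.get_minimum_self_distances(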
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)