pm4py.discovery.discover_batches
- pm4py.discovery.discover_batches(log: EventLog | DataFrame, merge_distance: int = 900, min_batch_size: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource') → List[Tuple[Tuple[str, str], int, Dict[str, Any]]]
Discovers batches from the provided log.
An activity is executed in batches by a given resource when the resource performs the same activity multiple times in a short period. Identifying such activities may highlight repetitive tasks that could be automated.
The following batch categories are detected:
- Simultaneous: All events in the batch have identical start and end timestamps.
- Batching at Start: All events in the batch have identical start timestamps.
- Batching at End: All events in the batch have identical end timestamps.
- Sequential Batching: Consecutive events have the end of the first equal to the start of the second.
- Concurrent Batching: Consecutive events that do not match sequentially.
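The category definitions above can be read as a small decision procedure. The following is a minimal sketch of that reading, not pm4py's implementation: the `classify_batch` helper, the `(start, end)` tuple representation, and the order in which the checks are applied are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: each event is a (start, end) pair in seconds.
Interval = Tuple[float, float]


def classify_batch(intervals: List[Interval]) -> str:
    """Label a group of intervals with one of the five documented categories.

    The precedence of the checks is an assumption, not taken from pm4py.
    """
    ordered = sorted(intervals)
    starts = {start for start, _ in ordered}
    ends = {end for _, end in ordered}

    if len(starts) == 1 and len(ends) == 1:
        return "Simultaneous"
    if len(starts) == 1:
        return "Batching at Start"
    if len(ends) == 1:
        return "Batching at End"
    # Sequential: every event starts exactly when the previous one ends.
    if all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:])):
        return "Sequential Batching"
    # Anything left over does not chain end-to-start.
    return "Concurrent Batching"
```

For instance, under these assumptions `classify_batch([(0, 10), (10, 20)])` falls into the sequential case, while `classify_batch([(0, 10), (5, 20)])` falls through to the concurrent case.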
Reference paper: Martin, N., Swennen, M., Depaire, B., Jans, M., Caris, A., & Vanhoof, K. (2015, December). Batch Processing: Definition and Event Log Identification. In SIMPDA (pp. 137-140).
- Parameters:
log – Event log or Pandas DataFrame.
merge_distance (int) – Maximum time distance (in seconds) between non-overlapping intervals to consider them part of the same batch (default: 900 seconds, i.e., 15 minutes).
min_batch_size (int) – Minimum number of events required to form a batch (default: 2).
activity_key (str) – Attribute to be used for the activity (default: “concept:name”).
timestamp_key (str) – Attribute to be used for the timestamp (default: “time:timestamp”).
case_id_key (str) – Attribute to be used as case identifier (default: “case:concept:name”).
resource_key (str) – Attribute to be used as resource (default: “org:resource”).
- Returns:
A sorted list of tuples, each containing:
  - The (activity, resource) pair.
  - The number of batches for the given activity-resource.
  - A dictionary with batch details.
- Return type:
List[Tuple[Tuple[str, str], int, Dict[str, Any]]]
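To make the roles of merge_distance and min_batch_size concrete, here is a rough sketch of how the two parameters could interact for the events of a single activity-resource pair. It is an illustration under stated assumptions, not pm4py's code: `group_into_batches` is a hypothetical helper, and events are assumed to be `(start, end)` pairs in seconds.

```python
from typing import List, Tuple

# Hypothetical representation: each event is a (start, end) pair in seconds.
Interval = Tuple[float, float]


def group_into_batches(
    intervals: List[Interval],
    merge_distance: int = 900,
    min_batch_size: int = 2,
) -> List[List[Interval]]:
    """Group intervals whose gaps are at most merge_distance seconds,
    then keep only the groups with at least min_batch_size events."""
    batches: List[List[Interval]] = []
    current: List[Interval] = []
    current_end = 0.0

    for start, end in sorted(intervals):
        # Overlapping intervals give a negative gap and are merged as well.
        if current and start - current_end <= merge_distance:
            current.append((start, end))
            current_end = max(current_end, end)
        else:
            if len(current) >= min_batch_size:
                batches.append(current)
            current = [(start, end)]
            current_end = end

    # Flush the last open group.
    if len(current) >= min_batch_size:
        batches.append(current)
    return batches
```

In this sketch, raising merge_distance lets more distant events fall into the same group, and raising min_batch_size discards small groups.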
import pm4py

batches = pm4py.discover_batches(
    dataframe,
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp',
    resource_key='org:resource'
)
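The value returned by the call above can be unpacked according to the documented return type. The sketch below only relies on that shape (an (activity, resource) pair, a batch count, and a dictionary); `summarize_batches` is a hypothetical helper, and the sample data, including its dictionary keys, is a made-up placeholder rather than real pm4py output.

```python
from typing import Any, Dict, List, Tuple

BatchResult = List[Tuple[Tuple[str, str], int, Dict[str, Any]]]


def summarize_batches(batches: BatchResult) -> List[str]:
    """Build one readable line per (activity, resource) entry."""
    lines = []
    for (activity, resource), batch_count, details in batches:
        # The dictionary layout is not assumed here; only its keys are listed.
        detail_keys = ", ".join(sorted(str(key) for key in details))
        lines.append(
            f"{activity} by {resource}: {batch_count} batch(es) [{detail_keys}]"
        )
    return lines


# Placeholder in the documented shape; the keys are invented for illustration.
sample: BatchResult = [
    (("Check invoice", "Alice"), 2, {"detail_a": [], "detail_b": []}),
]

for line in summarize_batches(sample):
    print(line)
```

Passing the real `batches` list in place of `sample` should work the same way, as long as the result follows the documented return type.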