pm4py.filtering module

The pm4py.filtering module contains the filtering features offered in pm4py.

pm4py.filtering.filter_log_relative_occurrence_event_attribute(log: EventLog | DataFrame, min_relative_stake: float, attribute_key: str = 'concept:name', level: str = 'cases', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the event log, keeping only the events that have an attribute value which occurs:
- in at least the specified (min_relative_stake) percentage of events when level="events",
- in at least the specified (min_relative_stake) percentage of cases when level="cases".

Parameters:

log – Event log or Pandas DataFrame.
min_relative_stake – Minimum fraction (expressed as a number between 0 and 1) of events or cases, depending on level, in which the attribute value should occur.
attribute_key – The attribute to filter.
level – The level at which occurrences are counted (if level="events", over events; if level="cases", over cases).
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_log_relative_occurrence_event_attribute(
    dataframe,
    0.5,
    attribute_key='concept:name',
    level='cases',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
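
The difference between the two levels can be sketched in plain Python. This is an illustration of the semantics described above, not pm4py's implementation; the function name relative_occurrence_filter, the dictionary-based log, and the choice to drop cases left without events are all assumptions made for the example.

```python
from collections import Counter


def relative_occurrence_filter(cases, min_relative_stake, level="cases"):
    """cases: dict mapping a case id to the list of attribute values of its events."""
    if level == "events":
        # Every event counts once.
        counts = Counter(v for values in cases.values() for v in values)
        total = sum(len(values) for values in cases.values())
    else:
        # level == "cases": a value counts at most once per case.
        counts = Counter(v for values in cases.values() for v in set(values))
        total = len(cases)
    if total == 0:
        return {}
    kept_values = {v for v, c in counts.items() if c / total >= min_relative_stake}
    filtered = {cid: [v for v in values if v in kept_values]
                for cid, values in cases.items()}
    # Assumption: cases that lose all their events are dropped.
    return {cid: values for cid, values in filtered.items() if values}


cases = {"c1": ["A", "B", "C"], "c2": ["A", "B"], "c3": ["A", "D", "D", "D"]}
by_cases = relative_occurrence_filter(cases, 0.5, level="cases")
by_events = relative_occurrence_filter(cases, 0.3, level="events")
```

With level="cases", D is dropped because it appears in only one of three cases; with level="events", D is kept because it accounts for three of the nine events.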

pm4py.filtering.filter_start_activities(log: EventLog | DataFrame, activities: Set[str] | List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters cases that have a start activity in the provided list.

Parameters:

log – Event log or Pandas DataFrame.
activities – Collection of start activities.
retain – If True, retains the traces that start with one of the given activities; if False, drops them.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_start_activities(
    dataframe,
    ['Act. A'],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.filtering.filter_end_activities(log: EventLog | DataFrame, activities: Set[str] | List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters cases that have an end activity in the provided list.

Parameters:

log – Event log or Pandas DataFrame.
activities – Collection of end activities.
retain – If True, retains the traces that end with one of the given activities; if False, drops them.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_end_activities(
    dataframe,
    ['Act. Z'],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.filtering.filter_event_attribute_values(log: EventLog | DataFrame, attribute_key: str, values: Set[str] | List[str], level: str = 'case', retain: bool = True, case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters a log object based on the values of a specified event attribute.

Parameters:

log – Event log or Pandas DataFrame.
attribute_key – Attribute to filter.
values – Admitted or forbidden values.
level – Specifies how the filter should be applied ('case' filters the cases where at least one occurrence happens; 'event' filters the events, potentially trimming the cases).
retain – If True, the given values are kept; if False, they are removed.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_event_attribute_values(
    dataframe,
    'concept:name',
    ['Act. A', 'Act. Z'],
    case_id_key='case:concept:name'
)

pm4py.filtering.filter_trace_attribute_values(log: EventLog | DataFrame, attribute_key: str, values: Set[str] | List[str], retain: bool = True, case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters a log based on the values of a specified trace attribute.

Parameters:

log – Event log or Pandas DataFrame.
attribute_key – Attribute to filter.
values – Collection of values to filter.
retain – Boolean value indicating whether to keep or discard matching traces.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_trace_attribute_values(
    dataframe,
    'case:creator',
    ['Mike'],
    case_id_key='case:concept:name'
)

pm4py.filtering.filter_variants(log: EventLog | DataFrame, variants: Set[str] | List[str] | List[Tuple[str]], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters a log based on a specified set of variants.

Parameters:

log – Event log or Pandas DataFrame.
variants – Collection of variants to filter. Each variant is a tuple of activity names, so the collection is a list of tuples, e.g., [('a', 'b', 'c')].
retain – Boolean indicating whether to retain (if True) or remove (if False) traces conforming to the specified variants.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_variants(
    dataframe,
    [('Act. A', 'Act. B', 'Act. Z'), ('Act. A', 'Act. C', 'Act. Z')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.filtering.filter_directly_follows_relation(log: EventLog | DataFrame, relations: List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Retains traces that contain any of the specified 'directly follows' relations. For example, if relations == [('a','b'),('a','c')] and the log is [<a,b,c>,<a,c,b>,<a,d,b>], the resulting log contains the traces [<a,b,c>,<a,c,b>].

Parameters:

log – Event log or Pandas DataFrame.
relations – List of activity name pairs, representing allowed or forbidden paths.
retain – Boolean indicating whether the paths should be kept (if True) or removed (if False).
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_directly_follows_relation(
    dataframe,
    [('A', 'B'), ('A', 'C')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)

pm4py.filtering.filter_eventually_follows_relation(log: EventLog | DataFrame, relations: List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Retains traces that contain any of the specified 'eventually follows' relations. For example, if relations == [('a','b'),('a','c')] and the log is [<a,b,c>,<a,c,b>,<a,d,b>], the resulting log contains the traces [<a,b,c>,<a,c,b>,<a,d,b>].

Parameters:

log – Event log or Pandas DataFrame.
relations – List of activity name pairs, representing allowed or forbidden paths.
retain – Boolean indicating whether the paths should be kept (if True) or removed (if False).
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_eventually_follows_relation(
    dataframe,
    [('A', 'B'), ('A', 'C')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
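
To make the contrast with filter_directly_follows_relation concrete, the sketch below reproduces both documented examples on plain lists of activity names. It only illustrates the two relations and is not pm4py's implementation; the helper names has_directly_follows, has_eventually_follows and filter_traces are hypothetical.

```python
def has_directly_follows(trace, pair):
    """True if the second activity occurs immediately after the first one."""
    a, b = pair
    return any(x == a and y == b for x, y in zip(trace, trace[1:]))


def has_eventually_follows(trace, pair):
    """True if the second activity occurs anywhere after the first one."""
    a, b = pair
    return any(x == a and b in trace[i + 1:] for i, x in enumerate(trace))


def filter_traces(traces, relations, check, retain=True):
    # Keep (retain=True) or drop (retain=False) the traces matching any relation.
    return [t for t in traces if any(check(t, r) for r in relations) == retain]


traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "d", "b"]]
relations = [("a", "b"), ("a", "c")]
directly = filter_traces(traces, relations, has_directly_follows)
eventually = filter_traces(traces, relations, has_eventually_follows)
```

The trace <a,d,b> is only kept by the eventually-follows check, because d sits between a and b.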

pm4py.filtering.filter_time_range(log: EventLog | DataFrame, dt1: str, dt2: str, mode: str = 'events', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters a log based on a time interval.

Parameters:

log – Event log or Pandas DataFrame.
dt1 – Left extreme of the interval.
dt2 – Right extreme of the interval.
mode – Modality of filtering ('events', 'traces_contained', 'traces_intersecting'):
- 'events': Any event that fits the time frame is retained.
- 'traces_contained': Any trace completely contained in the time frame is retained.
- 'traces_intersecting': Any trace intersecting with the time frame is retained.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe1 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='traces_contained',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
filtered_dataframe2 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='traces_intersecting',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
filtered_dataframe3 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='events',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
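
The three modes differ only in what is compared against the interval. The following plain-Python sketch illustrates this on a small hypothetical log; it is not pm4py's implementation, and treating both interval bounds as inclusive is an assumption.

```python
from datetime import datetime


def filter_time_range_sketch(cases, dt1, dt2, mode="events"):
    """cases: dict mapping a case id to a time-ordered list of (activity, timestamp)."""
    result = {}
    for case_id, events in cases.items():
        if not events:
            continue
        first, last = events[0][1], events[-1][1]
        if mode == "events":
            # Keep single events inside the interval (cases may be trimmed).
            kept = [e for e in events if dt1 <= e[1] <= dt2]
        elif mode == "traces_contained":
            # Keep the whole case only if it lies fully inside the interval.
            kept = events if dt1 <= first and last <= dt2 else []
        elif mode == "traces_intersecting":
            # Keep the whole case if its lifespan overlaps the interval.
            kept = events if first <= dt2 and last >= dt1 else []
        else:
            raise ValueError(f"unknown mode: {mode}")
        if kept:
            result[case_id] = kept
    return result


cases = {
    "c1": [("A", datetime(2009, 12, 31)), ("B", datetime(2010, 6, 1))],
    "c2": [("A", datetime(2010, 3, 1)), ("B", datetime(2010, 4, 1))],
    "c3": [("A", datetime(2012, 1, 1))],
}
dt1, dt2 = datetime(2010, 1, 1), datetime(2011, 1, 1)
contained = filter_time_range_sketch(cases, dt1, dt2, "traces_contained")
intersecting = filter_time_range_sketch(cases, dt1, dt2, "traces_intersecting")
events_only = filter_time_range_sketch(cases, dt1, dt2, "events")
```

Case c1 starts before the interval, so it is excluded by 'traces_contained', kept whole by 'traces_intersecting', and trimmed to its second event by 'events'.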

pm4py.filtering.filter_between(log: EventLog | DataFrame, act1: str | List[str], act2: str | List[str], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Finds all the sub-cases leading from an event with activity "act1" to an event with activity "act2" in the log, and returns a log containing only them.

Example:

Log
A B C D E F
A B E F C
A B F C B C B E F C

act1 = B
act2 = C

Returned sub-cases:
B C (from the first case)
B E F C (from the second case)
B F C (from the third case)
B C (from the third case)
B E F C (from the third case)

Parameters:

log – Event log or Pandas DataFrame.
act1 – Source activity or collection of activities.
act2 – Target activity or collection of activities.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_between(
    dataframe,
    'A',
    'D',
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
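
The sub-case extraction from the B/C example above can be sketched as a single scan over each trace. This is an illustration under stated assumptions rather than pm4py's implementation: the helper name sub_cases_between is hypothetical, and when act1 repeats before act2 is reached, the sketch keeps the earliest open start.

```python
def sub_cases_between(trace, act1, act2):
    """Return every segment of the trace that runs from act1 to the next act2."""
    result, start = [], None
    for i, activity in enumerate(trace):
        if start is None and activity == act1:
            start = i  # open a new sub-case
        elif start is not None and activity == act2:
            result.append(trace[start:i + 1])  # close it, including act2
            start = None
    return result


log = [
    "A B C D E F".split(),
    "A B E F C".split(),
    "A B F C B C B E F C".split(),
]
found = [segment for trace in log for segment in sub_cases_between(trace, "B", "C")]
```

On the example log this yields the five sub-cases listed in the description, three of them from the third case.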

pm4py.filtering.filter_case_size(log: EventLog | DataFrame, min_size: int, max_size: int, case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the event log, keeping cases that have a length (number of events) between min_size and max_size.

Parameters:

log – Event log or Pandas DataFrame.
min_size – Minimum allowed number of events.
max_size – Maximum allowed number of events.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_case_size(
    dataframe,
    5,
    10,
    case_id_key='case:concept:name'
)

pm4py.filtering.filter_case_performance(log: EventLog | DataFrame, min_performance: float, max_performance: float, timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the event log, keeping cases that have a duration (the timestamp of the last event minus the timestamp of the first event) between min_performance and max_performance.

Parameters:

log – Event log or Pandas DataFrame.
min_performance – Minimum allowed case duration.
max_performance – Maximum allowed case duration.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_case_performance(
    dataframe,
    3600.0,
    86400.0,
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
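
The duration criterion can be illustrated in plain Python. The sketch assumes durations are measured in seconds, which the example values 3600.0 and 86400.0 suggest but the description does not state; the helper names are hypothetical and this is not pm4py's implementation.

```python
from datetime import datetime


def case_duration_seconds(timestamps):
    """Timestamp of the last event minus timestamp of the first event, in seconds."""
    return (max(timestamps) - min(timestamps)).total_seconds()


def filter_by_duration(cases, min_performance, max_performance):
    """cases: dict mapping a case id to the list of its event timestamps."""
    return {
        case_id: stamps
        for case_id, stamps in cases.items()
        if stamps and min_performance <= case_duration_seconds(stamps) <= max_performance
    }


cases = {
    "fast": [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10)],    # 10 minutes
    "medium": [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 12, 0)],  # 3 hours
    "slow": [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 3, 9, 0)],     # 2 days
}
kept = filter_by_duration(cases, 3600.0, 86400.0)
```

Only the three-hour case falls between one hour and one day.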

pm4py.filtering.filter_activities_rework(log: EventLog | DataFrame, activity: str, min_occurrences: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the event log, keeping cases where the specified activity occurs at least min_occurrences times.

Parameters:

log – Event log or Pandas DataFrame.
activity – Activity to consider.
min_occurrences – Minimum desired number of occurrences.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_activities_rework(
    dataframe,
    'Approve Order',
    2,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)

pm4py.filtering.filter_paths_performance(log: EventLog | DataFrame, path: Tuple[str, str], min_performance: float, max_performance: float, keep: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the event log based on the performance of specified paths.

If keep=True, retains cases having the specified path (tuple of 2 activities) with a duration between min_performance and max_performance.
If keep=False, discards cases having the specified path with a duration between min_performance and max_performance.

Parameters:

log – Event log or Pandas DataFrame.
path – Tuple of two activities (source_activity, target_activity).
min_performance – Minimum allowed performance of the path.
max_performance – Maximum allowed performance of the path.
keep – Boolean indicating whether to keep (if True) or discard (if False) the cases with the specified performance.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_paths_performance(
    dataframe,
    ('A', 'D'),
    3600.0,
    86400.0,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)

pm4py.filtering.filter_variants_top_k(log: EventLog | DataFrame, k: int, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Keeps the top-k variants of the log.

Parameters:

log – Event log or Pandas DataFrame.
k – Number of variants to keep.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_variants_top_k(
    dataframe,
    5,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
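
As a rough illustration of what "top-k variants" means, the sketch below counts variants in a list of traces and keeps the cases of the k most frequent ones. It is not pm4py's implementation; the helper name top_k_variants is hypothetical, and how ties between equally frequent variants are broken is left to Counter.most_common, which may differ from the library.

```python
from collections import Counter


def top_k_variants(traces, k):
    """Keep the traces whose variant is among the k most frequent ones."""
    variants = Counter(tuple(trace) for trace in traces)
    allowed = {variant for variant, _ in variants.most_common(k)}
    return [trace for trace in traces if tuple(trace) in allowed]


traces = [["a", "b"]] * 3 + [["a", "c"]] * 2 + [["a", "d"]]
kept = top_k_variants(traces, 2)
```

Here the single <a,d> trace is dropped, because its variant is the least frequent of the three.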

pm4py.filtering.filter_variants_by_coverage_percentage(log: EventLog | DataFrame, min_coverage_percentage: float, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the variants of the log based on a coverage percentage. For example, if min_coverage_percentage=0.4 and the log has 1000 cases with:
- 500 cases of variant 1,
- 400 cases of variant 2,
- 100 cases of variant 3,
the filter keeps only the traces of variant 1 and variant 2.

Parameters:

log – Event log or Pandas DataFrame.
min_coverage_percentage – Minimum allowed percentage of coverage.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_variants_by_coverage_percentage(
    dataframe,
    0.1,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
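
The 1000-case example from the description can be reproduced with a few lines of plain Python. This sketch is not pm4py's implementation; the helper name is hypothetical, and the inclusive comparison (>=) is an assumption chosen because the description keeps variant 2, which covers exactly 40% of the cases.

```python
from collections import Counter


def variants_by_coverage(traces, min_coverage_percentage):
    """Keep the traces whose variant covers at least the given fraction of cases."""
    variants = Counter(tuple(trace) for trace in traces)
    total = len(traces)
    allowed = {v for v, count in variants.items()
               if count / total >= min_coverage_percentage}
    return [trace for trace in traces if tuple(trace) in allowed]


# 500 cases of variant 1, 400 of variant 2, 100 of variant 3.
traces = [["a", "b"]] * 500 + [["a", "c"]] * 400 + [["a", "d"]] * 100
kept = variants_by_coverage(traces, 0.4)
remaining_variants = Counter(tuple(trace) for trace in kept)
```

Variant 3 covers only 10% of the cases and is removed, leaving 900 traces.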

pm4py.filtering.filter_prefixes(log: EventLog | DataFrame, activity: str, strict: bool = True, first_or_last: str = 'first', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the log, keeping the prefixes leading up to a given activity. For example, for a log with traces:
- A,B,C,D
- A,B,Z,A,B,C,D
- A,B,C,D,C,E,C,F

The prefixes to "C" are respectively:
- A,B
- A,B,Z,A,B
- A,B

Parameters:

log – Event log or Pandas DataFrame.
activity – Target activity for the filter.
strict – Applies the filter strictly, cutting the occurrences of the selected activity.
first_or_last – Decides if the first or last occurrence of an activity should be selected as the baseline for the filter.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_prefixes(
    dataframe,
    'Act. C',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
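
The prefix example from the description can be reproduced with the sketch below. It is an illustration, not pm4py's implementation: the helper name prefix_until is hypothetical, and two behaviours are assumptions, namely that strict=False keeps the target event itself and that traces without the activity yield no prefix.

```python
def prefix_until(trace, activity, strict=True, first_or_last="first"):
    """Return the part of the trace leading up to the chosen occurrence of activity."""
    positions = [i for i, a in enumerate(trace) if a == activity]
    if not positions:
        return None  # assumption: traces without the activity are dropped
    index = positions[0] if first_or_last == "first" else positions[-1]
    end = index if strict else index + 1  # strict cuts the activity itself
    return trace[:end]


log = [
    "A B C D".split(),
    "A B Z A B C D".split(),
    "A B C D C E C F".split(),
]
prefixes = [prefix_until(trace, "C") for trace in log]
last_prefix = prefix_until(log[2], "C", first_or_last="last")
```

With first_or_last="last", the third trace is cut at its final C instead, giving A,B,C,D,C,E.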

pm4py.filtering.filter_suffixes(log: EventLog | DataFrame, activity: str, strict: bool = True, first_or_last: str = 'first', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters the log, keeping the suffixes starting from a given activity. For example, for a log with traces:
- A,B,C,D
- A,B,Z,A,B,C,D
- A,B,C,D,C,E,C,F

The suffixes from "C" are respectively:
- D
- D
- D,C,E,C,F

Parameters:

log – Event log or Pandas DataFrame.
activity – Target activity for the filter.
strict – Applies the filter strictly, cutting the occurrences of the selected activity.
first_or_last – Decides if the first or last occurrence of an activity should be selected as the baseline for the filter.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_suffixes(
    dataframe,
    'Act. C',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
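
The suffix example from the description can be sketched in the same spirit. This is not pm4py's implementation: the helper name suffix_from is hypothetical, and it assumes that strict=False keeps the target event itself and that traces without the activity yield no suffix.

```python
def suffix_from(trace, activity, strict=True, first_or_last="first"):
    """Return the part of the trace following the chosen occurrence of activity."""
    positions = [i for i, a in enumerate(trace) if a == activity]
    if not positions:
        return None  # assumption: traces without the activity are dropped
    index = positions[0] if first_or_last == "first" else positions[-1]
    start = index + 1 if strict else index  # strict cuts the activity itself
    return trace[start:]


log = [
    "A B C D".split(),
    "A B Z A B C D".split(),
    "A B C D C E C F".split(),
]
suffixes = [suffix_from(trace, "C") for trace in log]
last_suffix = suffix_from(log[2], "C", first_or_last="last")
```

With first_or_last="last", only the final F of the third trace remains.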

pm4py.filtering.filter_ocel_event_attribute(ocel: OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) → OCEL

Filters the object-centric event log based on the provided event attribute values.

Parameters:

ocel – Object-centric event log.
attribute_key – Attribute at the event level to filter.
attribute_values – Collection of attribute values to keep or remove.
positive – Determines whether the values should be kept (True) or removed (False).

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_event_attribute(
    ocel,
    'ocel:activity',
    ['A', 'B', 'D']
)

pm4py.filtering.filter_ocel_object_attribute(ocel: OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) → OCEL

Filters the object-centric event log based on the provided object attribute values.

Parameters:

ocel – Object-centric event log.
attribute_key – Attribute at the object level to filter.
attribute_values – Collection of attribute values to keep or remove.
positive – Determines whether the values should be kept (True) or removed (False).

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_object_attribute(
    ocel,
    'ocel:type',
    ['order']
)

pm4py.filtering.filter_ocel_object_types_allowed_activities(ocel: OCEL, correspondence_dict: Dict[str, Collection[str]]) → OCEL

Filters an object-centric event log, keeping only the specified object types with the specified set of allowed activities.

Parameters:

ocel – Object-centric event log.
correspondence_dict – Dictionary containing, for every object type of interest, a collection of allowed activities. Example: {"order": ["Create Order"], "element": ["Create Order", "Create Delivery"]}.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_object_types_allowed_activities(
    ocel,
    {'order': ['create order', 'pay order'], 'item': ['create item', 'deliver item']}
)

pm4py.filtering.filter_ocel_object_per_type_count(ocel: OCEL, min_num_obj_type: Dict[str, int]) → OCEL

Filters the events of the object-centric logs that are related to at least the specified number of objects per type.

Example: pm4py.filter_ocel_object_per_type_count(ocel, {"order": 1, "element": 2})

Would keep the following events:

   ocel:eid  ocel:timestamp  ocel:activity  ocel:type:element  ocel:type:order
0  e1        1980-01-01      Create Order   [i4, i1, i3, i2]   [o1]
1  e11       1981-01-01      Create Order   [i6, i5]           [o2]
2  e14       1981-01-04      Create Order   [i8, i7]           [o3]

Parameters:

ocel – Object-centric event log.
min_num_obj_type – Minimum number of objects per type.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_object_per_type_count(
    ocel,
    {'order': 1, 'element': 2}
)
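
The per-type threshold can be illustrated on events represented as plain dictionaries. This sketch is not pm4py's implementation; the helper name and the dictionary layout are hypothetical, and events e2 and e3 are invented for the illustration (only e1 mirrors the example above).

```python
def events_with_min_objects(events, min_num_obj_type):
    """Keep the events related to at least the required number of objects per type."""
    return [
        event for event in events
        if all(len(event["objects"].get(obj_type, [])) >= minimum
               for obj_type, minimum in min_num_obj_type.items())
    ]


events = [
    {"id": "e1", "activity": "Create Order",
     "objects": {"element": ["i4", "i1", "i3", "i2"], "order": ["o1"]}},
    # Hypothetical events, added only to show what gets filtered out.
    {"id": "e2", "activity": "Pay Order", "objects": {"order": ["o1"]}},
    {"id": "e3", "activity": "Pick Item", "objects": {"element": ["i1"], "order": ["o1"]}},
]
kept_ids = [e["id"] for e in events_with_min_objects(events, {"order": 1, "element": 2})]
```

Event e2 has no related elements and e3 has only one, so only e1 satisfies both minimums.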

pm4py.filtering.filter_ocel_start_events_per_object_type(ocel: OCEL, object_type: str) → OCEL

Filters the events in which a new object of the given object type is spawned. For example, an event with activity “Create Order” might spawn new orders.

Parameters:

ocel – Object-centric event log.
object_type – Object type to consider.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_start_events_per_object_type(
    ocel,
    'delivery'
)

pm4py.filtering.filter_ocel_end_events_per_object_type(ocel: OCEL, object_type: str) → OCEL

Filters the events in which an object of the given object type terminates its lifecycle. For example, an event with activity “Pay Order” might terminate an order.

Parameters:

ocel – Object-centric event log.
object_type – Object type to consider.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_end_events_per_object_type(
    ocel,
    'delivery'
)

pm4py.filtering.filter_ocel_events_timestamp(ocel: OCEL, min_timest: datetime | str, max_timest: datetime | str, timestamp_key: str = 'ocel:timestamp') → OCEL

Filters the object-centric event log, keeping events within the provided timestamp range.

Parameters:

ocel – Object-centric event log.
min_timest – Left extreme of the allowed timestamp interval (format: YYYY-mm-dd HH:MM:SS).
max_timest – Right extreme of the allowed timestamp interval (format: YYYY-mm-dd HH:MM:SS).
timestamp_key – The attribute to use as timestamp (default: ocel:timestamp).

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_events_timestamp(
    ocel,
    '1990-01-01 00:00:00',
    '2010-01-01 00:00:00'
)

pm4py.filtering.filter_four_eyes_principle(log: EventLog | DataFrame, activity1: str, activity2: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', keep_violations: bool = False) → EventLog | DataFrame

Filters out the cases of the log that violate the four-eyes principle on the provided activities.

Parameters:

log – Event log or Pandas DataFrame.
activity1 – First activity.
activity2 – Second activity.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
resource_key – Attribute to be used as resource.
keep_violations – Boolean indicating whether to discard (if False) or retain (if True) the violations.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_four_eyes_principle(
    dataframe,
    'Act. A',
    'Act. B',
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
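
A rough sketch of the check, assuming that a case violates the four-eyes principle when the same resource performs both activities within that case. That reading is an assumption, since the description does not spell out the rule, and the helper names and the dictionary-based log are hypothetical rather than pm4py's implementation.

```python
def violates_four_eyes(events, activity1, activity2):
    """events: list of (activity, resource) pairs belonging to one case."""
    resources1 = {res for act, res in events if act == activity1}
    resources2 = {res for act, res in events if act == activity2}
    # Assumption: a violation means some resource performed both activities.
    return bool(resources1 & resources2)


def filter_four_eyes(cases, activity1, activity2, keep_violations=False):
    return {
        case_id: events
        for case_id, events in cases.items()
        if violates_four_eyes(events, activity1, activity2) == keep_violations
    }


cases = {
    "c1": [("Act. A", "Mike"), ("Act. B", "Sara")],  # two different people
    "c2": [("Act. A", "Mike"), ("Act. B", "Mike")],  # same person twice
}
compliant = filter_four_eyes(cases, "Act. A", "Act. B")
violations = filter_four_eyes(cases, "Act. A", "Act. B", keep_violations=True)
```

With the default keep_violations=False the violating case c2 is removed; with keep_violations=True only c2 is returned.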

pm4py.filtering.filter_activity_done_different_resources(log: EventLog | DataFrame, activity: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', keep_violations: bool = True) → EventLog | DataFrame

Filters the cases where an activity is performed by different resources multiple times.

Parameters:

log – Event log or Pandas DataFrame.
activity – Activity to consider.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
resource_key – Attribute to be used as resource.
keep_violations – Boolean indicating whether to discard (if False) or retain (if True) the violations.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

filtered_dataframe = pm4py.filter_activity_done_different_resources(
    dataframe,
    'Act. A',
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)

pm4py.filtering.filter_trace_segments(log: EventLog | DataFrame, admitted_traces: List[List[str]], positive: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame

Filters an event log based on a set of trace segments. A trace segment is a sequence of activities and "..." where:
- "..." before an activity indicates that other activities can precede the given activity.
- "..." after an activity indicates that other activities can follow the given activity.

Examples:
- pm4py.filter_trace_segments(log, [["A", "B"]]) retains only cases with the exact process variant A,B.
- pm4py.filter_trace_segments(log, [["...", "A", "B"]]) retains only cases ending with activities A,B.
- pm4py.filter_trace_segments(log, [["A", "B", "..."]]) retains only cases starting with activities A,B.
- pm4py.filter_trace_segments(log, [["...", "A", "B", "C", "..."], ["...", "D", "E", "F", "..."]]) retains cases where, at any point, there is A followed by B followed by C, and, at any other point, there is D followed by E followed by F.

Parameters:

log – Event log or Pandas DataFrame.
admitted_traces – Collection of trace segments to admit based on the criteria above.
positive – Boolean indicating whether to keep (if True) or discard (if False) the cases satisfying the filter.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.

Returns:

Filtered event log or Pandas DataFrame.

import pm4py

log = pm4py.read_xes("tests/input_data/running-example.xes")

filtered_log = pm4py.filter_trace_segments(
    log,
    [["...", "check ticket", "decide", "reinitiate request", "..."]]
)
print(filtered_log)
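
How a single segment is matched against a trace can be sketched as follows. The sketch only supports "..." at the very start and/or end of a segment and does not cover how several segments are combined; it illustrates the first three documented examples plus the "contains" case, and is not pm4py's implementation. The helper name matches_segment is hypothetical.

```python
def matches_segment(trace, segment):
    """Check one trace (list of activity names) against one segment."""
    open_start = bool(segment) and segment[0] == "..."
    open_end = bool(segment) and segment[-1] == "..."
    core = [activity for activity in segment if activity != "..."]
    n = len(core)
    if not open_start and not open_end:
        return trace == core                       # exact variant
    if open_start and not open_end:
        return len(trace) >= n and trace[len(trace) - n:] == core  # ends with
    if open_end and not open_start:
        return trace[:n] == core                   # starts with
    # "..." on both sides: core must appear as a contiguous block somewhere.
    return any(trace[i:i + n] == core for i in range(len(trace) - n + 1))


exact = matches_segment(["A", "B"], ["A", "B"])
ends_with = matches_segment(["X", "A", "B"], ["...", "A", "B"])
starts_with = matches_segment(["A", "B", "X"], ["A", "B", "..."])
contains = matches_segment(["X", "A", "B", "C", "Y"], ["...", "A", "B", "C", "..."])
```

A trace such as A,B,X does not match the segment ["...", "A", "B"], because activities follow B.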

pm4py.filtering.filter_ocel_object_types(ocel: OCEL, obj_types: Collection[str], positive: bool = True, level: int = 1) → OCEL

Filters the object types of an object-centric event log.

Parameters:

ocel – Object-centric event log.
obj_types – Object types to keep or remove.
positive – Boolean indicating whether to keep (True) or remove (False) the specified object types.
level – Recursively expands the set of object identifiers until the specified level.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_object_types(
    ocel,
    ['order']
)

pm4py.filtering.filter_ocel_objects(ocel: OCEL, object_identifiers: Collection[str], positive: bool = True, level: int = 1) → OCEL

Filters the object identifiers of an object-centric event log.

Parameters:

ocel – Object-centric event log.
object_identifiers – Object identifiers to keep or remove.
positive – Boolean indicating whether to keep (True) or remove (False) the specified object identifiers.
level – Recursively expands the set of object identifiers until the specified level.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_objects(
    ocel,
    ['o1'],
    level=1
)

pm4py.filtering.filter_ocel_events(ocel: OCEL, event_identifiers: Collection[str], positive: bool = True) → OCEL

Filters the event identifiers of an object-centric event log.

Parameters:

ocel – Object-centric event log.
event_identifiers – Event identifiers to keep or remove.
positive – Boolean indicating whether to keep (True) or remove (False) the specified event identifiers.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_events(
    ocel,
    ['e1']
)

pm4py.filtering.filter_ocel_activities_connected_object_type(ocel: OCEL, object_type: str) → OCEL

Filters an OCEL on the set of activities executed on objects of the given object type.

Parameters:

ocel – Object-centric event log.
object_type – Object type to consider.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel2("tests/input_data/ocel/ocel20_example.xmlocel")
filtered_ocel = pm4py.filter_ocel_activities_connected_object_type(ocel, "Purchase Order")
print(filtered_ocel)

pm4py.filtering.filter_ocel_cc_object(ocel: OCEL, object_id: str, conn_comp: List[List[str]] | None = None, return_conn_comp: bool = False) → OCEL | Tuple[OCEL, List[List[str]]]

Returns the connected component of the object-centric event log to which the specified object belongs.

Parameters:

ocel – Object-centric event log.
object_id – Object identifier.
conn_comp – (Optional) Precomputed connected components of the OCEL objects.
return_conn_comp – If True, returns the filtered OCEL along with the computed connected components.

Returns:

Filtered OCEL, optionally with the list of connected components.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_object(
    ocel,
    'order1'
)
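
The notion of a connected component can be sketched on a toy log. The sketch assumes that two objects are connected when they are related to the same event, and it finds the component with a breadth-first search; both the linking rule and the helper name connected_component are assumptions for illustration, not pm4py's implementation.

```python
from collections import defaultdict, deque


def connected_component(event_objects, object_id):
    """event_objects: dict mapping an event id to the object ids related to it."""
    neighbours = defaultdict(set)
    for objects in event_objects.values():
        for obj in objects:
            # Assumption: objects sharing an event are directly connected.
            neighbours[obj].update(o for o in objects if o != obj)
    seen, queue = {object_id}, deque([object_id])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


event_objects = {
    "e1": ["order1", "item1"],
    "e2": ["item1", "delivery1"],
    "e3": ["order2", "item2"],
}
component = connected_component(event_objects, "order1")
```

Here delivery1 belongs to the component of order1 through item1, while order2 and item2 form a separate component.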

pm4py.filtering.filter_ocel_cc_length(ocel: OCEL, min_cc_length: int, max_cc_length: int) → OCEL

Keeps only the objects in an OCEL belonging to a connected component with a length falling within the specified range.

Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Parameters:

ocel – Object-centric event log.
min_cc_length – Minimum allowed length for the connected component.
max_cc_length – Maximum allowed length for the connected component.

Returns:

Filtered OCEL.

import pm4py

filtered_ocel = pm4py.filter_ocel_cc_length(
    ocel,
    2,
    10
)

pm4py.filtering.filter_ocel_cc_otype(ocel: OCEL, otype: str, positive: bool = True) → OCEL

Filters the objects belonging to connected components that have at least one object of the specified type.

Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Parameters:

ocel – Object-centric event log.
otype – Object type to consider.
positive – Boolean indicating whether to keep (True) or discard (False) the objects in these components.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_otype(
    ocel,
    'order'
)

pm4py.filtering.filter_ocel_cc_activity(ocel: OCEL, activity: str) → OCEL

Filters the objects belonging to connected components that include at least one event with the specified activity.

Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.

Parameters:

ocel – Object-centric event log.
activity – Activity to consider.

Returns:

Filtered OCEL.

import pm4py

ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_activity(
    ocel,
    'Create Order'
)

pm4py.filtering.filter_dfg_activities_percentage(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int], percentage: float = 0.2) → Tuple[Dict[Tuple[str, str], int], Dict[str, int], Dict[str, int]]

Filters the DFG on the provided percentage of activities.

Parameters:

dfg – Frequency directly-follows graph.
start_activities – Dictionary of the start activities.
end_activities – Dictionary of the end activities.
percentage – Percentage of activities to keep.

import pm4py

log = pm4py.read_xes('tests/input_data/receipt.xes')
dfg, sa, ea = pm4py.discover_dfg(log)
dfg, sa, ea = pm4py.filter_dfg_activities_percentage(dfg, sa, ea, percentage=0.2)
pm4py.view_dfg(dfg, sa, ea, format='svg')

pm4py.filtering.filter_dfg_paths_percentage(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int], percentage: float = 0.2) → Tuple[Dict[Tuple[str, str], int], Dict[str, int], Dict[str, int]]

Filters the DFG on the provided percentage of paths.

Parameters:

dfg – Frequency directly-follows graph.
start_activities – Dictionary of the start activities.
end_activities – Dictionary of the end activities.
percentage – Percentage of paths to keep.

import pm4py

log = pm4py.read_xes('tests/input_data/receipt.xes')
dfg, sa, ea = pm4py.discover_dfg(log)
dfg, sa, ea = pm4py.filter_dfg_paths_percentage(dfg, sa, ea, percentage=0.2)
pm4py.view_dfg(dfg, sa, ea, format='svg')
