pm4py.filtering module
The pm4py.filtering module contains the filtering features offered in pm4py.
- pm4py.filtering.filter_log_relative_occurrence_event_attribute(log: EventLog | DataFrame, min_relative_stake: float, attribute_key: str = 'concept:name', level: str = 'cases', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the event log, keeping only the events that have an attribute value which occurs:
- in at least the specified (min_relative_stake) percentage of events when level="events",
- in at least the specified (min_relative_stake) percentage of cases when level="cases".
- Parameters:
log – Event log or Pandas DataFrame.
min_relative_stake – Minimum percentage of events or cases, depending on level (expressed as a number between 0 and 1), in which the attribute value should occur.
attribute_key – The attribute to filter.
level – The level at which occurrences are counted (if level="events", then events; if level="cases", then cases).
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_log_relative_occurrence_event_attribute(
    dataframe,
    0.5,
    attribute_key='concept:name',
    level='cases',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
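To make the two counting levels of filter_log_relative_occurrence_event_attribute concrete, here is a minimal pure-Python sketch. It is not pm4py's implementation: the toy log (a dict from case id to the attribute values of its events) and the helper names frequent_values and keep_frequent_events are assumptions made for illustration.

```python
from collections import Counter


def frequent_values(log, min_relative_stake, level="cases"):
    """Return the attribute values that reach the relative-occurrence threshold.

    `log` is a hypothetical stand-in for an event log: a dict mapping each
    case id to the list of attribute values carried by its events.
    """
    counts = Counter()
    if level == "events":
        for values in log.values():
            counts.update(values)          # each event counts once
        total = sum(len(values) for values in log.values())
    elif level == "cases":
        for values in log.values():
            counts.update(set(values))     # each case counts once per value
        total = len(log)
    else:
        raise ValueError("level must be 'events' or 'cases'")
    if total == 0:
        return set()
    return {v for v, n in counts.items() if n / total >= min_relative_stake}


def keep_frequent_events(log, min_relative_stake, level="cases"):
    """Keep only the events whose value reaches the threshold."""
    keep = frequent_values(log, min_relative_stake, level)
    return {case: [v for v in values if v in keep] for case, values in log.items()}


toy_log = {"c1": ["A", "B", "B", "B"], "c2": ["A", "C"], "c3": ["A", "B"]}
by_events = frequent_values(toy_log, 0.5, level="events")
by_cases = frequent_values(toy_log, 0.5, level="cases")
```

On the toy log, B accounts for 4 of 8 events, so it is the only value that passes at the event level; at the case level both A (3 of 3 cases) and B (2 of 3 cases) pass.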
- pm4py.filtering.filter_start_activities(log: EventLog | DataFrame, activities: Set[str] | List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters cases that have a start activity in the provided list.
- Parameters:
log – Event log or Pandas DataFrame.
activities – Collection of start activities.
retain – If True, retains the traces that start with one of the given activities; if False, drops those traces.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_start_activities(
    dataframe,
    ['Act. A'],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
- pm4py.filtering.filter_end_activities(log: EventLog | DataFrame, activities: Set[str] | List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters cases that have an end activity in the provided list.
- Parameters:
log – Event log or Pandas DataFrame.
activities – Collection of end activities.
retain – If True, retains the traces that end with one of the given activities; if False, drops those traces.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_end_activities(
    dataframe,
    ['Act. Z'],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
- pm4py.filtering.filter_event_attribute_values(log: EventLog | DataFrame, attribute_key: str, values: Set[str] | List[str], level: str = 'case', retain: bool = True, case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters a log object based on the values of a specified event attribute.
- Parameters:
log – Event log or Pandas DataFrame.
attribute_key – Attribute to filter.
values – Admitted or forbidden values.
level – Specifies how the filter should be applied (‘case’ filters the cases where at least one occurrence happens; ‘event’ filters the events, potentially trimming the cases).
retain – Specifies if the values should be kept or removed.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_event_attribute_values(
    dataframe,
    'concept:name',
    ['Act. A', 'Act. Z'],
    case_id_key='case:concept:name'
)
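The difference between level='case' and level='event' in filter_event_attribute_values is easy to miss, so the following pure-Python sketch spells it out on a toy log. It is an illustration under assumptions, not the library's code: the dict-based log, the helper name attribute_filter_sketch, and the reading of retain=False at case level (drop every case that contains an admitted value) are all assumptions.

```python
def attribute_filter_sketch(log, values, level="case", retain=True):
    """Filter a toy log by attribute values.

    `log` is a hypothetical dict: case id -> list of attribute values,
    one value per event.
    """
    values = set(values)
    if level == "case":
        # Decide per case: does at least one event carry a listed value?
        return {cid: evs for cid, evs in log.items()
                if any(v in values for v in evs) == retain}
    if level == "event":
        # Decide per event; cases may be trimmed. A case left without
        # events disappears, as its rows would in a DataFrame.
        trimmed = {cid: [v for v in evs if (v in values) == retain]
                   for cid, evs in log.items()}
        return {cid: evs for cid, evs in trimmed.items() if evs}
    raise ValueError("level must be 'case' or 'event'")


toy_log = {"c1": ["Act. A", "Act. B", "Act. Z"], "c2": ["Act. B", "Act. C"]}
wanted = ["Act. A", "Act. Z"]
case_level = attribute_filter_sketch(toy_log, wanted, level="case")
event_level = attribute_filter_sketch(toy_log, wanted, level="event")
```

At case level, c1 survives untouched because it contains a listed value; at event level, c1 is trimmed to the two listed events and c2 vanishes.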
- pm4py.filtering.filter_trace_attribute_values(log: EventLog | DataFrame, attribute_key: str, values: Set[str] | List[str], retain: bool = True, case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters a log based on the values of a specified trace attribute.
- Parameters:
log – Event log or Pandas DataFrame.
attribute_key – Attribute to filter.
values – Collection of values to filter.
retain – Boolean value indicating whether to keep or discard matching traces.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_trace_attribute_values(
    dataframe,
    'case:creator',
    ['Mike'],
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_variants(log: EventLog | DataFrame, variants: Set[str] | List[str] | List[Tuple[str]], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters a log based on a specified set of variants.
- Parameters:
log – Event log or Pandas DataFrame.
variants – Collection of variants to filter. A variant should be specified as a list of tuples of activity names, e.g., [(‘a’, ‘b’, ‘c’)].
retain – Boolean indicating whether to retain (if True) or remove (if False) traces conforming to the specified variants.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_variants(
    dataframe,
    [('Act. A', 'Act. B', 'Act. Z'), ('Act. A', 'Act. C', 'Act. Z')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
- pm4py.filtering.filter_directly_follows_relation(log: EventLog | DataFrame, relations: List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Retains traces that contain any of the specified 'directly follows' relations. For example, if relations == [('a','b'),('a','c')] and the log is [<a,b,c>,<a,c,b>,<a,d,b>], the resulting log will contain the traces [<a,b,c>,<a,c,b>].
- Parameters:
log – Event log or Pandas DataFrame.
relations – List of activity name pairs, representing allowed or forbidden paths.
retain – Boolean indicating whether the paths should be kept (if True) or removed (if False).
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_directly_follows_relation(
    dataframe,
    [('A', 'B'), ('A', 'C')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
- pm4py.filtering.filter_eventually_follows_relation(log: EventLog | DataFrame, relations: List[str], retain: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Retains traces that contain any of the specified 'eventually follows' relations. For example, if relations == [('a','b'),('a','c')] and the log is [<a,b,c>,<a,c,b>,<a,d,b>], the resulting log will contain the traces [<a,b,c>,<a,c,b>,<a,d,b>].
- Parameters:
log – Event log or Pandas DataFrame.
relations – List of activity name pairs, representing allowed or forbidden paths.
retain – Boolean indicating whether the paths should be kept (if True) or removed (if False).
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_eventually_follows_relation(
    dataframe,
    [('A', 'B'), ('A', 'C')],
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
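The documented examples for filter_directly_follows_relation and filter_eventually_follows_relation use the same relations and the same three traces but give different results. The sketch below reproduces both outcomes in plain Python. Traces are modelled as lists of activity names, and the helper names and the handling of retain=False (drop every trace containing any listed relation) are assumptions, not pm4py internals.

```python
def has_directly_follows(trace, a, b):
    """True if b occurs immediately after a somewhere in the trace."""
    return any(x == a and y == b for x, y in zip(trace, trace[1:]))


def has_eventually_follows(trace, a, b):
    """True if b occurs at any later position than some occurrence of a."""
    return any(x == a and b in trace[i + 1:] for i, x in enumerate(trace))


def filter_by_relations(log, relations, check, retain=True):
    """Keep (or drop, if retain=False) traces containing any listed relation."""
    return [t for t in log
            if any(check(t, a, b) for a, b in relations) == retain]


log = [["a", "b", "c"], ["a", "c", "b"], ["a", "d", "b"]]
relations = [("a", "b"), ("a", "c")]
directly = filter_by_relations(log, relations, has_directly_follows)
eventually = filter_by_relations(log, relations, has_eventually_follows)
```

The trace <a,d,b> is the one that separates the two filters: b never comes immediately after a, but it does come later in the trace.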
- pm4py.filtering.filter_time_range(log: EventLog | DataFrame, dt1: str, dt2: str, mode: str = 'events', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters a log based on a time interval.
- Parameters:
log – Event log or Pandas DataFrame.
dt1 – Left extreme of the interval.
dt2 – Right extreme of the interval.
mode – Modality of filtering ('events', 'traces_contained', 'traces_intersecting'):
- 'events': Any event that fits the time frame is retained.
- 'traces_contained': Any trace completely contained in the timeframe is retained.
- 'traces_intersecting': Any trace intersecting with the timeframe is retained.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe1 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='traces_contained',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
filtered_dataframe2 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='traces_intersecting',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
filtered_dataframe3 = pm4py.filter_time_range(
    dataframe,
    '2010-01-01 00:00:00',
    '2011-01-01 00:00:00',
    mode='events',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
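The three modes of filter_time_range differ only in what they test against the interval: single events, a whole trace, or the overlap of a trace with the interval. The following standard-library sketch shows one plausible reading. The toy log, the helper name time_range_sketch, the inclusive interval bounds, and the overlap test used for 'traces_intersecting' are assumptions rather than documented behaviour.

```python
from datetime import datetime


def time_range_sketch(log, dt1, dt2, mode="events"):
    """Apply one of the three modes to a toy log.

    `log` is a hypothetical dict mapping each case id to the non-empty
    list of event timestamps (datetime objects) of that case.
    """
    start, end = datetime.fromisoformat(dt1), datetime.fromisoformat(dt2)
    result = {}
    for case_id, stamps in log.items():
        first, last = min(stamps), max(stamps)
        if mode == "events":
            kept = [t for t in stamps if start <= t <= end]
        elif mode == "traces_contained":
            kept = stamps if start <= first and last <= end else []
        elif mode == "traces_intersecting":
            kept = stamps if first <= end and last >= start else []
        else:
            raise ValueError(f"unknown mode: {mode}")
        if kept:
            result[case_id] = kept
    return result


toy_log = {
    "inside": [datetime(2010, 3, 1), datetime(2010, 6, 1)],
    "overlapping": [datetime(2009, 12, 1), datetime(2010, 2, 1)],
    "outside": [datetime(2008, 1, 1), datetime(2008, 2, 1)],
}
window = ("2010-01-01 00:00:00", "2011-01-01 00:00:00")
contained = time_range_sketch(toy_log, *window, mode="traces_contained")
intersecting = time_range_sketch(toy_log, *window, mode="traces_intersecting")
events_only = time_range_sketch(toy_log, *window, mode="events")
```

Only the "overlapping" case distinguishes the modes: it is dropped by 'traces_contained', kept whole by 'traces_intersecting', and trimmed to its single in-range event by 'events'.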
- pm4py.filtering.filter_between(log: EventLog | DataFrame, act1: str | List[str], act2: str | List[str], activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Finds all the sub-cases leading from an event with activity “act1” to an event with activity “act2” in the log, and returns a log containing only them.
Example:
Log:
A B C D E F
A B E F C
A B F C B C B E F C
act1 = B, act2 = C
Returned sub-cases:
B C (from the first case)
B E F C (from the second case)
B F C (from the third case)
B C (from the third case)
B E F C (from the third case)
- Parameters:
log – Event log or Pandas DataFrame.
act1 – Source activity or collection of activities.
act2 – Target activity or collection of activities.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_between(
    dataframe,
    'A',
    'D',
    activity_key='concept:name',
    case_id_key='case:concept:name',
    timestamp_key='time:timestamp'
)
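The sub-case example given for filter_between (act1 = B, act2 = C) can be reproduced with a short scan over each trace. This is a sketch under assumptions, not the library's code: it handles a single source and a single target activity, and it assumes that once a sub-case is open, further occurrences of act1 do not restart it. The documented example does not exercise that situation.

```python
def sub_cases_between(trace, act1, act2):
    """Return every sub-case running from an act1 event to the next act2 event."""
    result, current = [], None
    for activity in trace:
        if current is None:
            if activity == act1:
                current = [activity]       # open a new sub-case
        else:
            current.append(activity)
            if activity == act2:
                result.append(current)     # close the sub-case
                current = None
    return result


log = [
    ["A", "B", "C", "D", "E", "F"],
    ["A", "B", "E", "F", "C"],
    ["A", "B", "F", "C", "B", "C", "B", "E", "F", "C"],
]
sub_cases = [sub for trace in log for sub in sub_cases_between(trace, "B", "C")]
```

A sub-case that is opened but never reaches act2 is discarded, so a trace without any B ... C stretch contributes nothing to the result.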
- pm4py.filtering.filter_case_size(log: EventLog | DataFrame, min_size: int, max_size: int, case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the event log, keeping cases that have a length (number of events) between min_size and max_size.
- Parameters:
log – Event log or Pandas DataFrame.
min_size – Minimum allowed number of events.
max_size – Maximum allowed number of events.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_case_size(
    dataframe,
    5,
    10,
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_case_performance(log: EventLog | DataFrame, min_performance: float, max_performance: float, timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the event log, keeping cases that have a duration (the timestamp of the last event minus the timestamp of the first event) between min_performance and max_performance.
- Parameters:
log – Event log or Pandas DataFrame.
min_performance – Minimum allowed case duration.
max_performance – Maximum allowed case duration.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_case_performance(
    dataframe,
    3600.0,
    86400.0,
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_activities_rework(log: EventLog | DataFrame, activity: str, min_occurrences: int = 2, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the event log, keeping cases where the specified activity occurs at least min_occurrences times.
- Parameters:
log – Event log or Pandas DataFrame.
activity – Activity to consider.
min_occurrences – Minimum desired number of occurrences.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_activities_rework(
    dataframe,
    'Approve Order',
    2,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_paths_performance(log: EventLog | DataFrame, path: Tuple[str, str], min_performance: float, max_performance: float, keep: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the event log based on the performance of specified paths.
If keep=True, retains cases having the specified path (tuple of 2 activities) with a duration between min_performance and max_performance.
If keep=False, discards cases having the specified path with a duration between min_performance and max_performance.
- Parameters:
log – Event log or Pandas DataFrame.
path – Tuple of two activities (source_activity, target_activity).
min_performance – Minimum allowed performance of the path.
max_performance – Maximum allowed performance of the path.
keep – Boolean indicating whether to keep (if True) or discard (if False) the cases with the specified performance.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_paths_performance(
    dataframe,
    ('A', 'D'),
    3600.0,
    86400.0,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_variants_top_k(log: EventLog | DataFrame, k: int, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Keeps the top-k variants of the log.
- Parameters:
log – Event log or Pandas DataFrame.
k – Number of variants to keep.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_variants_top_k(
    dataframe,
    5,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_variants_by_coverage_percentage(log: EventLog | DataFrame, min_coverage_percentage: float, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the variants of the log based on a coverage percentage. For example, if min_coverage_percentage=0.4 and the log has 1000 cases with:
- 500 cases of variant 1,
- 400 cases of variant 2,
- 100 cases of variant 3,
the filter keeps only the traces of variant 1 and variant 2.
- Parameters:
log – Event log or Pandas DataFrame.
min_coverage_percentage – Minimum allowed percentage of coverage.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_variants_by_coverage_percentage(
    dataframe,
    0.1,
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
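The arithmetic behind the coverage example for filter_variants_by_coverage_percentage (1000 cases, threshold 0.4) can be checked with a few lines of plain Python. The variant contents and the helper name variants_by_coverage are made up for illustration; only the case counts and the threshold come from the example.

```python
from collections import Counter


def variants_by_coverage(traces, min_coverage_percentage):
    """Return the variants whose share of all cases reaches the threshold."""
    counts = Counter(tuple(t) for t in traces)
    total = len(traces)
    return {variant for variant, n in counts.items()
            if n / total >= min_coverage_percentage}


# Hypothetical variants standing in for "variant 1", "variant 2", "variant 3".
variant_1, variant_2, variant_3 = ("a", "b"), ("a", "c"), ("a", "d")
traces = [variant_1] * 500 + [variant_2] * 400 + [variant_3] * 100
kept = variants_by_coverage(traces, 0.4)
```

Variant 1 covers 50% of the cases and variant 2 exactly 40%, so both reach the threshold, while variant 3 at 10% does not.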
- pm4py.filtering.filter_prefixes(log: EventLog | DataFrame, activity: str, strict: bool = True, first_or_last: str = 'first', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the log, keeping the prefixes leading up to a given activity. For example, for a log with traces:
- A,B,C,D
- A,B,Z,A,B,C,D
- A,B,C,D,C,E,C,F
The prefixes to "C" are respectively:
- A,B
- A,B,Z,A,B
- A,B
- Parameters:
log – Event log or Pandas DataFrame.
activity – Target activity for the filter.
strict – Applies the filter strictly, cutting the occurrences of the selected activity.
first_or_last – Decides if the first or last occurrence of an activity should be selected as the baseline for the filter.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_prefixes(
    dataframe,
    'Act. C',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_suffixes(log: EventLog | DataFrame, activity: str, strict: bool = True, first_or_last: str = 'first', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters the log, keeping the suffixes starting from a given activity. For example, for a log with traces:
- A,B,C,D
- A,B,Z,A,B,C,D
- A,B,C,D,C,E,C,F
The suffixes from "C" are respectively:
- D
- D
- D,C,E,C,F
- Parameters:
log – Event log or Pandas DataFrame.
activity – Target activity for the filter.
strict – Applies the filter strictly, cutting the occurrences of the selected activity.
first_or_last – Decides if the first or last occurrence of an activity should be selected as the baseline for the filter.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_suffixes(
    dataframe,
    'Act. C',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
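The prefix and suffix examples for filter_prefixes and filter_suffixes follow from a single cut position per trace. The sketch below reproduces them with list slicing. It rests on two assumptions that the parameter descriptions suggest but do not spell out: strict=True excludes the selected occurrence of the activity from the result, and a trace that never contains the activity yields nothing (returned as None here). The helper names are hypothetical.

```python
def cut_index(trace, activity, first_or_last="first"):
    """Position of the first or last occurrence of the activity, or None."""
    positions = [i for i, a in enumerate(trace) if a == activity]
    if not positions:
        return None
    return positions[0] if first_or_last == "first" else positions[-1]


def prefix(trace, activity, strict=True, first_or_last="first"):
    """Events before the cut (the cut event itself only if not strict)."""
    i = cut_index(trace, activity, first_or_last)
    if i is None:
        return None
    return trace[:i] if strict else trace[:i + 1]


def suffix(trace, activity, strict=True, first_or_last="first"):
    """Events after the cut (the cut event itself only if not strict)."""
    i = cut_index(trace, activity, first_or_last)
    if i is None:
        return None
    return trace[i + 1:] if strict else trace[i:]


log = [list("ABCD"), list("ABZABCD"), list("ABCDCECF")]
prefixes = [prefix(t, "C") for t in log]
suffixes = [suffix(t, "C") for t in log]
```

Under these assumptions, switching to first_or_last="last" changes only the third trace, whose last C sits just before the final F.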
- pm4py.filtering.filter_ocel_event_attribute(ocel: OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) → OCEL
Filters the object-centric event log based on the provided event attribute values.
- Parameters:
ocel – Object-centric event log.
attribute_key – Attribute at the event level to filter.
attribute_values – Collection of attribute values to keep or remove.
positive – Determines whether the values should be kept (True) or removed (False).
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_event_attribute(
    ocel,
    'ocel:activity',
    ['A', 'B', 'D']
)
- pm4py.filtering.filter_ocel_object_attribute(ocel: OCEL, attribute_key: str, attribute_values: Collection[Any], positive: bool = True) → OCEL
Filters the object-centric event log based on the provided object attribute values.
- Parameters:
ocel – Object-centric event log.
attribute_key – Attribute at the object level to filter.
attribute_values – Collection of attribute values to keep or remove.
positive – Determines whether the values should be kept (True) or removed (False).
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_object_attribute(
    ocel,
    'ocel:type',
    ['order']
)
- pm4py.filtering.filter_ocel_object_types_allowed_activities(ocel: OCEL, correspondence_dict: Dict[str, Collection[str]]) → OCEL
Filters an object-centric event log, keeping only the specified object types with the specified set of allowed activities.
- Parameters:
ocel – Object-centric event log.
correspondence_dict – Dictionary containing, for every object type of interest, a collection of allowed activities. Example: {“order”: [“Create Order”], “element”: [“Create Order”, “Create Delivery”]}.
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_object_types_allowed_activities(
    ocel,
    {'order': ['create order', 'pay order'], 'item': ['create item', 'deliver item']}
)
- pm4py.filtering.filter_ocel_object_per_type_count(ocel: OCEL, min_num_obj_type: Dict[str, int]) → OCEL
Keeps the events of the object-centric event log that are related to at least the specified number of objects per type.
Example: pm4py.filter_ocel_object_per_type_count(ocel, {"order": 1, "element": 2})
Would keep the following events:
   ocel:eid  ocel:timestamp  ocel:activity  ocel:type:element  ocel:type:order
0  e1        1980-01-01      Create Order   [i4, i1, i3, i2]   [o1]
1  e11       1981-01-01      Create Order   [i6, i5]           [o2]
2  e14       1981-01-04      Create Order   [i8, i7]           [o3]
- Parameters:
ocel – Object-centric event log.
min_num_obj_type – Minimum number of objects per type.
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_object_per_type_count(
    ocel,
    {'order': 1, 'element': 2}
)
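The per-type condition of filter_ocel_object_per_type_count amounts to one check per event: every listed object type must be represented by at least the given number of objects. A minimal sketch on plain dictionaries follows. The events e1, e11 and e14 mirror the example table; the event e2 and the helper name has_min_objects are hypothetical additions used to show an event being filtered out.

```python
def has_min_objects(event, min_num_obj_type):
    """True if the event is related to enough objects of every listed type."""
    return all(len(event.get(obj_type, [])) >= minimum
               for obj_type, minimum in min_num_obj_type.items())


events = [
    {"ocel:eid": "e1", "element": ["i4", "i1", "i3", "i2"], "order": ["o1"]},
    {"ocel:eid": "e2", "element": ["i1"], "order": ["o1"]},  # hypothetical: one element only
    {"ocel:eid": "e11", "element": ["i6", "i5"], "order": ["o2"]},
    {"ocel:eid": "e14", "element": ["i8", "i7"], "order": ["o3"]},
]
kept_ids = [e["ocel:eid"] for e in events
            if has_min_objects(e, {"order": 1, "element": 2})]
```

The hypothetical event e2 is dropped because it is related to a single element, even though it satisfies the order requirement.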
- pm4py.filtering.filter_ocel_start_events_per_object_type(ocel: OCEL, object_type: str) → OCEL
Filters the events in which a new object of the given object type is spawned. For example, an event with activity “Create Order” might spawn new orders.
- Parameters:
ocel – Object-centric event log.
object_type – Object type to consider.
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_start_events_per_object_type(
    ocel,
    'delivery'
)
- pm4py.filtering.filter_ocel_end_events_per_object_type(ocel: OCEL, object_type: str) → OCEL
Filters the events in which an object of the given object type terminates its lifecycle. For example, an event with activity “Pay Order” might terminate an order.
- Parameters:
ocel – Object-centric event log.
object_type – Object type to consider.
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_end_events_per_object_type(
    ocel,
    'delivery'
)
- pm4py.filtering.filter_ocel_events_timestamp(ocel: OCEL, min_timest: datetime | str, max_timest: datetime | str, timestamp_key: str = 'ocel:timestamp') → OCEL
Filters the object-centric event log, keeping events within the provided timestamp range.
- Parameters:
ocel – Object-centric event log.
min_timest – Left extreme of the allowed timestamp interval (format: YYYY-mm-dd HH:MM:SS).
max_timest – Right extreme of the allowed timestamp interval (format: YYYY-mm-dd HH:MM:SS).
timestamp_key – The attribute to use as timestamp (default: ocel:timestamp).
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_events_timestamp(
    ocel,
    '1990-01-01 00:00:00',
    '2010-01-01 00:00:00'
)
- pm4py.filtering.filter_four_eyes_principle(log: EventLog | DataFrame, activity1: str, activity2: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', keep_violations: bool = False) → EventLog | DataFrame
Filters out the cases of the log that violate the four-eyes principle on the provided activities.
- Parameters:
log – Event log or Pandas DataFrame.
activity1 – First activity.
activity2 – Second activity.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
resource_key – Attribute to be used as resource.
keep_violations – Boolean indicating whether to discard (if False) or retain (if True) the violations.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_four_eyes_principle(
    dataframe,
    'Act. A',
    'Act. B',
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
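The entry for filter_four_eyes_principle does not define a violation, so the sketch below adopts the usual reading of the four-eyes principle: a case violates it when the same resource performs both activities. That reading, the toy log of (activity, resource) pairs, the helper names, and the choice to treat a case missing one of the two activities as compliant are all assumptions and may differ from what the library does.

```python
def violates_four_eyes(case, activity1, activity2):
    """True if some resource performed both activities in the case.

    `case` is a hypothetical list of (activity, resource) pairs.
    """
    resources1 = {res for act, res in case if act == activity1}
    resources2 = {res for act, res in case if act == activity2}
    return bool(resources1 & resources2)


def four_eyes_sketch(log, activity1, activity2, keep_violations=False):
    """Keep compliant cases, or only the violating ones if keep_violations."""
    return {case_id: case for case_id, case in log.items()
            if violates_four_eyes(case, activity1, activity2) == keep_violations}


toy_log = {
    "compliant": [("Act. A", "Sara"), ("Act. B", "Mike")],
    "violating": [("Act. A", "Sara"), ("Act. C", "Mike"), ("Act. B", "Sara")],
}
kept = four_eyes_sketch(toy_log, "Act. A", "Act. B")
violations = four_eyes_sketch(toy_log, "Act. A", "Act. B", keep_violations=True)
```

With the default keep_violations=False, only the case in which two different people handled the two activities remains.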
- pm4py.filtering.filter_activity_done_different_resources(log: EventLog | DataFrame, activity: str, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name', resource_key: str = 'org:resource', keep_violations: bool = True) → EventLog | DataFrame
Filters the cases where an activity is performed by different resources multiple times.
- Parameters:
log – Event log or Pandas DataFrame.
activity – Activity to consider.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
resource_key – Attribute to be used as resource.
keep_violations – Boolean indicating whether to discard (if False) or retain (if True) the violations.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
filtered_dataframe = pm4py.filter_activity_done_different_resources(
    dataframe,
    'Act. A',
    activity_key='concept:name',
    resource_key='org:resource',
    timestamp_key='time:timestamp',
    case_id_key='case:concept:name'
)
- pm4py.filtering.filter_trace_segments(log: EventLog | DataFrame, admitted_traces: List[List[str]], positive: bool = True, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
Filters an event log based on a set of trace segments. A trace segment is a sequence of activities and "..." placeholders, where:
- "..." before an activity indicates that other activities can precede the given activity.
- "..." after an activity indicates that other activities can follow the given activity.
Examples:
- pm4py.filter_trace_segments(log, [["A", "B"]]) retains only cases with the exact process variant A,B.
- pm4py.filter_trace_segments(log, [["...", "A", "B"]]) retains only cases ending with activities A,B.
- pm4py.filter_trace_segments(log, [["A", "B", "..."]]) retains only cases starting with activities A,B.
- pm4py.filter_trace_segments(log, [["...", "A", "B", "C", "..."], ["...", "D", "E", "F", "..."]]) retains cases where, at any point, there is A followed by B followed by C, and, at any other point, there is D followed by E followed by F.
- Parameters:
log – Event log or Pandas DataFrame.
admitted_traces – Collection of trace segments to admit based on the criteria above.
positive – Boolean indicating whether to keep (if True) or discard (if False) the cases satisfying the filter.
activity_key – Attribute to be used for the activity.
timestamp_key – Attribute to be used for the timestamp.
case_id_key – Attribute to be used as case identifier.
- Returns:
Filtered event log or Pandas DataFrame.
import pm4py
log = pm4py.read_xes("tests/input_data/running-example.xes")
filtered_log = pm4py.filter_trace_segments(
    log,
    [["...", "check ticket", "decide", "reinitiate request", "..."]]
)
print(filtered_log)
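The "..." placeholder used by filter_trace_segments behaves like a wildcard for any run of activities, including an empty one. A small recursive matcher illustrates this reading for a single segment. It is a sketch with a hypothetical helper name, not pm4py's matching code, and it deliberately leaves out how several segments are combined.

```python
def segment_matches(segment, trace):
    """True if the whole trace fits the segment.

    "..." stands for any (possibly empty) run of activities; every other
    entry must match exactly one event.
    """
    if not segment:
        return not trace                   # both exhausted: match
    head, rest = segment[0], segment[1:]
    if head == "...":
        # Let the placeholder absorb 0, 1, 2, ... leading events.
        return any(segment_matches(rest, trace[i:])
                   for i in range(len(trace) + 1))
    return bool(trace) and trace[0] == head and segment_matches(rest, trace[1:])


traces = [["A", "B"], ["X", "A", "B"], ["A", "B", "Y"], ["X", "A", "B", "Y"]]
exact = [t for t in traces if segment_matches(["A", "B"], t)]
ending = [t for t in traces if segment_matches(["...", "A", "B"], t)]
starting = [t for t in traces if segment_matches(["A", "B", "..."], t)]
anywhere = [t for t in traces if segment_matches(["...", "A", "B", "..."], t)]
```

The four result lists correspond to the documented cases: exact variant, cases ending with A,B, cases starting with A,B, and cases containing A directly followed by B at any position.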
- pm4py.filtering.filter_ocel_object_types(ocel: OCEL, obj_types: Collection[str], positive: bool = True, level: int = 1) → OCEL
Filters the object types of an object-centric event log.
- Parameters:
ocel – Object-centric event log.
obj_types – Object types to keep or remove.
positive – Boolean indicating whether to keep (True) or remove (False) the specified object types.
level – Recursively expands the set of object identifiers until the specified level.
- Returns:
Filtered OCEL.
import pm4py
ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_object_types(
    ocel,
    ['order']
)
- pm4py.filtering.filter_ocel_objects(ocel: OCEL, object_identifiers: Collection[str], positive: bool = True, level: int = 1) → OCEL
Filters the object identifiers of an object-centric event log.
- Parameters:
ocel – Object-centric event log.
object_identifiers – Object identifiers to keep or remove.
positive – Boolean indicating whether to keep (True) or remove (False) the specified object identifiers.
level – Recursively expands the set of object identifiers until the specified level.
- Returns:
Filtered OCEL.
import pm4py
ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_objects(
    ocel,
    ['o1'],
    level=1
)
- pm4py.filtering.filter_ocel_events(ocel: OCEL, event_identifiers: Collection[str], positive: bool = True) → OCEL
Filters the event identifiers of an object-centric event log.
- Parameters:
ocel – Object-centric event log.
event_identifiers – Event identifiers to keep or remove.
positive – Boolean indicating whether to keep (True) or remove (False) the specified event identifiers.
- Returns:
Filtered OCEL.
import pm4py
ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_events(
    ocel,
    ['e1']
)
- pm4py.filtering.filter_ocel_activities_connected_object_type(ocel: OCEL, object_type: str) → OCEL
Filters an OCEL on the set of activities executed on objects of the given object type.
- Parameters:
ocel – Object-centric event log.
object_type – Object type to consider.
- Return type:
OCEL
import pm4py
ocel = pm4py.read_ocel2("tests/input_data/ocel/ocel20_example.xmlocel")
filtered_ocel = pm4py.filter_ocel_activities_connected_object_type(ocel, "Purchase Order")
print(filtered_ocel)
- pm4py.filtering.filter_ocel_cc_object(ocel: OCEL, object_id: str, conn_comp: List[List[str]] | None = None, return_conn_comp: bool = False) → OCEL | Tuple[OCEL, List[List[str]]]
Returns the connected component of the object-centric event log to which the specified object belongs.
- Parameters:
ocel – Object-centric event log.
object_id – Object identifier.
conn_comp – (Optional) Precomputed connected components of the OCEL objects.
return_conn_comp – If True, returns the filtered OCEL along with the computed connected components.
- Returns:
Filtered OCEL, optionally with the list of connected components.
import pm4py
ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_object(
    ocel,
    'order1'
)
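To give an intuition for what a connected component means in filter_ocel_cc_object, the sketch below treats two objects as connected when they take part in the same event and then collects everything reachable from a starting object. This notion of connection is an assumption chosen for illustration, as are the toy event-to-object mapping and the helper name connected_component. The library may build its object graph differently.

```python
from collections import defaultdict, deque


def connected_component(event_objects, object_id):
    """Objects reachable from object_id through shared events.

    `event_objects` is a hypothetical dict: event id -> related object ids.
    """
    neighbours = defaultdict(set)
    for objects in event_objects.values():
        for obj in objects:
            neighbours[obj].update(objects)   # objects sharing an event are linked
    seen, queue = {object_id}, deque([object_id])
    while queue:                               # breadth-first traversal
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen


event_objects = {
    "e1": ["order1", "item1"],
    "e2": ["item1", "item2"],
    "e3": ["order2", "item3"],
}
component = connected_component(event_objects, "order1")
events_in_component = [eid for eid, objects in event_objects.items()
                       if component & set(objects)]
```

Here item2 never shares an event with order1 directly, but it ends up in the same component through item1, while order2 and item3 form a separate component.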
- pm4py.filtering.filter_ocel_cc_length(ocel: OCEL, min_cc_length: int, max_cc_length: int) → OCEL
Keeps only the objects in an OCEL belonging to a connected component with a length falling within the specified range.
Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.
- Parameters:
ocel – Object-centric event log.
min_cc_length – Minimum allowed length for the connected component.
max_cc_length – Maximum allowed length for the connected component.
- Returns:
Filtered OCEL.
import pm4py
filtered_ocel = pm4py.filter_ocel_cc_length(
    ocel,
    2,
    10
)
- pm4py.filtering.filter_ocel_cc_otype(ocel: OCEL, otype: str, positive: bool = True) → OCEL
Filters the objects belonging to connected components that have at least one object of the specified type.
Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.
- Parameters:
ocel – Object-centric event log.
otype – Object type to consider.
positive – Boolean indicating whether to keep (True) or discard (False) the objects in these components.
- Returns:
Filtered OCEL.
import pm4py
ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_otype(
    ocel,
    'order'
)
- pm4py.filtering.filter_ocel_cc_activity(ocel: OCEL, activity: str) → OCEL
Filters the objects belonging to connected components that include at least one event with the specified activity.
Reference: Adams, Jan Niklas, et al. “Defining cases and variants for object-centric event data.” 2022 4th International Conference on Process Mining (ICPM). IEEE, 2022.
- Parameters:
ocel – Object-centric event log.
activity – Activity to consider.
- Returns:
Filtered OCEL.
import pm4py
ocel = pm4py.read_ocel('log.jsonocel')
filtered_ocel = pm4py.filter_ocel_cc_activity(
    ocel,
    'Create Order'
)
- pm4py.filtering.filter_dfg_activities_percentage(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int], percentage: float = 0.2) → Tuple[Dict[Tuple[str, str], int], Dict[str, int], Dict[str, int]]
Filters the DFG on the provided percentage of activities.
- Parameters:
dfg – Frequency directly-follows graph.
start_activities – Dictionary of the start activities.
end_activities – Dictionary of the end activities.
percentage – Percentage of activities to keep.
import pm4py
log = pm4py.read_xes('tests/input_data/receipt.xes')
dfg, sa, ea = pm4py.discover_dfg(log)
dfg, sa, ea = pm4py.filter_dfg_activities_percentage(dfg, sa, ea, percentage=0.2)
pm4py.view_dfg(dfg, sa, ea, format='svg')
- pm4py.filtering.filter_dfg_paths_percentage(dfg: Dict[Tuple[str, str], int], start_activities: Dict[str, int], end_activities: Dict[str, int], percentage: float = 0.2) → Tuple[Dict[Tuple[str, str], int], Dict[str, int], Dict[str, int]]
Filters the DFG on the provided percentage of paths.
- Parameters:
dfg – Frequency directly-follows graph.
start_activities – Dictionary of the start activities.
end_activities – Dictionary of the end activities.
percentage – Percentage of paths to keep.
import pm4py
log = pm4py.read_xes('tests/input_data/receipt.xes')
dfg, sa, ea = pm4py.discover_dfg(log)
dfg, sa, ea = pm4py.filter_dfg_paths_percentage(dfg, sa, ea, percentage=0.2)
pm4py.view_dfg(dfg, sa, ea, format='svg')