pm4py.algo.filtering.pandas.attributes package

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

Submodules

pm4py.algo.filtering.pandas.attributes.attributes_filter module

class pm4py.algo.filtering.pandas.attributes.attributes_filter.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
ACTIVITY_KEY = 'pm4py:param:activity_key'
CASE_ID_KEY = 'pm4py:param:case_id_key'
DECREASING_FACTOR = 'decreasingFactor'
POSITIVE = 'positive'
STREAM_FILTER_KEY1 = 'stream_filter_key1'
STREAM_FILTER_VALUE1 = 'stream_filter_value1'
STREAM_FILTER_KEY2 = 'stream_filter_key2'
STREAM_FILTER_VALUE2 = 'stream_filter_value2'
KEEP_ONCE_PER_CASE = 'keep_once_per_case'
pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_numeric_events(df: DataFrame, int1: float, int2: float, parameters: Dict[str | Parameters, Any] | None = None) → DataFrame

Apply a filter on events (numerical filter)

Parameters

df

Dataframe

int1

Lower bound of the interval

int2

Upper bound of the interval

parameters
Possible parameters of the algorithm:

Parameters.ATTRIBUTE_KEY => the attribute to filter on
Parameters.POSITIVE => whether to keep (True) or remove (False) the events whose value lies in the interval

Returns

filtered_df

Filtered dataframe
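As an illustration, the event-level interval filter can be sketched directly in pandas. This is a minimal sketch, not pm4py's implementation: it assumes the interval [int1, int2] is inclusive on both ends, and the helper numeric_events_filter and the amount column are hypothetical names chosen for the example.

```python
import pandas as pd


def numeric_events_filter(df: pd.DataFrame, low: float, high: float,
                          attribute_key: str, positive: bool = True) -> pd.DataFrame:
    """Hypothetical helper: keep (positive=True) or remove (positive=False)
    the events whose attribute value lies in [low, high]."""
    # Assumption: both bounds are inclusive (Series.between's default)
    in_range = df[attribute_key].between(low, high)
    return df[in_range] if positive else df[~in_range]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2"],
    "concept:name": ["register", "pay", "register", "pay"],
    "amount": [10.0, 250.0, 40.0, 900.0],
})

print(numeric_events_filter(log, 0, 100, "amount")["amount"].tolist())
# [10.0, 40.0]
```

Only individual events are affected: case c1 keeps its register event even though its pay event falls outside the interval.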

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_numeric(df: DataFrame, int1: float, int2: float, parameters: Dict[str | Parameters, Any] | None = None) → DataFrame

Filter dataframe on a numeric attribute interval (filter cases)

Parameters

df

Dataframe

int1

Lower bound of the interval

int2

Upper bound of the interval

parameters
Possible parameters of the algorithm:

Parameters.ATTRIBUTE_KEY => the attribute to filter on
Parameters.POSITIVE => whether to keep (True) or remove (False) the traces containing events whose value lies in the interval

Returns

filtered_df

Filtered dataframe
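A minimal pandas sketch of the case-level variant, assuming a case is kept (or removed) as a whole when at least one of its events has a value in the inclusive interval [int1, int2]. The helper numeric_cases_filter and the amount column are hypothetical, and the STREAM_FILTER_* parameters are not modelled.

```python
import pandas as pd


def numeric_cases_filter(df: pd.DataFrame, low: float, high: float,
                         attribute_key: str,
                         case_id_key: str = "case:concept:name",
                         positive: bool = True) -> pd.DataFrame:
    """Hypothetical helper: keep (positive=True) or remove (positive=False)
    whole cases that contain at least one event with a value in [low, high]."""
    in_range = df[attribute_key].between(low, high)
    # IDs of the cases with at least one event inside the interval
    matching_cases = df.loc[in_range, case_id_key].unique()
    in_matching_case = df[case_id_key].isin(matching_cases)
    return df[in_matching_case] if positive else df[~in_matching_case]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2"],
    "concept:name": ["register", "pay", "register", "pay"],
    "amount": [10.0, 250.0, 40.0, 900.0],
})

kept = numeric_cases_filter(log, 500, 1000, "amount")
print(kept["case:concept:name"].unique().tolist())  # ['c2']
print(kept["amount"].tolist())  # [40.0, 900.0]
```

Unlike the event-level filter, the register event of c2 (amount 40.0) is kept because another event of the same case falls inside the interval.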

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_events(df: DataFrame, values: List[str], parameters: Dict[str | Parameters, Any] | None = None) → DataFrame

Filter dataframe on attribute values (filter events)

Parameters

df

Dataframe

values

Values to filter on

parameters
Possible parameters of the algorithm, including:

Parameters.ATTRIBUTE_KEY -> Attribute we want to filter
Parameters.POSITIVE -> Specifies whether the matching events are kept (positive=True) or removed (positive=False)

Returns

df

Filtered dataframe
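The event-level value filter reduces to a membership test on one column. A minimal pandas sketch follows; the helper values_events_filter is hypothetical and not part of pm4py.

```python
import pandas as pd


def values_events_filter(df: pd.DataFrame, values: list,
                         attribute_key: str = "concept:name",
                         positive: bool = True) -> pd.DataFrame:
    """Hypothetical helper: keep (positive=True) or remove (positive=False)
    the events whose attribute value is one of `values`."""
    mask = df[attribute_key].isin(values)
    return df[mask] if positive else df[~mask]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "concept:name": ["register", "check", "pay", "register", "cancel"],
})

print(values_events_filter(log, ["register", "pay"])["concept:name"].tolist())
# ['register', 'pay', 'register']
print(values_events_filter(log, ["register", "pay"], positive=False)["concept:name"].tolist())
# ['check', 'cancel']
```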

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply(df: DataFrame, values: List[str], parameters: Dict[str | Parameters, Any] | None = None) → DataFrame

Filter dataframe on attribute values (filter traces)

Parameters

df

Dataframe

values

Values to filter on

parameters
Possible parameters of the algorithm, including:

Parameters.CASE_ID_KEY -> Case ID column in the dataframe
Parameters.ATTRIBUTE_KEY -> Attribute we want to filter
Parameters.POSITIVE -> Specifies whether the traces containing at least one matching event are kept (positive=True) or removed (positive=False)

Returns

df

Filtered dataframe
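At the trace level, all events of a case are kept or removed together. A minimal pandas sketch, assuming a case matches when at least one of its events carries one of the given values; values_cases_filter is a hypothetical helper whose keyword arguments mirror those of filter_df_on_attribute_values, not pm4py's code.

```python
import pandas as pd


def values_cases_filter(df: pd.DataFrame, values: list,
                        case_id_key: str = "case:concept:name",
                        attribute_key: str = "concept:name",
                        positive: bool = True) -> pd.DataFrame:
    """Hypothetical helper: keep (positive=True) or remove (positive=False)
    whole cases containing at least one event whose value is in `values`."""
    # Assumption: one matching event is enough for the whole case to match
    matching_cases = df.loc[df[attribute_key].isin(values), case_id_key].unique()
    mask = df[case_id_key].isin(matching_cases)
    return df[mask] if positive else df[~mask]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "concept:name": ["register", "check", "pay", "register", "cancel"],
})

print(values_cases_filter(log, ["cancel"])["concept:name"].tolist())
# ['register', 'cancel']
print(values_cases_filter(log, ["cancel"], positive=False)["concept:name"].tolist())
# ['register', 'check', 'pay']
```

Filtering on cancel keeps the register event of case c2 as well, because it belongs to a case that contains cancel.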

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_on_attribute_values(df, values, case_id_glue='case:concept:name', attribute_key='concept:name', positive=True)

Filter dataframe on attribute values

Parameters

df

Dataframe

values

Values to filter on

case_id_glue

Case ID column in the dataframe

attribute_key

Attribute we want to filter

positive

Specifies whether the traces containing at least one matching event are kept (positive=True) or removed (positive=False)

Returns

df

Filtered dataframe

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_keeping_activ_exc_thresh(df, thresh, act_count0=None, activity_key='concept:name', most_common_variant=None)

Filter a dataframe keeping activities exceeding the threshold

Parameters

df

Pandas dataframe

thresh

Threshold to use to cut activities

act_count0

(If provided) Dictionary that associates each activity with its count

activity_key

Column in which the activity is present

Returns

df

Filtered dataframe
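A minimal pandas sketch of the threshold filter. It assumes a non-strict comparison (count >= thresh), which this page does not pin down, and it ignores the act_count0 and most_common_variant arguments; keep_frequent_activities is a hypothetical helper.

```python
import pandas as pd


def keep_frequent_activities(df: pd.DataFrame, thresh: int,
                             activity_key: str = "concept:name") -> pd.DataFrame:
    """Hypothetical helper: keep only the events of activities that occur
    at least `thresh` times in the dataframe."""
    counts = df[activity_key].value_counts()
    # Assumption: non-strict comparison (count >= thresh)
    frequent = counts[counts >= thresh].index
    return df[df[activity_key].isin(frequent)]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3"],
    "concept:name": ["register", "pay", "register", "pay", "escalate"],
})

print(keep_frequent_activities(log, 2)["concept:name"].tolist())
# ['register', 'pay', 'register', 'pay']
```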

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_keeping_spno_activities(df: DataFrame, activity_key: str = 'concept:name', max_no_activities: int = 25)

Filter a dataframe, keeping only the events of at most the specified number of activities

Parameters

df

Dataframe

activity_key

Activity key in dataframe (must be specified if different from concept:name)

max_no_activities

Maximum allowed number of activities

Returns

df

Filtered dataframe
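A minimal pandas sketch, assuming the retained activities are the max_no_activities most frequent ones; ties at the cut-off are broken arbitrarily here, and keep_top_activities is a hypothetical helper, not pm4py's code.

```python
import pandas as pd


def keep_top_activities(df: pd.DataFrame, activity_key: str = "concept:name",
                        max_no_activities: int = 25) -> pd.DataFrame:
    """Hypothetical helper: keep only the events of the `max_no_activities`
    most frequent activities."""
    # value_counts sorts by decreasing frequency; ties at the cut-off
    # are not broken in any guaranteed order
    top = df[activity_key].value_counts().index[:max_no_activities]
    return df[df[activity_key].isin(top)]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3", "c3"],
    "concept:name": ["register", "pay", "register", "pay", "register", "escalate"],
})

print(keep_top_activities(log, max_no_activities=2)["concept:name"].tolist())
# ['register', 'pay', 'register', 'pay', 'register']
```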

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_relative_occurrence_event_attribute(df: DataFrame, min_relative_stake: float, parameters: Dict[Any, Any] | None = None) → DataFrame

Filters the dataframe, keeping only the events having an attribute value which occurs:
- in at least the specified fraction (min_relative_stake) of events, when Parameters.KEEP_ONCE_PER_CASE = False
- in at least the specified fraction (min_relative_stake) of cases, when Parameters.KEEP_ONCE_PER_CASE = True

Parameters

df

Pandas dataframe

min_relative_stake

Minimum fraction (a number between 0 and 1) of events, or of cases when Parameters.KEEP_ONCE_PER_CASE = True, in which the attribute value should occur.

parameters

Parameters of the algorithm, including:
- Parameters.ATTRIBUTE_KEY => the attribute to use (default: concept:name)
- Parameters.KEEP_ONCE_PER_CASE => decides the level of the filter (set it to True to apply the filter on the cases)

Returns

filtered_df

Filtered Pandas dataframe
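A minimal pandas sketch of the two modes, assuming a non-strict comparison against min_relative_stake. The helper relative_occurrence_filter and its keep_once_per_case flag are hypothetical stand-ins for this function and its Parameters.KEEP_ONCE_PER_CASE setting, not pm4py's implementation.

```python
import pandas as pd


def relative_occurrence_filter(df: pd.DataFrame, min_relative_stake: float,
                               attribute_key: str = "concept:name",
                               case_id_key: str = "case:concept:name",
                               keep_once_per_case: bool = False) -> pd.DataFrame:
    """Hypothetical helper: keep the events whose attribute value occurs in at
    least `min_relative_stake` (0..1) of the events, or of the cases when
    keep_once_per_case is True."""
    if keep_once_per_case:
        # Count each value at most once per case, then divide by the number of cases
        per_case = df.drop_duplicates(subset=[case_id_key, attribute_key])
        share = per_case[attribute_key].value_counts() / df[case_id_key].nunique()
    else:
        # Fraction of all events carrying each value
        share = df[attribute_key].value_counts(normalize=True)
    # Assumption: non-strict comparison against the threshold
    kept_values = share[share >= min_relative_stake].index
    return df[df[attribute_key].isin(kept_values)]


log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c1", "c2", "c3"],
    "concept:name": ["retry", "retry", "retry", "retry", "register", "register"],
})

print(relative_occurrence_filter(log, 0.5)["concept:name"].tolist())
# ['retry', 'retry', 'retry', 'retry']
print(relative_occurrence_filter(log, 0.5, keep_once_per_case=True)["concept:name"].tolist())
# ['register', 'register']
```

In this example retry makes up four of the six events but appears in only one of the three cases, so the event-level and case-level modes keep different activities.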