pm4py.algo.filtering.pandas.variants package#

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

Submodules#

pm4py.algo.filtering.pandas.variants.variants_filter module#

class pm4py.algo.filtering.pandas.variants.variants_filter.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

CASE_ID_KEY = 'pm4py:param:case_id_key'#
ACTIVITY_KEY = 'pm4py:param:activity_key'#
DECREASING_FACTOR = 'decreasingFactor'#
POSITIVE = 'positive'#
pm4py.algo.filtering.pandas.variants.variants_filter.apply(df: DataFrame, admitted_variants: List[List[str]], parameters: Dict[str | Parameters, Any] | None = None) → DataFrame[source]#

Applies a filter on variants, keeping or removing the cases whose variant is among the admitted variants.

Parameters#

df

Dataframe

admitted_variants

List of admitted variants (to include/exclude)

parameters

Parameters of the algorithm, including:

Parameters.CASE_ID_KEY -> Column that contains the case ID
Parameters.ACTIVITY_KEY -> Column that contains the activity
Parameters.POSITIVE -> Specifies whether the filter should be applied including traces (positive=True) or excluding traces (positive=False)
variants_df -> If provided, avoids recalculation of the variants dataframe

Returns#

df

Filtered dataframe
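
The include/exclude behaviour described above can be illustrated with a minimal pandas sketch. This is not the pm4py implementation: the helper name filter_by_variants is hypothetical, the column names are assumed to be the usual defaults, and the rows of each case are assumed to be already in event order.

```python
import pandas as pd

# Column names assumed for illustration only.
CASE_COL = "case:concept:name"
ACT_COL = "concept:name"


def filter_by_variants(df, admitted_variants, positive=True):
    """Keep (positive=True) or drop (positive=False) the cases whose
    activity sequence is listed in admitted_variants.

    Hypothetical helper; assumes each case's rows are in event order.
    """
    admitted = {tuple(v) for v in admitted_variants}
    # Map every case ID to its variant (the tuple of its activities).
    case_variant = {
        case: tuple(acts)
        for case, acts in df.groupby(CASE_COL, sort=False)[ACT_COL]
    }
    selected = [c for c, v in case_variant.items() if (v in admitted) == positive]
    return df[df[CASE_COL].isin(selected)]


events = pd.DataFrame({
    CASE_COL: ["c1", "c1", "c2", "c2", "c3", "c3"],
    ACT_COL: ["A", "B", "A", "C", "A", "B"],
})
kept = filter_by_variants(events, [["A", "B"]])
dropped = filter_by_variants(events, [["A", "B"]], positive=False)
```

Here kept contains the events of cases c1 and c3, while dropped contains only the events of case c2.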

pm4py.algo.filtering.pandas.variants.variants_filter.filter_variants_top_k(log, k, parameters=None)[source]#

Keeps the top-k variants of the log

Parameters#

log

Event log

k

Number of variants that should be kept

parameters

Optional parameters of the algorithm

Returns#

filtered_log

Filtered log
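
As a rough illustration of the top-k selection, the following plain-Python sketch counts how many cases follow each variant and keeps the cases of the k most frequent ones. The helper name and the input shape (a dict from case ID to variant) are assumptions made for this example, and the tie-breaking shown here (first variant encountered wins) may differ from what the library does.

```python
from collections import Counter


def keep_top_k_variants(case_variants, k):
    """Return the cases whose variant is among the k most frequent.

    Hypothetical helper: case_variants maps a case ID to its variant,
    given as a tuple of activity names.
    """
    counts = Counter(case_variants.values())
    # most_common(k) returns the k (variant, count) pairs with highest count.
    top = {variant for variant, _ in counts.most_common(k)}
    return {case: v for case, v in case_variants.items() if v in top}


cases = {
    "c1": ("A", "B"),
    "c2": ("A", "B"),
    "c3": ("A", "C"),
    "c4": ("A", "B"),
    "c5": ("A", "C"),
    "c6": ("A", "D"),
}
top_two = keep_top_k_variants(cases, 2)
```

With k=2 the variants ("A", "B") and ("A", "C") survive, so case c6 is removed.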

pm4py.algo.filtering.pandas.variants.variants_filter.filter_variants_by_coverage_percentage(log, min_coverage_percentage, parameters=None)[source]#

Filters the variants of the log by a minimum coverage percentage. For example, with min_coverage_percentage=0.4 and a log of 1000 cases, of which 500 belong to variant 1, 400 to variant 2, and 100 to variant 3, the filter keeps only the traces of variant 1 and variant 2.

Parameters#

log

Event log

min_coverage_percentage

Minimum allowed percentage of coverage

parameters

Optional parameters of the algorithm

Returns#

filtered_log

Filtered log
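
The arithmetic of the example above can be checked with a short sketch. The helper name is hypothetical and the inclusive comparison (>=) is an assumption inferred from the example, in which a variant covering exactly 40% of the cases is kept.

```python
def variants_with_min_coverage(variant_counts, min_coverage_percentage):
    """Return the variants whose share of cases is at least the threshold.

    Hypothetical helper: variant_counts maps a variant name to its
    number of cases.
    """
    total = sum(variant_counts.values())
    return [
        variant
        for variant, count in variant_counts.items()
        if count / total >= min_coverage_percentage
    ]


counts = {"variant 1": 500, "variant 2": 400, "variant 3": 100}
kept_min = variants_with_min_coverage(counts, 0.4)
```

The shares are 0.5, 0.4 and 0.1, so kept_min holds variant 1 and variant 2.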

pm4py.algo.filtering.pandas.variants.variants_filter.filter_variants_by_maximum_coverage_percentage(log, max_coverage_percentage, parameters=None)[source]#

Filters the variants of the log by a maximum coverage percentage. For example, with max_coverage_percentage=0.4 and a log of 1000 cases, of which 500 belong to variant 1, 400 to variant 2, and 100 to variant 3, the filter keeps only the traces of variant 2 and variant 3.

Parameters#

log

Event log

max_coverage_percentage

Maximum allowed percentage of coverage

parameters

Optional parameters of the algorithm

Returns#

filtered_log

Filtered log
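
A small sketch of the maximum-coverage rule, using the figures of 500, 400 and 100 cases out of 1000. The helper name is hypothetical, and the inclusive comparison (<=) is an assumption based on a variant with exactly 40% coverage passing a threshold of 0.4.

```python
def variants_with_max_coverage(variant_counts, max_coverage_percentage):
    """Return the variants whose share of cases is at most the threshold.

    Hypothetical helper: variant_counts maps a variant name to its
    number of cases.
    """
    total = sum(variant_counts.values())
    return [
        variant
        for variant, count in variant_counts.items()
        if count / total <= max_coverage_percentage
    ]


variant_counts = {"variant 1": 500, "variant 2": 400, "variant 3": 100}
kept_max = variants_with_max_coverage(variant_counts, 0.4)
```

Variant 1 covers 50% of the cases and is dropped, while variant 2 (40%) and variant 3 (10%) remain in kept_max.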