pm4py.algo.filtering.pandas.cases package#
PM4Py – A Process Mining Library for Python
Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.
Website: https://processintelligence.solutions Contact: info@processintelligence.solutions
Submodules#
pm4py.algo.filtering.pandas.cases.case_filter module#
- class pm4py.algo.filtering.pandas.cases.case_filter.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: Enum
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'#
- CASE_ID_KEY = 'pm4py:param:case_id_key'#
- BUSINESS_HOURS = 'business_hours'#
- BUSINESS_HOUR_SLOTS = 'business_hour_slots'#
- WORKCALENDAR = 'workcalendar'#
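The members above serve as keys of the optional parameters dictionary, which is typed as Dict[str | Parameters, Any] in filter_case_performance. The sketch below uses a local stand-in enum with the same values and a hypothetical helper, get_param, to show how such a dictionary could be read when a caller supplies either the enum member or its raw string value. The lookup order is an assumption made for illustration, not the library's confirmed behaviour.

```python
from enum import Enum


class Parameters(Enum):
    # Local stand-in that mirrors the values listed above.
    TIMESTAMP_KEY = "pm4py:param:timestamp_key"
    CASE_ID_KEY = "pm4py:param:case_id_key"
    BUSINESS_HOURS = "business_hours"
    BUSINESS_HOUR_SLOTS = "business_hour_slots"
    WORKCALENDAR = "workcalendar"


def get_param(key, parameters, default):
    """Hypothetical helper: accept the enum member or its string value."""
    if parameters is None:
        return default
    if key in parameters:
        return parameters[key]
    if key.value in parameters:
        return parameters[key.value]
    return default


by_member = {Parameters.CASE_ID_KEY: "case_id"}
by_string = {"pm4py:param:case_id_key": "case_id"}

case_col_a = get_param(Parameters.CASE_ID_KEY, by_member, "case:concept:name")
case_col_b = get_param(Parameters.CASE_ID_KEY, by_string, "case:concept:name")
case_col_c = get_param(Parameters.CASE_ID_KEY, None, "case:concept:name")
```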
- pm4py.algo.filtering.pandas.cases.case_filter.filter_on_ncases(df: DataFrame, case_id_glue: str = 'case:concept:name', max_no_cases: int = 1000)[source]#
Filter a dataframe, keeping at most the specified number of cases (traces)
Parameters#
- df
Dataframe
- case_id_glue
Case ID column in the dataframe
- max_no_cases
Maximum number of traces to keep
Returns#
- df
Filtered dataframe
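As a rough illustration of this filter, the pandas-only sketch below keeps every event of at most max_no_cases distinct cases. The function name keep_first_n_cases is hypothetical, and so is the choice to keep the cases that appear first in the dataframe: this page does not state which cases the library retains, so the selection rule is an assumption.

```python
import pandas as pd


def keep_first_n_cases(df, case_id_glue="case:concept:name", max_no_cases=1000):
    # Assumption: retain the first `max_no_cases` case IDs in order of appearance.
    case_ids = df[case_id_glue].drop_duplicates().head(max_no_cases)
    # Keep every event that belongs to one of the retained cases.
    return df[df[case_id_glue].isin(case_ids)]


log = pd.DataFrame(
    {
        "case:concept:name": ["c1", "c1", "c2", "c3", "c3", "c3"],
        "concept:name": ["a", "b", "a", "a", "b", "c"],
    }
)

filtered = keep_first_n_cases(log, max_no_cases=2)
```

With the library itself, the corresponding call would follow the signature above, for example filter_on_ncases(log, max_no_cases=2).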
- pm4py.algo.filtering.pandas.cases.case_filter.filter_on_case_size(df0: DataFrame, case_id_glue: str = 'case:concept:name', min_case_size: int = 2, max_case_size=None)[source]#
Filter a dataframe, keeping only traces with at least min_case_size events and, if max_case_size is given, at most max_case_size events
Parameters#
- df0
Dataframe
- case_id_glue
Case ID column in the dataframe
- min_case_size
Minimum size of a case
- max_case_size
Maximum size of a case (None means no upper bound)
Returns#
- df
Filtered dataframe
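The size filter can be sketched with plain pandas by counting events per case and comparing the count against the bounds. The function name keep_cases_by_size is hypothetical, and treating both bounds as inclusive is an assumption rather than something this page states.

```python
import pandas as pd


def keep_cases_by_size(df, case_id_glue="case:concept:name",
                       min_case_size=2, max_case_size=None):
    # Number of events in the case each row belongs to, aligned with df's index.
    sizes = df[case_id_glue].map(df[case_id_glue].value_counts())
    mask = sizes >= min_case_size
    if max_case_size is not None:
        # Assumption: the upper bound is inclusive, like the lower bound.
        mask &= sizes <= max_case_size
    return df[mask]


log = pd.DataFrame(
    {
        "case:concept:name": ["c1", "c1", "c2", "c3", "c3", "c3"],
        "concept:name": ["a", "b", "a", "a", "b", "c"],
    }
)

at_least_two = keep_cases_by_size(log, min_case_size=2)
exactly_two = keep_cases_by_size(log, min_case_size=2, max_case_size=2)
```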
- pm4py.algo.filtering.pandas.cases.case_filter.filter_on_case_performance(df: DataFrame, case_id_glue: str = 'case:concept:name', timestamp_key: str = 'time:timestamp', min_case_performance: float = 0, max_case_performance: float = 10000000000, business_hours=False, business_hours_slots=[(25200, 61200), (111600, 147600), (198000, 234000), (284400, 320400), (370800, 406800)]) → DataFrame [source]#
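Filter a dataframe on case performance, keeping only cases whose duration (time from first to last event) lies between min_case_performance and max_case_performance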
Parameters#
- df
Dataframe
- case_id_glue
Case ID column in the CSV
- timestamp_key
Timestamp column in the dataframe
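- min_case_performance
Minimum case performance (case duration, in seconds)
- max_case_performance
Maximum case performance (case duration, in seconds)
- business_hours
Whether to measure case performance in business hours only (default: False)
- business_hours_slots
Business-hour time slots as (start, end) pairs in seconds from the start of the week; the defaults cover 07:00 to 17:00 on each of the first five days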
Returns#
- df
Filtered dataframe
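A minimal pandas sketch of the performance filter follows, assuming that case performance means the elapsed time in seconds between the first and last event of a case and that both bounds are inclusive. The function name keep_cases_by_duration is hypothetical, and the business-hours options are deliberately left out.

```python
import pandas as pd


def keep_cases_by_duration(df, case_id_glue="case:concept:name",
                           timestamp_key="time:timestamp",
                           min_case_performance=0,
                           max_case_performance=10000000000):
    grouped = df.groupby(case_id_glue)[timestamp_key]
    # Duration of the case each row belongs to, in seconds (wall-clock time).
    duration = (grouped.transform("max") - grouped.transform("min")).dt.total_seconds()
    # Assumption: both bounds are inclusive.
    return df[(duration >= min_case_performance) & (duration <= max_case_performance)]


log = pd.DataFrame(
    {
        "case:concept:name": ["c1", "c1", "c2", "c2", "c3"],
        "time:timestamp": pd.to_datetime(
            [
                "2024-01-01 08:00:00",
                "2024-01-01 09:00:00",  # c1 lasts 3600 seconds
                "2024-01-01 08:00:00",
                "2024-01-01 08:00:30",  # c2 lasts 30 seconds
                "2024-01-01 08:00:00",  # c3 has a single event, so 0 seconds
            ]
        ),
    }
)

medium_cases = keep_cases_by_duration(
    log, min_case_performance=60, max_case_performance=7200
)
```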
- pm4py.algo.filtering.pandas.cases.case_filter.filter_case_performance(df: DataFrame, min_case_performance: float = 0, max_case_performance: float = 10000000000, parameters: Dict[str | Parameters, Any] | None = None) → DataFrame [source]#