pm4py.statistics.process_cube.variants package#
PM4Py – A Process Mining Library for Python
Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.
Website: https://processintelligence.solutions Contact: info@processintelligence.solutions
Submodules#
pm4py.statistics.process_cube.variants.classic module#
- class pm4py.statistics.process_cube.variants.classic.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- MAX_DIVISIONS_X = 'max_divisions_x'
- MAX_DIVISIONS_Y = 'max_divisions_y'
- AGGREGATION_FUNCTION = 'aggregation_function'
- X_BINS = 'x_bins'
- Y_BINS = 'y_bins'
- pm4py.statistics.process_cube.variants.classic.apply(feature_table: DataFrame, x_col: str, y_col: str, agg_col: str, parameters: Dict[Any, Any] | None = None)
Constructs a process cube by slicing the feature table along two dimensions (x_col, y_col) and aggregating a third column (agg_col). The binning mode is chosen separately for each dimension:
- If x_col (or y_col) is an actual column of feature_table, numeric binning is applied. Bin edges can be specified manually via parameters[Parameters.X_BINS] or parameters[Parameters.Y_BINS] (each a list of numeric edges). Otherwise the column is divided automatically into equal-width bins, with the number of bins taken from parameters[Parameters.MAX_DIVISIONS_X] or parameters[Parameters.MAX_DIVISIONS_Y].
- If x_col (or y_col) is not a column of feature_table, prefix-based binning is applied, using the columns whose names start with x_col (or y_col).
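The numeric binning described above can be illustrated with a small standard-library sketch. It is not the pm4py implementation: the helper names `equal_width_edges` and `bin_index` are hypothetical, and the convention used here (half-open bins, with the last bin closed on the right) is an assumption that may differ from what the library does.

```python
from bisect import bisect_right


def equal_width_edges(values, divisions):
    """Return divisions + 1 edges spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / divisions
    # Append hi explicitly so the last edge is exact despite float rounding.
    return [lo + i * step for i in range(divisions)] + [hi]


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i + 1]) that contains value.

    Assumption: the last bin is closed on the right so the maximum value
    is kept; values outside the edges get None.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


# Automatic mode: four equal-width bins over the observed range.
durations = [0.0, 2.5, 5.0, 7.5, 10.0]
edges = equal_width_edges(durations, 4)
bins = [bin_index(v, edges) for v in durations]

# Manual mode: explicit edges, as one would pass via X_BINS or Y_BINS.
manual_edges = [0, 5, 20]
manual_bins = [bin_index(v, manual_edges) for v in (3, 5, 25)]
```

With manual edges, any value outside the first and last edge falls into no bin in this sketch, which is why the value 25 maps to `None`.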
Parameters#
- feature_table : pd.DataFrame
A feature table that must contain ‘case:concept:name’ and agg_col, plus either the columns x_col and y_col themselves (numeric binning) or the columns whose names start with x_col and y_col (prefix-based binning).
- x_col : str
The X dimension. If x_col is in feature_table.columns, numeric binning is used; otherwise prefix-based binning.
- y_col : str
The Y dimension. If y_col is in feature_table.columns, numeric binning is used; otherwise prefix-based binning.
- agg_col : str
The column to aggregate (mean, sum, etc.).
- parameters : Dict[Any, Any]
Optional parameters of the method, including:
  - Parameters.X_BINS: List of numeric bin edges for x_col.
  - Parameters.Y_BINS: List of numeric bin edges for y_col.
  - Parameters.MAX_DIVISIONS_X: If x_col is numeric and X_BINS is not provided, how many bins to divide it into.
  - Parameters.MAX_DIVISIONS_Y: If y_col is numeric and Y_BINS is not provided, how many bins to divide it into.
  - Parameters.AGGREGATION_FUNCTION: The aggregation function, e.g., ‘mean’, ‘sum’, ‘min’, ‘max’.
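As a rough illustration of how the options interact, the sketch below builds a parameters dictionary and resolves the X-axis binning so that explicit edges take precedence over a division count. The `Parameters` class here is a local stand-in that only mirrors the documented member names and values, `resolve_x_binning` is a hypothetical helper, and the fallback of 4 divisions is an assumption, since the documentation above does not state the library's defaults.

```python
from enum import Enum


class Parameters(Enum):
    # Local stand-in mirroring the documented members (not imported from pm4py).
    MAX_DIVISIONS_X = "max_divisions_x"
    MAX_DIVISIONS_Y = "max_divisions_y"
    AGGREGATION_FUNCTION = "aggregation_function"
    X_BINS = "x_bins"
    Y_BINS = "y_bins"


def resolve_x_binning(parameters=None, default_divisions=4):
    """Hypothetical helper: explicit X_BINS edges win over MAX_DIVISIONS_X.

    default_divisions is an assumed fallback, not a documented default.
    """
    parameters = parameters or {}
    edges = parameters.get(Parameters.X_BINS)
    if edges is not None:
        return ("edges", list(edges))
    divisions = parameters.get(Parameters.MAX_DIVISIONS_X, default_divisions)
    return ("divisions", divisions)


manual = {Parameters.X_BINS: [0, 5, 20], Parameters.AGGREGATION_FUNCTION: "sum"}
automatic = {Parameters.MAX_DIVISIONS_X: 3, Parameters.AGGREGATION_FUNCTION: "mean"}

manual_mode = resolve_x_binning(manual)
automatic_mode = resolve_x_binning(automatic)
fallback_mode = resolve_x_binning(None)
```

With pm4py installed, a dictionary keyed by the library's own Parameters members would be passed as the parameters argument of apply.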
Returns#
- pivot_df : pd.DataFrame
A pivoted DataFrame representing the process cube, with x bins as rows and y bins as columns, containing aggregated values of agg_col.
- cell_case_dict : dict
A dictionary mapping (x_bin, y_bin) -> set of case IDs that fall in that cell.
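To make the shape of the two return values concrete, here is a minimal standard-library sketch that assumes each row has already been assigned an x bin and a y bin. It is not the library's code: `build_cube` is a hypothetical helper, the data is invented for illustration, and a nested dictionary stands in for the pivoted DataFrame.

```python
from collections import defaultdict
from statistics import mean


def build_cube(rows, agg_col, agg=mean, case_col="case:concept:name"):
    """Hypothetical helper returning (pivot, cell_case_dict).

    pivot is {x_bin: {y_bin: aggregated value}}, a plain-dict stand-in
    for the pivoted DataFrame; cell_case_dict maps (x_bin, y_bin) to the
    set of case IDs in that cell.
    """
    cell_values = defaultdict(list)
    cell_case_dict = defaultdict(set)
    for row in rows:
        cell = (row["x_bin"], row["y_bin"])
        cell_values[cell].append(row[agg_col])
        cell_case_dict[cell].add(row[case_col])

    pivot = defaultdict(dict)
    for (x_bin, y_bin), values in cell_values.items():
        pivot[x_bin][y_bin] = agg(values)
    return dict(pivot), dict(cell_case_dict)


# Illustrative rows; bin indices and costs are made up.
rows = [
    {"case:concept:name": "c1", "x_bin": 0, "y_bin": 0, "cost": 10.0},
    {"case:concept:name": "c2", "x_bin": 0, "y_bin": 0, "cost": 20.0},
    {"case:concept:name": "c3", "x_bin": 0, "y_bin": 1, "cost": 5.0},
    {"case:concept:name": "c4", "x_bin": 1, "y_bin": 1, "cost": 8.0},
]

pivot, cell_case_dict = build_cube(rows, "cost")
sum_pivot, _ = build_cube(rows, "cost", agg=sum)
```

Cells with no cases, such as (1, 0) here, are simply absent from both structures in this sketch; how empty cells appear in the real pivot_df is not stated in the documentation above.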