pm4py.statistics.traces.generic.pandas package
PM4Py – A Process Mining Library for Python
Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.
Website: https://processintelligence.solutions Contact: info@processintelligence.solutions
Submodules
pm4py.statistics.traces.generic.pandas.case_arrival module
- class pm4py.statistics.traces.generic.pandas.case_arrival.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
- ACTIVITY_KEY = 'pm4py:param:activity_key'
- START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
- CASE_ID_KEY = 'pm4py:param:case_id_key'
- MAX_NO_POINTS_SAMPLE = 'max_no_of_points_to_sample'
- KEEP_ONCE_PER_CASE = 'keep_once_per_case'
- pm4py.statistics.traces.generic.pandas.case_arrival.get_case_arrival_avg(df: DataFrame, parameters: Dict[str | Parameters, Any] | None = None) -> float
Gets the average time elapsed between consecutive case starts
Parameters
- df
Pandas dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.TIMESTAMP_KEY -> column of the dataframe to be used as the timestamp
Returns
- case_arrival_avg
Average time elapsed between consecutive case starts
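To illustrate what this statistic measures, the following plain-Python sketch computes it from a list of event dictionaries. It is not pm4py's implementation: the helper `case_arrival_avg` and the `case_id`/`timestamp` keys are hypothetical, a case is assumed to start at its earliest event, and the result is expressed in seconds by assumption.

```python
from datetime import datetime
from statistics import mean


def case_arrival_avg(events, case_key="case_id", ts_key="timestamp"):
    """Hypothetical helper: mean seconds between consecutive case starts."""
    starts = {}
    for ev in events:
        cid, ts = ev[case_key], ev[ts_key]
        # A case starts at its earliest event timestamp
        if cid not in starts or ts < starts[cid]:
            starts[cid] = ts
    ordered = sorted(starts.values())
    if len(ordered) < 2:
        return 0.0  # assumption: no gap can be measured with fewer than two cases
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return mean(gaps)


events = [
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 8, 0)},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 9, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"case_id": "c3", "timestamp": datetime(2024, 1, 1, 14, 0)},
]
print(case_arrival_avg(events))  # 10800.0
```

Cases start at 08:00, 10:00 and 14:00, so the gaps are 2 h and 4 h and their mean is 3 h.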
- pm4py.statistics.traces.generic.pandas.case_arrival.get_case_dispersion_avg(df, parameters=None)
Gets the average time elapsed between consecutive case ends
Parameters
- df
Pandas dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.TIMESTAMP_KEY -> column of the dataframe to be used as the timestamp
Returns
- case_dispersion_avg
Average time elapsed between the completion of consecutive cases
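The dispersion counterpart looks at case completions instead of case starts. In this sketch (again hypothetical, not pm4py's code, with assumed `case_id`/`timestamp` keys and a result in seconds) a case is assumed to end at its latest event.

```python
from datetime import datetime
from statistics import mean


def case_dispersion_avg(events, case_key="case_id", ts_key="timestamp"):
    """Hypothetical helper: mean seconds between consecutive case completions."""
    ends = {}
    for ev in events:
        cid, ts = ev[case_key], ev[ts_key]
        # A case ends at its latest event timestamp
        if cid not in ends or ts > ends[cid]:
            ends[cid] = ts
    ordered = sorted(ends.values())
    if len(ordered) < 2:
        return 0.0  # assumption: no gap can be measured with fewer than two cases
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return mean(gaps)


events = [
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 8, 0)},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 9, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 12, 0)},
    {"case_id": "c3", "timestamp": datetime(2024, 1, 1, 11, 0)},
    {"case_id": "c3", "timestamp": datetime(2024, 1, 1, 13, 0)},
]
print(case_dispersion_avg(events))  # 7200.0
```

Cases end at 09:00, 12:00 and 13:00, giving gaps of 3 h and 1 h and a mean of 2 h.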
pm4py.statistics.traces.generic.pandas.case_statistics module
- class pm4py.statistics.traces.generic.pandas.case_statistics.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
- ACTIVITY_KEY = 'pm4py:param:activity_key'
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
- CASE_ID_KEY = 'pm4py:param:case_id_key'
- START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
- MAX_VARIANTS_TO_RETURN = 'max_variants_to_return'
- VARIANTS_DF = 'variants_df'
- ENABLE_SORT = 'enable_sort'
- SORT_BY_COLUMN = 'sort_by_column'
- SORT_ASCENDING = 'sort_ascending'
- MAX_RET_CASES = 'max_ret_cases'
- BUSINESS_HOURS = 'business_hours'
- BUSINESS_HOUR_SLOTS = 'business_hour_slots'
- WORKCALENDAR = 'workcalendar'
- pm4py.statistics.traces.generic.pandas.case_statistics.get_variant_statistics(df: DataFrame, parameters: Dict[str | Parameters, Any] | None = None) -> List[Dict[str, int]] | List[Dict[List[str], int]]
Gets the variants from a Pandas dataframe
Parameters
- df
Dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.CASE_ID_KEY -> Column that contains the case ID
Parameters.ACTIVITY_KEY -> Column that contains the activity
Parameters.MAX_VARIANTS_TO_RETURN -> Maximum number of variants to return
Parameters.VARIANTS_DF -> If provided, avoids recalculating the variants dataframe
Returns
- variants_list
List of the variants contained in the Pandas dataframe
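A variant is the ordered sequence of activities of a case, and variant statistics count how many cases share each sequence. The sketch below shows that idea in plain Python; it is not pm4py's implementation, and the `case_id`/`activity` keys, the comma-joined variant string and the `{"variant", "count"}` output shape are all assumptions for illustration. Events are assumed to be already in chronological order within each case.

```python
from collections import Counter


def variant_statistics(events, max_variants=None):
    """Hypothetical helper: count cases per variant, most frequent first."""
    traces = {}
    # Assumes events are already in chronological order within each case
    for ev in events:
        traces.setdefault(ev["case_id"], []).append(ev["activity"])
    counts = Counter(",".join(acts) for acts in traces.values())
    # Sort by count (descending), then by variant string for a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if max_variants is not None:
        ranked = ranked[:max_variants]  # analogue of MAX_VARIANTS_TO_RETURN
    return [{"variant": v, "count": c} for v, c in ranked]


rows = [("c1", "A"), ("c1", "B"), ("c1", "C"),
        ("c2", "A"), ("c2", "C"),
        ("c3", "A"), ("c3", "B"), ("c3", "C")]
events = [{"case_id": c, "activity": a} for c, a in rows]
print(variant_statistics(events))
# [{'variant': 'A,B,C', 'count': 2}, {'variant': 'A,C', 'count': 1}]
```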
- pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df_and_list(df: DataFrame, parameters: Dict[str | Parameters, Any] | None = None) -> Tuple[DataFrame, List[Dict[str, int]] | List[Dict[List[str], int]]]
(Technical method) Computes both variants_df and variants_list in a single call
Parameters
- df
Dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.CASE_ID_KEY -> Column that contains the case ID
Parameters.ACTIVITY_KEY -> Column that contains the activity
Returns
- variants_df
Variants dataframe
- variants_list
List of variants sorted by their count
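Returning both structures lets a caller reuse the per-case variants without recomputing them. The sketch below mimics that pairing with plain Python containers: a dict from case ID to variant stands in for variants_df and a sorted list of `(variant, count)` tuples stands in for variants_list. The helper name, keys and output shapes are hypothetical, not pm4py's.

```python
from collections import Counter


def variants_table_and_list(events):
    """Hypothetical helper: per-case variant table plus variant counts."""
    traces = {}
    for ev in events:
        traces.setdefault(ev["case_id"], []).append(ev["activity"])
    # Analogue of variants_df: one entry per case
    variants_table = {cid: ",".join(acts) for cid, acts in traces.items()}
    # Analogue of variants_list: derived from the table, most frequent first
    counts = Counter(variants_table.values())
    variants_list = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return variants_table, variants_list


rows = [("c1", "A"), ("c1", "B"), ("c1", "C"),
        ("c2", "A"), ("c2", "C"),
        ("c3", "A"), ("c3", "B"), ("c3", "C")]
events = [{"case_id": c, "activity": a} for c, a in rows]
table, vlist = variants_table_and_list(events)
print(vlist)  # [('A,B,C', 2), ('A,C', 1)]
```

The list is derived from the table rather than from the raw events, which is the saving that computing both at once is meant to provide.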
- pm4py.statistics.traces.generic.pandas.case_statistics.get_cases_description(df: DataFrame, parameters: Dict[str | Parameters, Any] | None = None) -> Dict[str, Dict[str, Any]]
Gets a description of the cases (traces) present in the Pandas dataframe
Parameters
- df
Pandas dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.CASE_ID_KEY -> Column that identifies the case ID
Parameters.TIMESTAMP_KEY -> Column that identifies the timestamp
Parameters.ENABLE_SORT -> Enables sorting of the traces
Parameters.SORT_BY_COLUMN -> Sorts the traces by the specified column. Admitted values: startTime, endTime, caseDuration
Parameters.SORT_ASCENDING -> Sort direction (boolean; if true the sort is ascending, otherwise descending)
Parameters.MAX_RET_CASES -> Maximum number of returned traces
Returns
- ret
Dictionary mapping each case to its start timestamp, its end timestamp and its duration
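The description combines three per-case values with optional sorting and truncation. The sketch below reproduces that behavior with plain Python, reusing the documented key names startTime, endTime and caseDuration. It is an illustration only: the helper, its arguments and the `case_id`/`timestamp` event keys are hypothetical, sorting is always applied (there is no ENABLE_SORT switch), timestamps stay `datetime` objects, and durations are assumed to be in seconds.

```python
from datetime import datetime


def cases_description(events, sort_by="startTime", ascending=True, max_ret_cases=None):
    """Hypothetical helper mirroring the documented output keys."""
    desc = {}
    for ev in events:
        cid, ts = ev["case_id"], ev["timestamp"]
        if cid not in desc:
            desc[cid] = {"startTime": ts, "endTime": ts}
        else:
            desc[cid]["startTime"] = min(desc[cid]["startTime"], ts)
            desc[cid]["endTime"] = max(desc[cid]["endTime"], ts)
    for d in desc.values():
        d["caseDuration"] = (d["endTime"] - d["startTime"]).total_seconds()
    # Analogue of SORT_BY_COLUMN / SORT_ASCENDING
    ordered = sorted(desc.items(), key=lambda kv: kv[1][sort_by], reverse=not ascending)
    if max_ret_cases is not None:
        ordered = ordered[:max_ret_cases]  # analogue of MAX_RET_CASES
    return dict(ordered)


events = [
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 8, 0)},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 9, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 12, 0)},
    {"case_id": "c3", "timestamp": datetime(2024, 1, 1, 11, 0)},
    {"case_id": "c3", "timestamp": datetime(2024, 1, 1, 11, 30)},
]
longest_first = cases_description(events, sort_by="caseDuration", ascending=False)
print(list(longest_first))  # ['c2', 'c1', 'c3']
```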
- pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df(df, parameters=None)
Gets the variants dataframe from a Pandas dataframe
Parameters
- df
Dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.CASE_ID_KEY -> Column that contains the case ID
Parameters.ACTIVITY_KEY -> Column that contains the activity
Returns
- variants_df
Variants dataframe
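A variant depends on the order of events within each case, so events have to be ordered chronologically before their activities are concatenated. The sketch below shows this with a dict from case ID to variant string standing in for the variants dataframe. The helper, the `case_id`/`timestamp`/`activity` keys, the integer timestamps and the comma-joined format are assumptions for illustration, not pm4py's representation.

```python
def variants_by_case(events):
    """Hypothetical helper: map each case ID to its variant string."""
    traces = {}
    for ev in events:
        traces.setdefault(ev["case_id"], []).append((ev["timestamp"], ev["activity"]))
    # Order each case's events chronologically before reading off activities
    return {
        cid: ",".join(act for _, act in sorted(evs))
        for cid, evs in traces.items()
    }


events = [
    {"case_id": "c1", "timestamp": 3, "activity": "C"},
    {"case_id": "c1", "timestamp": 1, "activity": "A"},
    {"case_id": "c2", "timestamp": 5, "activity": "A"},
    {"case_id": "c1", "timestamp": 2, "activity": "B"},
    {"case_id": "c2", "timestamp": 9, "activity": "C"},
]
print(variants_by_case(events))  # {'c1': 'A,B,C', 'c2': 'A,C'}
```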
- pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df_with_case_duration(df, parameters=None)
Gets the variants dataframe from a Pandas dataframe, including the duration of each case
Parameters
- df
Dataframe
- parameters
- Parameters of the algorithm, including:
Parameters.CASE_ID_KEY -> Column that contains the case ID
Parameters.ACTIVITY_KEY -> Column that contains the activity
Parameters.TIMESTAMP_KEY -> Column that contains the timestamp
Returns
- variants_df
Variants dataframe
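Adding the case duration only requires the first and last timestamps of each case, which are already at hand once events are sorted. The sketch below is a plain-Python analogue, not pm4py's implementation; the helper, the event keys and the `caseDuration` field name (borrowed from the sort values of get_cases_description) are assumptions, and durations are expressed in seconds.

```python
from datetime import datetime


def variants_with_duration(events):
    """Hypothetical helper: variant and duration (seconds) for every case."""
    traces = {}
    for ev in events:
        traces.setdefault(ev["case_id"], []).append((ev["timestamp"], ev["activity"]))
    table = {}
    for cid, evs in traces.items():
        evs.sort()  # chronological order within the case
        table[cid] = {
            "variant": ",".join(act for _, act in evs),
            # Duration = last timestamp minus first timestamp
            "caseDuration": (evs[-1][0] - evs[0][0]).total_seconds(),
        }
    return table


events = [
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 8, 30), "activity": "B"},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 8, 0), "activity": "A"},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 9, 0), "activity": "A"},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 10, 0), "activity": "C"},
]
print(variants_with_duration(events))
```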
- pm4py.statistics.traces.generic.pandas.case_statistics.get_events(df: DataFrame, case_id: str, parameters: Dict[str | Parameters, Any] | None = None) -> List[Dict[str, Any]]
Gets the events belonging to the specified case
Parameters
- df
Pandas dataframe
- case_id
Required case ID
- parameters
- Possible parameters of the algorithm, including:
Parameters.CASE_ID_KEY -> Column in which the case ID is contained
Returns
- list_eve
List of events belonging to the case
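Conceptually this is a filter on the case ID column followed by a conversion of each matching row into a dictionary; on a pandas dataframe a comparable result can be obtained with `df[df[case_col] == case_id].to_dict("records")`, though whether pm4py does exactly this is not stated here. The plain-Python sketch below uses a hypothetical helper and an assumed `case_id` key.

```python
def events_of_case(events, case_id, case_key="case_id"):
    """Hypothetical helper: events of one case, in their original order."""
    # Copy each event so callers cannot mutate the source records
    return [dict(ev) for ev in events if ev[case_key] == case_id]


events = [
    {"case_id": "c1", "activity": "A"},
    {"case_id": "c2", "activity": "A"},
    {"case_id": "c1", "activity": "B"},
]
print(events_of_case(events, "c1"))
# [{'case_id': 'c1', 'activity': 'A'}, {'case_id': 'c1', 'activity': 'B'}]
```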
- pm4py.statistics.traces.generic.pandas.case_statistics.get_kde_caseduration(df, parameters=None)
Estimates the density of the case durations in the dataframe using kernel density estimation (KDE)
Parameters
- df
Pandas dataframe
- parameters
- Possible parameters of the algorithm, including:
Parameters.GRAPH_POINTS -> Number of points to include in the graph
Parameters.CASE_ID_KEY -> Column hosting the case ID
Returns
- x
X-axis values to plot
- y
Y-axis values to plot
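A Gaussian KDE places a normal kernel on every observed duration and averages them, giving a smooth density curve that is then sampled at a fixed number of points. The sketch below implements this in plain Python with Scott's rule for the bandwidth (the default rule of `scipy.stats.gaussian_kde`). It is not pm4py's implementation: the helper name, the default of 200 graph points and the choice of an even grid between the shortest and longest duration are assumptions.

```python
import math
from statistics import stdev


def kde_case_durations(durations, graph_points=200):
    """Gaussian KDE sketch using Scott's rule, evaluated on an even grid."""
    if len(durations) < 2 or graph_points < 2:
        raise ValueError("need at least two durations and two graph points")
    n = len(durations)
    bandwidth = stdev(durations) * n ** (-1 / 5)  # Scott's rule in one dimension
    if bandwidth == 0:
        raise ValueError("all durations are identical; the density is degenerate")
    lo, hi = min(durations), max(durations)
    step = (hi - lo) / (graph_points - 1)
    xs = [lo + i * step for i in range(graph_points)]
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    # Density at x = average of Gaussian kernels centred on each duration
    ys = [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in durations)
        for x in xs
    ]
    return xs, ys


xs, ys = kde_case_durations([1800, 3600, 5400, 7200], graph_points=50)
print(len(xs), max(ys))
```

Scott's rule is used only because it is a common default; other bandwidth rules would change the smoothness of the curve but not the overall approach.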
- pm4py.statistics.traces.generic.pandas.case_statistics.get_kde_caseduration_json(df, parameters=None)
Estimates the density of the case durations in the log/dataframe using KDE, and expresses the result as JSON
Parameters
- df
Pandas dataframe
- parameters
- Possible parameters of the algorithm, including:
Parameters.GRAPH_POINTS -> Number of points to include in the graph
Parameters.CASE_ID_KEY -> Column hosting the case ID
Returns
- json
JSON representing the graph points
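The JSON variant only changes how the graph points are delivered. As a sketch, assuming (not confirmed here) that the points are serialized as a list of `[x, y]` pairs, a hypothetical helper could look like this:

```python
import json


def graph_points_to_json(xs, ys):
    """Hypothetical helper: serialize graph points as a JSON list of [x, y] pairs."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return json.dumps([[x, y] for x, y in zip(xs, ys)])


payload = graph_points_to_json([0.0, 1800.0], [0.5, 0.25])
print(payload)  # [[0.0, 0.5], [1800.0, 0.25]]
```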
- pm4py.statistics.traces.generic.pandas.case_statistics.get_all_case_durations(df, parameters=None)
Gets all the case durations from the dataframe
Parameters
- df
Pandas dataframe
- parameters
Possible parameters of the algorithm
Returns
- duration_values
List of all duration values
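Each case duration is the difference between the last and first event timestamps of that case. The plain-Python sketch below collects these values; the helper and the `case_id`/`timestamp` keys are hypothetical, durations are assumed to be in seconds, and sorting the output is a choice of this sketch (convenient for percentiles and plots) rather than documented pm4py behavior.

```python
from datetime import datetime


def all_case_durations(events):
    """Hypothetical helper: sorted list of case durations in seconds."""
    bounds = {}
    for ev in events:
        cid, ts = ev["case_id"], ev["timestamp"]
        lo, hi = bounds.get(cid, (ts, ts))
        bounds[cid] = (min(lo, ts), max(hi, ts))
    # Sorting is an assumption of this sketch
    return sorted((hi - lo).total_seconds() for lo, hi in bounds.values())


events = [
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 8, 0)},
    {"case_id": "c1", "timestamp": datetime(2024, 1, 1, 9, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 10, 0)},
    {"case_id": "c2", "timestamp": datetime(2024, 1, 1, 10, 30)},
    {"case_id": "c3", "timestamp": datetime(2024, 1, 1, 11, 0)},
]
print(all_case_durations(events))  # [0.0, 1800.0, 3600.0]
```

A case with a single event has a duration of zero under this definition.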