pm4py.util.compression package

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

Submodules

pm4py.util.compression.dtypes module

pm4py.util.compression.util module

pm4py.util.compression.util.project_univariate(log: EventLog | DataFrame, key: str = 'concept:name', df_glue: str = 'case:concept:name', df_sorting_criterion_key='time:timestamp') -> List[List[Any]] | None

Projects an event log to a univariate list of values. For example, an event log of the form [[('concept:name': A, 'k1': v1, 'k2': v2), ('concept:name': B, 'k1': v3, 'k2': v4), …], …] is converted to [['A', 'B', …], …].

The method returns the compressed log.

Return type:

UCL

Parameters:
  • log – log to compress (either EventLog or DataFrame)

  • key (str) – key to use for compression

  • df_glue (str) – key to use for combining events into traces when the input is a dataframe.

  • df_sorting_criterion_key (str) – key to use as a sorting criterion for traces (typically timestamps)
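
The projection described above can be illustrated without pm4py. The following is a minimal sketch, assuming the log is a plain list of traces and each event is a dict; the helper name `project_traces` and the toy data are hypothetical and are not part of the library.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def project_traces(log: List[List[Event]], key: str = "concept:name") -> List[List[Any]]:
    """Keep only the value stored under `key` for every event of every trace."""
    return [[event[key] for event in trace] for trace in log]


# Hypothetical toy log: two traces, each event carries an extra attribute "k1".
toy_log = [
    [{"concept:name": "A", "k1": 1}, {"concept:name": "B", "k1": 2}],
    [{"concept:name": "A", "k1": 3}, {"concept:name": "C", "k1": 4}],
]

projected = project_traces(toy_log)
print(projected)
```

For a DataFrame input, the documented df_glue and df_sorting_criterion_key parameters determine how rows are grouped into traces and ordered within each trace; the sketch skips that step because the toy log is already grouped and ordered.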

pm4py.util.compression.util.compress_univariate(log: EventLog | DataFrame, key: str = 'concept:name', df_glue: str = 'case:concept:name', df_sorting_criterion_key='time:timestamp') -> Tuple[List[List[Any]], List[Any]] | None

Compresses an event log to a univariate list of integer lists. For example, an event log of the form [[('concept:name': A, 'k1': v1, 'k2': v2), ('concept:name': B, 'k1': v3, 'k2': v4), …], …] is converted to [[0, 1, …], …] with corresponding lookup table ['A', 'B'], assuming the 'concept:name' column is used for compression.

The method returns a tuple containing the compressed log and the lookup table.

Return type:

Tuple[UCL,ULT]

Parameters:
  • log – log to compress (either EventLog or DataFrame)

  • key (str) – key to use for compression

  • df_glue (str) – key to use for combining events into traces when the input is a dataframe.

  • df_sorting_criterion_key (str) – key to use as a sorting criterion for traces (typically timestamps)
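
The relationship between the compressed log and its lookup table can be sketched in plain Python. This is an illustration only: the helper `compress_values` is hypothetical, it starts from already projected traces, and it assigns integer codes in order of first appearance, which is an assumption and may differ from the order the library produces.

```python
from typing import Any, Dict, List, Tuple


def compress_values(projected: List[List[Any]]) -> Tuple[List[List[int]], List[Any]]:
    """Replace every value by an integer code and return (compressed log, lookup table)."""
    lookup: List[Any] = []        # position i holds the original value for code i
    code_of: Dict[Any, int] = {}  # inverse mapping: original value -> code
    compressed: List[List[int]] = []
    for trace in projected:
        codes = []
        for value in trace:
            if value not in code_of:
                code_of[value] = len(lookup)
                lookup.append(value)
            codes.append(code_of[value])
        compressed.append(codes)
    return compressed, lookup


ucl, ult = compress_values([["A", "B", "A"], ["B", "C"]])

# Decoding is a plain index lookup into the table.
decoded = [[ult[code] for code in trace] for trace in ucl]
```

Whatever order the codes are assigned in, indexing the lookup table with a code should give back the original value, which is what the `decoded` line demonstrates.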

pm4py.util.compression.util.compress_multivariate(log: EventLog | DataFrame, keys: List[str] = ['concept:name'], df_glue: str = 'case:concept:name', df_sorting_criterion_key: str = 'time:timestamp', uncompressed: List[str] = []) -> Tuple[List[List[Tuple[Any]]], List[List[Any]]]

Compresses an event log to a list of lists containing tuples of integers. For example, an event log of the form [[('concept:name': A, 'k1': v1, 'k2': v2), ('concept:name': B, 'k1': v3, 'k2': v4), …], …] is converted to [[(0, 0), (1, 1), …], …] with corresponding lookup table ['A', 'B'], assuming the 'concept:name' and 'k1' columns are used for compression. The second-order criterion is used to sort the values that have the same trace attribute. The uncompressed arguments are included but not compressed (e.g., a boolean value need not be compressed).

The method returns a tuple containing the compressed log and the lookup table. The order of the data in the compressed log follows the ordering of the provided keys. The compressed columns are stored first, followed by the uncompressed columns.

Return type:

Tuple[MCL,MLT]

Parameters:
  • log – log to compress (either EventLog or DataFrame)

  • keys – keys to use for compression

  • df_glue (str) – key to use for combining events into traces when the input is a dataframe.

  • df_sorting_criterion_key (str) – key to use as a sorting criterion for traces (typically timestamps)

  • uncompressed – columns that must be included in the compressed log but need not be compressed themselves
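
The tuple layout described above (compressed columns first, uncompressed columns after) can be sketched as follows. The helper `compress_columns`, the attribute names `k1` and `ok`, and the first-appearance code order are all assumptions made for illustration; this is not the library's implementation.

```python
from typing import Any, Dict, List, Sequence, Tuple

Event = Dict[str, Any]


def compress_columns(
    log: List[List[Event]],
    keys: Sequence[str],
    uncompressed: Sequence[str] = (),
) -> Tuple[List[List[Tuple[Any, ...]]], List[List[Any]]]:
    """Encode each key with its own lookup table; append uncompressed values as-is."""
    lookups: List[List[Any]] = [[] for _ in keys]
    codes: List[Dict[Any, int]] = [{} for _ in keys]
    compressed: List[List[Tuple[Any, ...]]] = []
    for trace in log:
        row = []
        for event in trace:
            encoded: List[Any] = []
            for i, key in enumerate(keys):
                value = event[key]
                if value not in codes[i]:
                    codes[i][value] = len(lookups[i])
                    lookups[i].append(value)
                encoded.append(codes[i][value])
            # Uncompressed columns follow the compressed ones, unchanged.
            encoded.extend(event[name] for name in uncompressed)
            row.append(tuple(encoded))
        compressed.append(row)
    return compressed, lookups


toy_log = [
    [{"concept:name": "A", "k1": "x", "ok": True},
     {"concept:name": "B", "k1": "y", "ok": False}],
    [{"concept:name": "B", "k1": "x", "ok": True}],
]

mcl, mlt = compress_columns(toy_log, keys=["concept:name", "k1"], uncompressed=["ok"])
```

In this sketch each key gets its own lookup table, which matches the documented return annotation of a list of lists for the lookup tables.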

pm4py.util.compression.util.discover_dfg(log: List[List[Any]] | List[List[Tuple[Any]]], index: int = 0) -> DirectlyFollowsGraph

Discover a DFG object from a compressed event log (either univariate or multivariate). The DFG object represents a counter of integer pairs.

Return type:

Counter[Tuple[int, int]]

Parameters:
  • log – compressed event log (either univariate or multivariate)

  • index – index to use for DFG discovery when using a multivariate log
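
A counter of integer pairs, as described above, can be built from a compressed log with the standard library alone. The sketch below is a hypothetical stand-in (`count_directly_follows` is not a pm4py function) that counts consecutive pairs within each trace and uses the `index` argument only when the events are tuples.

```python
from collections import Counter
from typing import Any, List, Tuple


def count_directly_follows(log: List[List[Any]], index: int = 0) -> "Counter[Tuple[Any, Any]]":
    """Count how often one value is directly followed by another within a trace."""
    dfg: Counter = Counter()
    for trace in log:
        # In a multivariate log each event is a tuple; pick the requested position.
        values = [event[index] if isinstance(event, tuple) else event for event in trace]
        dfg.update(zip(values, values[1:]))
    return dfg


univariate_log = [[0, 1, 2], [0, 1, 1]]
multivariate_log = [[(0, 5), (1, 6)], [(0, 7), (1, 8)]]

dfg_uni = count_directly_follows(univariate_log)
dfg_multi = count_directly_follows(multivariate_log, index=1)
```

Because the pairs are integers, they can be mapped back to activity labels afterwards through the lookup table returned by the compression step.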

pm4py.util.compression.util.discover_dfg_uvcl(log: Counter[Tuple[Any]]) -> DirectlyFollowsGraph
pm4py.util.compression.util.get_start_activities(log: List[List[Any]] | List[List[Tuple[Any]]] | Counter[Tuple[Any]], index: int = 0) -> Counter[Any]
pm4py.util.compression.util.get_end_activities(log: List[List[Any]] | List[List[Tuple[Any]]] | Counter[Tuple[Any]], index: int = 0) -> Counter[Any]
pm4py.util.compression.util.get_alphabet(log: List[List[Any]] | List[List[Tuple[Any]]] | Counter[Tuple[Any]], index: int = 0)
pm4py.util.compression.util.get_variants(log: List[List[Any]] | List[List[Tuple[Any]]], index: int = 0) -> Counter[Tuple[Any]]
pm4py.util.compression.util.msd(ucl: List[List[Any]] | Counter[Tuple[Any]]) -> Dict[Any, int]
pm4py.util.compression.util.msdw(cl: List[List[Any]] | Counter[Tuple[Any]], msd: Dict[Any, int]) -> Dict[Any, Any]
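
Several of the helpers listed above carry no description. Judging only from their names and annotated return types, the statistics for variants, start activities, end activities, and the alphabet of a univariate compressed log might be computed along the following lines. This is an assumption about their intent, not documented behaviour, and the sketch uses only the standard library.

```python
from collections import Counter

# Hypothetical univariate compressed log: three traces over the codes 0, 1, 2.
ucl = [[0, 1, 2], [0, 1, 2], [1, 2]]

# Variants: each distinct trace (as a tuple) with its number of occurrences.
variants = Counter(tuple(trace) for trace in ucl)

# Start and end activities: first and last code of every non-empty trace.
start_activities = Counter(trace[0] for trace in ucl if trace)
end_activities = Counter(trace[-1] for trace in ucl if trace)

# Alphabet: the set of all codes that occur anywhere in the log.
alphabet = {code for trace in ucl for code in trace}
```

The `msd` and `msdw` helpers are left out of the sketch because their semantics are not stated here.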