pm4py.algo.transformation.log_to_features.variants package
PM4Py – A Process Mining Library for Python
Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.
Website: https://processintelligence.solutions Contact: info@processintelligence.solutions
Submodules
pm4py.algo.transformation.log_to_features.variants.event_based module
- class pm4py.algo.transformation.log_to_features.variants.event_based.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- STR_EVENT_ATTRIBUTES = 'str_ev_attr'
- NUM_EVENT_ATTRIBUTES = 'num_ev_attr'
- FEATURE_NAMES = 'feature_names'
- MIN_NUM_DIFF_STR_VALUES = 'min_num_diff_str_values'
- MAX_NUM_DIFF_STR_VALUES = 'max_num_diff_str_values'
- pm4py.algo.transformation.log_to_features.variants.event_based.extract_all_ev_features_names_from_log(log: EventLog, str_ev_attr: List[str], num_ev_attr: List[str], parameters: Dict[str | Parameters, Any] | None = None) → List[str]
Extracts the feature names from an event log.
Parameters
- log
Event log
- str_ev_attr
(If provided) list of string event attributes to consider when extracting the feature names
- num_ev_attr
(If provided) list of numeric event attributes to consider when extracting the feature names
- parameters
Parameters of the algorithm, including:
Parameters.MIN_NUM_DIFF_STR_VALUES => minimum number of distinct values a string attribute must have to be included as feature(s)
Parameters.MAX_NUM_DIFF_STR_VALUES => maximum number of distinct values a string attribute may have to be included as feature(s)
Returns
- feature_names
List of feature names
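The distinct-value filter can be pictured with the following plain-Python sketch. It is not pm4py's implementation: traces are modelled as lists of attribute dictionaries, the helper name `feature_names_from_events` is hypothetical, and the `event:<attr>@<value>` naming scheme is an assumption made for illustration. Each string attribute contributes one feature per distinct value, but only when its number of distinct values lies within the [min, max] bounds; each numeric attribute contributes a single feature.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical trace model: a trace is a list of events, an event a dict of attributes.
Event = Dict[str, object]


def feature_names_from_events(
    traces: Sequence[Sequence[Event]],
    str_ev_attr: List[str],
    num_ev_attr: List[str],
    min_num_diff_str_values: int = 0,
    max_num_diff_str_values: Optional[int] = None,
) -> List[str]:
    names: List[str] = []
    for attr in str_ev_attr:
        # Distinct values of this string attribute over the whole log.
        values = sorted({str(ev[attr]) for trace in traces for ev in trace if attr in ev})
        too_few = len(values) < min_num_diff_str_values
        too_many = max_num_diff_str_values is not None and len(values) > max_num_diff_str_values
        if not (too_few or too_many):
            # One feature per distinct value (hypothetical "event:<attr>@<value>" naming).
            names.extend(f"event:{attr}@{v}" for v in values)
    # Each numeric attribute becomes a single feature.
    names.extend(f"event:{attr}" for attr in num_ev_attr)
    return names


traces = [
    [{"concept:name": "A", "cost": 1}, {"concept:name": "B"}],
    [{"concept:name": "A", "org:resource": "r1"}],
]
print(feature_names_from_events(traces, ["concept:name", "org:resource"], ["cost"], min_num_diff_str_values=2))
# ['event:concept:name@A', 'event:concept:name@B', 'event:cost']
```

Here `org:resource` is dropped because it has only one distinct value, which is below the minimum of 2.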
- pm4py.algo.transformation.log_to_features.variants.event_based.extract_features(log: EventLog, feature_names: List[str], parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Extracts the feature matrix from an event log.
Parameters
- log
Event log
- feature_names
Features to consider (in the given order)
Returns
- data
Data to provide for decision tree learning
- feature_names
Names of the features, in order
- pm4py.algo.transformation.log_to_features.variants.event_based.apply(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Extracts all the features for the traces of an event log. Each trace becomes a vector of vectors, in which each event has its own vector.
Parameters
- log
Event log
- parameters
Parameters of the algorithm, including:
Parameters.STR_EVENT_ATTRIBUTES => string event attributes to consider in the feature extraction
Parameters.NUM_EVENT_ATTRIBUTES => numeric event attributes to consider in the feature extraction
Parameters.FEATURE_NAMES => features to consider (in the given order)
Returns
- data
Data to provide for decision tree learning
- feature_names
Names of the features, in order
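To make the "vector of vectors" shape concrete, here is a minimal plain-Python sketch (not pm4py's code) that turns each trace into a matrix with one row per event and one column per feature. It assumes hypothetical feature names of the form `event:<attr>@<value>` for one-hot encoded string attributes and `event:<attr>` for numeric attributes; features that an event does not carry stay at 0.

```python
from typing import Dict, List, Sequence

# Hypothetical trace model: a trace is a list of events, an event a dict of attributes.
Event = Dict[str, object]


def encode_traces_event_based(
    traces: Sequence[Sequence[Event]], feature_names: List[str]
) -> List[List[List[float]]]:
    """Return one matrix per trace: one row per event, one column per feature."""
    index = {name: i for i, name in enumerate(feature_names)}
    data: List[List[List[float]]] = []
    for trace in traces:
        rows: List[List[float]] = []
        for ev in trace:
            row = [0.0] * len(feature_names)
            for attr, value in ev.items():
                one_hot = f"event:{attr}@{value}"
                if one_hot in index:
                    # String attribute: set the indicator of the observed value.
                    row[index[one_hot]] = 1.0
                elif f"event:{attr}" in index and isinstance(value, (int, float)):
                    # Numeric attribute: copy the value itself.
                    row[index[f"event:{attr}"]] = float(value)
            rows.append(row)
        data.append(rows)
    return data


names = ["event:concept:name@A", "event:concept:name@B", "event:cost"]
print(encode_traces_event_based([[{"concept:name": "A", "cost": 5}, {"concept:name": "B"}]], names))
# [[[1.0, 0.0, 5.0], [0.0, 1.0, 0.0]]]
```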
pm4py.algo.transformation.log_to_features.variants.temporal module
- class pm4py.algo.transformation.log_to_features.variants.temporal.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- ARRIVAL_RATE = 'arrival_rate'
- FINISH_RATE = 'finish_rate'
- CASE_ID_COLUMN = 'pm4py:param:case_id_key'
- START_TIMESTAMP_COLUMN = 'pm4py:param:start_timestamp_key'
- TIMESTAMP_COLUMN = 'pm4py:param:timestamp_key'
- RESOURCE_COLUMN = 'pm4py:param:resource_key'
- ACTIVITY_COLUMN = 'pm4py:param:activity_key'
- GROUPER_FREQ = 'grouper_freq'
- SERVICE_TIME = 'service_time'
- WAITING_TIME = 'waiting_time'
- SOJOURN_TIME = 'sojourn_time'
- DIFF_START_END = 'diff_start_end'
- pm4py.algo.transformation.log_to_features.variants.temporal.apply(log: EventLog | EventStream | DataFrame, parameters: Dict[Any, Any] | None = None) → DataFrame
Extracts temporal features, aggregated at the provided time granularity, from an event log, event stream, or Pandas dataframe.
Implements the approach described in the paper: Pourbafrani, Mahsa, Sebastiaan J. van Zelst, and Wil MP van der Aalst. “Supporting automatic system dynamics model generation for simulation in the context of process mining.” International Conference on Business Information Systems. Springer, Cham, 2020.
Parameters
- log
Event log / Event stream / Pandas dataframe
- parameters
Parameters of the algorithm, including:
Parameters.GROUPER_FREQ => the time interval to be used for the grouping
Parameters.ARRIVAL_RATE => column of the dataframe that will hold the arrival rate
Parameters.FINISH_RATE => column of the dataframe that will hold the finishing rate
Parameters.SERVICE_TIME => column of the dataframe that will hold the service time
Parameters.WAITING_TIME => column of the dataframe that will hold the waiting time
Parameters.SOJOURN_TIME => column of the dataframe that will hold the sojourn time
Parameters.CASE_ID_COLUMN => case ID column in the dataframe (default: case:concept:name)
Parameters.ACTIVITY_COLUMN => activity column in the dataframe (default: concept:name)
Parameters.TIMESTAMP_COLUMN => timestamp column in the dataframe (default: time:timestamp)
Parameters.RESOURCE_COLUMN => resource column in the dataframe (default: org:resource)
Parameters.START_TIMESTAMP_COLUMN => start timestamp column in the dataframe (if not provided, the timestamp column is used)
Returns
- features_df
Dataframe with temporal features
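A rough picture of the arrival-rate and finish-rate features: the sketch below groups cases by calendar day (a fixed stand-in for Parameters.GROUPER_FREQ) and counts, per day, how many cases start and how many cases end. It is an illustrative plain-Python simplification under those assumptions, not pm4py's dataframe-based implementation; the function name is hypothetical, and service, waiting and sojourn times are omitted.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def daily_arrival_finish_rates(
    cases: Dict[str, List[datetime]]
) -> List[Tuple[str, int, int]]:
    """For each day, count the cases that start (arrive) and end (finish) on that day."""
    # A case arrives at its earliest timestamp and finishes at its latest one.
    arrivals = Counter(min(ts).date().isoformat() for ts in cases.values())
    finishes = Counter(max(ts).date().isoformat() for ts in cases.values())
    days = sorted(set(arrivals) | set(finishes))
    # Counter returns 0 for days without arrivals or finishes.
    return [(day, arrivals[day], finishes[day]) for day in days]


cases = {
    "c1": [datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 10)],
    "c2": [datetime(2024, 1, 1, 11), datetime(2024, 1, 1, 15)],
}
print(daily_arrival_finish_rates(cases))
# [('2024-01-01', 2, 1), ('2024-01-02', 0, 1)]
```

With a dataframe, the same per-period counts would more naturally be obtained with a pandas time-based grouping.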
pm4py.algo.transformation.log_to_features.variants.trace_based module
- class pm4py.algo.transformation.log_to_features.variants.trace_based.Parameters(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- ENABLE_ACTIVITY_DEF_REPRESENTATION = 'enable_activity_def_representation'
- ENABLE_SUCC_DEF_REPRESENTATION = 'enable_succ_def_representation'
- STR_TRACE_ATTRIBUTES = 'str_tr_attr'
- STR_EVENT_ATTRIBUTES = 'str_ev_attr'
- NUM_TRACE_ATTRIBUTES = 'num_tr_attr'
- NUM_EVENT_ATTRIBUTES = 'num_ev_attr'
- STR_EVSUCC_ATTRIBUTES = 'str_evsucc_attr'
- FEATURE_NAMES = 'feature_names'
- ACTIVITY_KEY = 'pm4py:param:activity_key'
- START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
- CASE_ID_KEY = 'pm4py:param:case_id_key'
- RESOURCE_KEY = 'pm4py:param:resource_key'
- EPSILON = 'epsilon'
- DEFAULT_NOT_PRESENT = 'default_not_present'
- ENABLE_ALL_EXTRA_FEATURES = 'enable_all_extra_features'
- ENABLE_CASE_DURATION = 'enable_case_duration'
- ADD_CASE_IDENTIFIER_COLUMN = 'add_case_identifier_column'
- ENABLE_TIMES_FROM_FIRST_OCCURRENCE = 'enable_times_from_first_occurrence'
- ENABLE_TIMES_FROM_LAST_OCCURRENCE = 'enable_times_from_last_occurrence'
- ENABLE_DIRECT_PATHS_TIMES_LAST_OCC = 'enable_direct_paths_times_last_occ'
- ENABLE_INDIRECT_PATHS_TIMES_LAST_OCC = 'enable_indirect_paths_times_last_occ'
- ENABLE_WORK_IN_PROGRESS = 'enable_work_in_progress'
- ENABLE_RESOURCE_WORKLOAD = 'enable_resource_workload'
- ENABLE_FIRST_LAST_ACTIVITY_INDEX = 'enable_first_last_activity_index'
- ENABLE_MAX_CONCURRENT_EVENTS = 'enable_max_concurrent_events'
- ENABLE_MAX_CONCURRENT_EVENTS_PER_ACTIVITY = 'enable_max_concurrent_events_per_activity'
- CASE_ATTRIBUTE_PREFIX = 'case:'
- pm4py.algo.transformation.log_to_features.variants.trace_based.max_concurrent_events(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Counts, for every trace, the maximum number of events (of any activity) that happen concurrently, i.e., whose time intervals [st1, ct1] and [st2, ct2] have a non-empty intersection.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
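For intervals on a line, a set of pairwise-overlapping intervals always shares a common point, so the maximum number of concurrent events equals the maximum number of intervals covering a single instant. The sketch below computes it with a standard sweep over the sorted endpoints; it is not necessarily the approach pm4py uses. It takes the (start, completion) timestamps of the events of one trace as plain numbers and treats intervals as closed, so intervals that merely touch count as concurrent (an assumption about boundary handling).

```python
from typing import List, Tuple


def max_concurrent(intervals: List[Tuple[float, float]]) -> int:
    """Maximum number of closed intervals [start, end] that share a common point."""
    points: List[Tuple[float, int]] = []
    for start, end in intervals:
        points.append((start, 0))  # 0 = opening; sorts before a closing at the same time
        points.append((end, 1))    # 1 = closing
    best = current = 0
    for _, kind in sorted(points):
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


# (start, completion) pairs of the events of one trace
print(max_concurrent([(0, 5), (1, 3), (2, 4), (6, 7)]))  # 3
```

Grouping the intervals by activity before calling the same function gives the per-activity variant.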
- pm4py.algo.transformation.log_to_features.variants.trace_based.max_concurrent_events_per_activity(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Counts, for every trace and every activity, the maximum number of events of that activity that happen concurrently, i.e., whose time intervals [st1, ct1] and [st2, ct2] have a non-empty intersection.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
- pm4py.algo.transformation.log_to_features.variants.trace_based.resource_workload(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates, for each case and for each resource of the log, the workload of the resource during the lead time of the case. If a resource does not appear in a case, a default value is used.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
- pm4py.algo.transformation.log_to_features.variants.trace_based.work_in_progress(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates, for each case, the number of cases that are open during the lead time of the case (work in progress).
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
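A direct, quadratic sketch of the work-in-progress feature, assuming each case is reduced to its (start, end) lead-time interval: a case counts toward another case's work in progress when their closed intervals intersect. Whether a case counts itself is an assumption here (in this sketch it does), and the function name is hypothetical. For large logs, an interval tree or a sorted sweep would avoid the O(n²) comparison.

```python
from typing import Dict, Tuple


def work_in_progress(case_intervals: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """For each case, count the cases whose lead time overlaps its own (itself included)."""
    result: Dict[str, int] = {}
    for case_id, (st, et) in case_intervals.items():
        # Two closed intervals intersect iff each starts no later than the other ends.
        result[case_id] = sum(
            1 for (s2, e2) in case_intervals.values() if s2 <= et and st <= e2
        )
    return result


print(work_in_progress({"c1": (0, 10), "c2": (5, 15), "c3": (20, 30)}))
# {'c1': 2, 'c2': 2, 'c3': 1}
```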
- pm4py.algo.transformation.log_to_features.variants.trace_based.indirect_paths_times_last_occ(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates, for each case and for each indirect path (i, j) of the case, the difference between the start timestamp of the later event and the completion timestamp of the earlier event, taken at the last occurrence of the path. If a path does not occur in a case, a default value is used.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
- pm4py.algo.transformation.log_to_features.variants.trace_based.direct_paths_times_last_occ(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates, for each case and for each direct path (i, i+1) of the case, the difference between the start timestamp of the later event and the completion timestamp of the earlier event, taken at the last occurrence of the path. If a path does not occur in a case, a default value is used.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
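The following sketch illustrates the direct-path variant for one trace. Events are assumed to be (activity, start, completion) tuples with numeric timestamps, already sorted; the function name and the default of -1.0 for missing paths are hypothetical choices, not pm4py's. Because later occurrences overwrite earlier ones in the dictionary, only the last occurrence of each path is kept.

```python
from typing import Dict, List, Tuple

# Hypothetical event model: (activity, start_timestamp, completion_timestamp).
Event = Tuple[str, float, float]


def direct_path_times_last_occ(
    trace: List[Event], paths: List[Tuple[str, str]], default: float = -1.0
) -> List[float]:
    last: Dict[Tuple[str, str], float] = {}
    for (a1, _, c1), (a2, s2, _) in zip(trace, trace[1:]):
        # Start of the later event minus completion of the earlier one;
        # later occurrences of the same path overwrite earlier ones.
        last[(a1, a2)] = s2 - c1
    return [last.get(path, default) for path in paths]


trace = [("A", 0, 2), ("B", 5, 6), ("A", 7, 8), ("B", 12, 13)]
print(direct_path_times_last_occ(trace, [("A", "B"), ("B", "A"), ("B", "C")]))
# [4, 1, -1.0]
```

The path A → B occurs twice (gap 3, then gap 4); only the last gap is reported.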
- pm4py.algo.transformation.log_to_features.variants.trace_based.times_from_first_occurrence_activity_case(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates, for each case and for each activity, the time from the start of the case to the first occurrence of the activity, and the time from that occurrence to the end of the case.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
- pm4py.algo.transformation.log_to_features.variants.trace_based.times_from_last_occurrence_activity_case(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates, for each case and for each activity, the time from the start of the case to the last occurrence of the activity, and the time from that occurrence to the end of the case.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
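Both the first-occurrence and the last-occurrence variants reduce to picking one timestamp per activity and measuring its distance from the case start and from the case end. A minimal sketch, assuming events are (activity, timestamp) pairs with numeric timestamps and a hypothetical default of -1.0 for activities absent from the case; the function name is also hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # (activity, timestamp): a hypothetical simplification


def times_from_occurrence(
    trace: List[Event], activities: List[str], use_last: bool = False, default: float = -1.0
) -> List[float]:
    start = min(ts for _, ts in trace)
    end = max(ts for _, ts in trace)
    chosen: Dict[str, float] = {}
    for act, ts in sorted(trace, key=lambda e: e[1]):
        # First occurrence: keep the earliest timestamp; last occurrence: keep overwriting.
        if use_last or act not in chosen:
            chosen[act] = ts
    row: List[float] = []
    for act in activities:
        if act in chosen:
            # (time since case start, time until case end)
            row.extend([chosen[act] - start, end - chosen[act]])
        else:
            row.extend([default, default])
    return row


trace = [("A", 0.0), ("B", 3.0), ("A", 5.0), ("C", 9.0)]
print(times_from_occurrence(trace, ["A", "C", "D"]))                 # [0.0, 9.0, 9.0, 0.0, -1.0, -1.0]
print(times_from_occurrence(trace, ["A", "C", "D"], use_last=True))  # [5.0, 4.0, 9.0, 0.0, -1.0, -1.0]
```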
- pm4py.algo.transformation.log_to_features.variants.trace_based.first_last_activity_index_trace(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Uses as features the first and the last index of each activity inside a case.
Parameters
- log
Event log
- parameters
Parameters of the algorithm, including:
Parameters.ACTIVITY_KEY => the attribute to use as activity
Parameters.DEFAULT_NOT_PRESENT => the replacement value for activities that are not present in the specific case
Returns
- data
Numeric value of the features
- feature_names
Names of the features
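A sketch of the index features for a single trace given as a list of activity labels. Zero-based positions, the function name, and a default of -1 for absent activities are assumptions made for illustration; in pm4py the replacement value is controlled by Parameters.DEFAULT_NOT_PRESENT.

```python
from typing import Dict, List


def first_last_activity_index(
    trace: List[str], activities: List[str], default: int = -1
) -> List[int]:
    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for i, act in enumerate(trace):
        first.setdefault(act, i)  # only the first position is stored
        last[act] = i             # overwritten until the last position
    row: List[int] = []
    for act in activities:
        # Two features per activity: first index, then last index.
        row.extend([first.get(act, default), last.get(act, default)])
    return row


print(first_last_activity_index(["A", "B", "A", "C"], ["A", "B", "D"]))
# [0, 2, 1, 1, -1, -1]
```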
- pm4py.algo.transformation.log_to_features.variants.trace_based.case_duration(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Calculates the duration of each case and adds it as a feature.
Parameters
- log
Event log
- parameters
Parameters of the algorithm
Returns
- data
Numeric value of the features
- feature_names
Names of the features
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_string_trace_attribute_rep(trace: Trace, trace_attribute: str) → str
Gets the feature name associated with the value of a string trace attribute in the given trace.
Parameters
- trace
Trace of the log
- trace_attribute
Attribute of the trace to consider
Returns
- rep
Feature name associated with the value of the string trace attribute
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_all_string_trace_attribute_values(log: EventLog, trace_attribute: str) → List[str]
Gets the feature names associated with the values of a string trace attribute across all the traces of a log.
Parameters
- log
Event log
- trace_attribute
Attribute of the trace to consider
Returns
- list
List containing, for each trace, the feature name associated with the attribute value
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_string_event_attribute_rep(event: Event, event_attribute: str) → str
Gets the feature name associated with the value of a string event attribute in the given event.
Parameters
- event
Single event of a trace
- event_attribute
Event attribute to consider
Returns
- rep
Feature name associated with the value of the string event attribute
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_values_event_attribute_for_trace(trace: Trace, event_attribute: str) → Set[str]
Gets the feature names associated with the values that a string event attribute takes in the events of a trace.
Parameters
- trace
Trace of the log
- event_attribute
Event attribute to consider
Returns
- values
All feature names present for the given attribute in the given trace
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_all_string_event_attribute_values(log: EventLog, event_attribute: str) → List[str]
Gets the feature names associated with the values that a string event attribute takes across all the traces of a log.
Parameters
- log
Event log
- event_attribute
Event attribute to consider
Returns
- values
All feature names present for the given attribute in the given log
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_string_event_attribute_succession_rep(event1: Event, event2: Event, event_attribute: str) → str
Gets the feature name associated with the succession of the values of a string event attribute in two events.
Parameters
- event1
First event of the succession
- event2
Second event of the succession
- event_attribute
Event attribute to consider
Returns
- rep
Feature name associated with the succession of the two attribute values
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_values_event_attribute_succession_for_trace(trace: Trace, event_attribute: str) → Set[str]
Gets the feature names associated with the successions of values of a string event attribute in the events of a trace.
Parameters
- trace
Trace of the log
- event_attribute
Event attribute to consider
Returns
- values
All feature names present for the given attribute succession in the given trace
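As an illustration of succession features, the sketch below collects, for one trace, a feature name for every pair of directly following events that both carry the attribute. The `succession:<attr>@<v1>#<v2>` naming scheme and the helper name are assumptions for this example, not a statement about pm4py's exact format.

```python
from typing import Dict, List, Set


def succession_feature_names(trace: List[Dict[str, str]], attr: str) -> Set[str]:
    names: Set[str] = set()
    # Pair each event with the one that directly follows it.
    for ev1, ev2 in zip(trace, trace[1:]):
        if attr in ev1 and attr in ev2:
            # Hypothetical naming: "succession:<attr>@<value1>#<value2>".
            names.add(f"succession:{attr}@{ev1[attr]}#{ev2[attr]}")
    return names


trace = [{"concept:name": v} for v in ["A", "B", "A", "B"]]
print(sorted(succession_feature_names(trace, "concept:name")))
# ['succession:concept:name@A#B', 'succession:concept:name@B#A']
```

Returning a set mirrors the `Set[str]` return type documented above: a succession that repeats (A then B, twice) yields one feature name.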
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_all_string_event_succession_attribute_values(log: EventLog, event_attribute: str) → List[str]
Gets the feature names associated with the successions of values of a string event attribute across all the traces of a log.
Parameters
- log
Event log
- event_attribute
Event attribute to consider
Returns
- values
All feature names present for the given attribute succession in the given log
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_trace_attribute_rep(trace_attribute: str) → str
Gets the feature name associated with a numeric trace attribute.
Parameters
- trace_attribute
Name of the trace attribute
Returns
- feature_name
Name of the feature
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_trace_attribute_value(trace: Trace, trace_attribute: str) → int | float
Gets the value of a numeric trace attribute from a given trace.
Parameters
- trace
Trace of the log
- trace_attribute
Name of the numeric trace attribute
Returns
- value
Value of the numeric trace attribute for the given trace
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_event_attribute_rep(event_attribute: str) → str
Gets the feature name associated with a numeric event attribute.
Parameters
- event_attribute
Name of the event attribute
Returns
- feature_name
Name of the feature
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_event_attribute_value(event: Event, event_attribute: str) → int | float
Gets the value of a numeric event attribute from a given event.
Parameters
- event
Event
- event_attribute
Name of the numeric event attribute
Returns
- value
Value of the numeric event attribute for the given event
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_numeric_event_attribute_value_trace(trace: Trace, event_attribute: str) → int | float
Gets the value of the last occurrence of a numeric event attribute in a given trace.
Parameters
- trace
Trace of the log
- event_attribute
Name of the numeric event attribute
Returns
- value
Value of the last occurrence of the numeric event attribute in the given trace
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_default_representation_with_attribute_names(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None, feature_names: List[str] | None = None) → Tuple[Any, List[str], List[str], List[str], List[str], List[str]]
Gets the default data representation of an event log (for process tree building), also returning the names of the attributes used.
Parameters
- log
Event log
- parameters
Possible parameters of the algorithm
- feature_names
(If provided) features to use in the representation of the log
Returns
- data
Data to provide for decision tree learning
- feature_names
Names of the features, in order
- attribute names
Four lists with the names of the attributes used in the representation (the remaining elements of the returned tuple)
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_default_representation(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None, feature_names: List[str] | None = None) → Tuple[Any, List[str]]
Gets the default data representation of an event log (for process tree building).
Parameters
- log
Event log
- parameters
Possible parameters of the algorithm
- feature_names
(If provided) features to use in the representation of the log
Returns
- data
Data to provide for decision tree learning
- feature_names
Names of the features, in order
- pm4py.algo.transformation.log_to_features.variants.trace_based.get_representation(log: EventLog, str_tr_attr: List[str], str_ev_attr: List[str], num_tr_attr: List[str], num_ev_attr: List[str], str_evsucc_attr: List[str] | None = None, feature_names: List[str] | None = None) → Tuple[Any, List[str]]
Gets a representation of the event log suited for the data part of decision tree learning.
NOTE: this function only encodes the last value seen for each attribute.
Parameters
- log
Event log
- str_tr_attr
List of string trace attributes to consider in data vector creation
- str_ev_attr
List of string event attributes to consider in data vector creation
- num_tr_attr
List of numeric trace attributes to consider in data vector creation
- num_ev_attr
List of numeric event attributes to consider in data vector creation
- str_evsucc_attr
List of string event attributes whose successions of values are considered in data vector creation
- feature_names
(If provided) features to use in the representation of the log
Returns
- data
Data to provide for decision tree learning
- feature_names
Names of the features, in order
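The sketch below illustrates one reading of the NOTE on last values; it is an illustration, not pm4py's code. String event attributes are one-hot encoded, so every value that appears in the trace sets its feature to 1, while numeric event attributes are written event by event, so only the value from the last event carrying the attribute survives. The `event:<attr>@<value>` / `event:<attr>` naming scheme and the function name are hypothetical.

```python
from typing import Dict, List


def trace_vector(
    trace: List[Dict[str, object]],
    str_ev_attr: List[str],
    num_ev_attr: List[str],
    feature_names: List[str],
) -> List[float]:
    index = {name: i for i, name in enumerate(feature_names)}
    row = [0.0] * len(feature_names)
    for ev in trace:
        for attr in str_ev_attr:
            name = f"event:{attr}@{ev.get(attr)}"
            if attr in ev and name in index:
                row[index[name]] = 1.0  # presence of this value anywhere in the trace
        for attr in num_ev_attr:
            name = f"event:{attr}"
            if attr in ev and name in index:
                # Later events overwrite earlier ones: only the last value is kept.
                row[index[name]] = float(ev[attr])
    return row


trace = [{"concept:name": "A", "cost": 3}, {"concept:name": "B", "cost": 7}]
names = ["event:concept:name@A", "event:concept:name@B", "event:cost"]
print(trace_vector(trace, ["concept:name"], ["cost"], names))  # [1.0, 1.0, 7.0]
```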
- pm4py.algo.transformation.log_to_features.variants.trace_based.apply(log: EventLog, parameters: Dict[str | Parameters, Any] | None = None) → Tuple[Any, List[str]]
Extracts the features from an event log (one vector for each trace).
Parameters
- log
Event log
- parameters
Parameters of the algorithm, including:
Parameters.STR_TRACE_ATTRIBUTES => string trace attributes to consider in the feature extraction
Parameters.STR_EVENT_ATTRIBUTES => string event attributes to consider in the feature extraction
Parameters.NUM_TRACE_ATTRIBUTES => numeric trace attributes to consider in the feature extraction
Parameters.NUM_EVENT_ATTRIBUTES => numeric event attributes to consider in the feature extraction
Parameters.STR_EVSUCC_ATTRIBUTES => event attributes whose successions of values are considered in the feature extraction
Parameters.FEATURE_NAMES => features to consider (in the given order)
Parameters.ENABLE_ALL_EXTRA_FEATURES => enables all the extra features
Parameters.ENABLE_CASE_DURATION => adds the case duration as a feature
Parameters.ENABLE_TIMES_FROM_FIRST_OCCURRENCE => adds, for each activity, the times from the start of the case to its first occurrence and from that occurrence to the end of the case
Parameters.ADD_CASE_IDENTIFIER_COLUMN => adds the case identifier (string) as a column of the feature table (default: False)
Parameters.ENABLE_TIMES_FROM_LAST_OCCURRENCE => adds, for each activity, the times from the start of the case to its last occurrence and from that occurrence to the end of the case
Parameters.ENABLE_DIRECT_PATHS_TIMES_LAST_OCC => adds the duration of the last occurrence of a direct (i, i+1) path in the case as a feature
Parameters.ENABLE_INDIRECT_PATHS_TIMES_LAST_OCC => adds the duration of the last occurrence of an indirect (i, j) path in the case as a feature
Parameters.ENABLE_WORK_IN_PROGRESS => adds the work in progress (number of concurrent cases) as a feature
Parameters.ENABLE_RESOURCE_WORKLOAD => adds the resource workload as a feature
Parameters.ENABLE_FIRST_LAST_ACTIVITY_INDEX => adds the first and last indexes of the activities as features
Parameters.ENABLE_MAX_CONCURRENT_EVENTS => adds the maximum number of concurrent events inside a case as a feature
Parameters.ENABLE_MAX_CONCURRENT_EVENTS_PER_ACTIVITY => adds the maximum number of concurrent events per activity as features
Returns
- data
Data to provide for decision tree learning
- feature_names
Names of the features, in order