pm4py.algo.filtering.dfg package#
PM4Py – A Process Mining Library for Python
Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.
Website: https://processintelligence.solutions Contact: info@processintelligence.solutions
Submodules#
pm4py.algo.filtering.dfg.dfg_filtering module#
- pm4py.algo.filtering.dfg.dfg_filtering.generate_nx_graph_from_dfg(dfg, start_activities, end_activities, activities_count)[source]#
Generates a NetworkX directed graph from the DFG, for use in reachability checks
Parameters#
- dfg
DFG
- start_activities
Start activities
- end_activities
End activities
- activities_count
Activities of the DFG along with their count
Returns#
- G
NetworkX digraph
- start_node
Identifier of the start node (connected to all the start activities)
- end_node
Identifier of the end node (connected to all the end activities)
- pm4py.algo.filtering.dfg.dfg_filtering.build_adjacency_structures(dfg, start_activities, end_activities)[source]#
Build forward (adj) and reverse (rev_adj) adjacency lists for the DFG, plus two synthetic nodes for the “start” and “end”:
- start_node points to each node in start_activities.
- Each node in end_activities points to end_node.
- Returns:
adj, rev_adj, start_node, end_node
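The structures described above can be sketched in plain Python. This is a minimal illustration and not the library's source: it assumes the DFG is a dict mapping (source, target) pairs to frequencies, and the function name and the synthetic node labels `"<start>"` and `"<end>"` are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical labels for the synthetic nodes; the library may use other identifiers.
START_NODE = "<start>"
END_NODE = "<end>"


def build_adjacency_sketch(dfg, start_activities, end_activities):
    """Return (adj, rev_adj, start_node, end_node) for a DFG given as {(a, b): count}."""
    adj = defaultdict(set)      # node -> direct successors
    rev_adj = defaultdict(set)  # node -> direct predecessors
    for a, b in dfg:
        adj[a].add(b)
        rev_adj[b].add(a)
    # The synthetic start node points to every start activity.
    for act in start_activities:
        adj[START_NODE].add(act)
        rev_adj[act].add(START_NODE)
    # Every end activity points to the synthetic end node.
    for act in end_activities:
        adj[act].add(END_NODE)
        rev_adj[END_NODE].add(act)
    return adj, rev_adj, START_NODE, END_NODE


dfg = {("A", "B"): 5, ("B", "C"): 3}
adj, rev_adj, start_node, end_node = build_adjacency_sketch(dfg, {"A": 5}, {"C": 3})
```

Keeping both directions makes the two later checks symmetric: "reachable from start" is a traversal over `adj`, and "can reach end" is the same traversal over `rev_adj`.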
- pm4py.algo.filtering.dfg.dfg_filtering.bfs_reachable(start, adj)[source]#
Returns the set of nodes reachable from ‘start’ in the directed graph defined by adjacency list ‘adj’.
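A breadth-first traversal of this kind might look as follows. The function name is hypothetical, and the sketch assumes that `adj` maps each node to an iterable of successors and that the start node itself counts as reachable.

```python
from collections import deque


def bfs_reachable_sketch(start, adj):
    """Return the set of nodes reachable from `start` (including `start` itself)."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Nodes without outgoing edges may be missing from `adj`.
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


adj = {"A": {"B"}, "B": {"C"}, "D": {"A"}}
reachable = bfs_reachable_sketch("A", adj)
```

Here `D` is not in the result: it has an edge into `A`, but nothing leads from `A` to `D`.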
- pm4py.algo.filtering.dfg.dfg_filtering.remove_unreachable_nodes(dfg, start_activities, end_activities, activities_count, adj, rev_adj, start_node, end_node)[source]#
Removes from the DFG (and related dictionaries) any activity/node that is not reachable from start_node or cannot reach end_node, based on the current adjacency structure ‘adj’ and ‘rev_adj’.
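The pruning step can be sketched as the intersection of two traversals: one forward from the start node, one backward from the end node. The names below are hypothetical, the synthetic labels are placeholders, and the sketch returns new dictionaries, whereas the library function may well update its arguments in place.

```python
def reach(start, adj):
    """Return every node reachable from `start` by following `adj`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def prune_unreachable_sketch(dfg, start_activities, end_activities, activities_count,
                             adj, rev_adj, start_node, end_node):
    # A node survives only if it lies on some path from start_node to end_node.
    keep = reach(start_node, adj) & reach(end_node, rev_adj)
    dfg = {(a, b): c for (a, b), c in dfg.items() if a in keep and b in keep}
    start_activities = {a: c for a, c in start_activities.items() if a in keep}
    end_activities = {a: c for a, c in end_activities.items() if a in keep}
    activities_count = {a: c for a, c in activities_count.items() if a in keep}
    return dfg, start_activities, end_activities, activities_count


# "X" is reachable from the start but has no path to the end.
dfg = {("A", "B"): 4, ("B", "C"): 4, ("A", "X"): 1}
adj = {"<start>": {"A"}, "A": {"B", "X"}, "B": {"C"}, "C": {"<end>"}}
rev_adj = {"A": {"<start>"}, "B": {"A"}, "X": {"A"}, "C": {"B"}, "<end>": {"C"}}
pruned = prune_unreachable_sketch(
    dfg, {"A": 5}, {"C": 4}, {"A": 5, "B": 4, "C": 4, "X": 1},
    adj, rev_adj, "<start>", "<end>",
)
```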
- pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_on_activities_percentage(dfg0, start_activities0, end_activities0, activities_count0, percentage)[source]#
Filters a DFG (complete, and so connected) on the specified percentage of activities (but ensuring that every node is still reachable from the start and can reach the end).
Parameters#
- dfg0
(Complete, and so connected) DFG
- start_activities0
Start activities
- end_activities0
End activities
- activities_count0
Activities of the DFG along with their count
- percentage
Percentage of activities
Returns#
- dfg
(Filtered) DFG
- start_activities
(Filtered) start activities
- end_activities
(Filtered) end activities
- activities_count
(Filtered) activities of the DFG along with their count
- pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_on_paths_percentage(dfg0, start_activities0, end_activities0, activities_count0, percentage, keep_all_activities=False)[source]#
Filters a DFG (complete, and so connected) on the specified percentage of paths (but ensuring that every node is still reachable from the start and can reach the end).
Parameters#
- dfg0
(Complete, and so connected) DFG
- start_activities0
Start activities
- end_activities0
End activities
- activities_count0
Activities of the DFG along with their count
- percentage
Percentage of paths
- keep_all_activities
If True, keep all activities (only remove edges) and preserve connectivity; otherwise, only guarantee that the activities in the high-percentage edges remain connected.
Returns#
- dfg
(Filtered) DFG
- start_activities
(Filtered) start activities
- end_activities
(Filtered) end activities
- activities_count
(Filtered) activities of the DFG along with their count
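To illustrate the idea of removing rare paths while preserving connectivity, the following is a minimal sketch of the `keep_all_activities=True` case. It is not the library's implementation: the function names are hypothetical, and it assumes that `percentage` is a fraction between 0 and 1 giving the share of most frequent edges to keep, and that a rarer edge is removed only if every activity stays reachable from a start activity and can still reach an end activity.

```python
import math


def _reach(start, adj):
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def _all_connected(edges, start_activities, end_activities, activities):
    """True if every activity is reachable from a start and can reach an end."""
    adj, rev = {}, {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        rev.setdefault(b, set()).add(a)
    adj["<start>"] = set(start_activities)  # hypothetical synthetic nodes
    rev["<end>"] = set(end_activities)
    return activities <= _reach("<start>", adj) and activities <= _reach("<end>", rev)


def filter_paths_sketch(dfg, start_activities, end_activities, activities_count, percentage):
    """Drop the rarest edges, skipping any removal that would disconnect an activity."""
    activities = set(activities_count)
    n_keep = math.ceil(percentage * len(dfg))
    ranked = sorted(dfg, key=lambda e: (dfg[e], e))  # rarest edge first
    kept = dict(dfg)
    for edge in ranked[: len(dfg) - n_keep]:
        trial = {e: c for e, c in kept.items() if e != edge}
        if _all_connected(trial, start_activities, end_activities, activities):
            kept = trial
    return kept


dfg = {("A", "B"): 10, ("B", "C"): 10, ("A", "C"): 2, ("B", "B"): 1}
filtered = filter_paths_sketch(dfg, {"A": 12}, {"C": 12}, {"A": 12, "B": 11, "C": 12}, 0.5)
```

Even with `percentage` set to 0, this sketch keeps `A -> B` and `B -> C`, because removing either one would leave an activity unreachable from the start.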
- pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_keep_connected(dfg0, start_activities0, end_activities0, activities_count0, threshold, keep_all_activities=False)[source]#
Filters a DFG (complete, and so connected) on the specified dependency threshold (similar to Heuristics Miner dependency), but ensuring every node is still reachable from the start and can reach the end.
Parameters#
- dfg0
(Complete, and so connected) DFG
- start_activities0
Start activities
- end_activities0
End activities
- activities_count0
Activities of the DFG along with their count
- threshold
Dependency threshold as in the Heuristics Miner
- keep_all_activities
If True, keep all activities (only remove edges that fall below threshold); otherwise, remove activities not connected by high-dependency edges.
Returns#
- dfg
(Filtered) DFG
- start_activities
(Filtered) start activities
- end_activities
(Filtered) end activities
- activities_count
(Filtered) activities of the DFG along with their count
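For orientation, the classic Heuristics Miner dependency measure between two distinct activities a and b is (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| is the frequency of the edge from a to b. The sketch below computes it from a DFG dict and selects the edges at or above a threshold. Whether the library uses exactly this formula and a non-strict comparison is an assumption, the function names are hypothetical, and the connectivity repair that the function guarantees is not shown.

```python
def dependency(dfg, a, b):
    """Heuristics Miner dependency measure between activities a and b."""
    ab = dfg.get((a, b), 0)
    if a == b:
        # Self-loops use the length-one-loop variant of the measure.
        return ab / (ab + 1)
    ba = dfg.get((b, a), 0)
    return (ab - ba) / (ab + ba + 1)


def edges_above_threshold(dfg, threshold):
    """Keep the edges whose dependency value is at least `threshold`."""
    return {e: c for e, c in dfg.items() if dependency(dfg, *e) >= threshold}


dfg = {("A", "B"): 9, ("B", "A"): 1, ("B", "C"): 4}
strong = edges_above_threshold(dfg, 0.5)
```

In this example `A -> B` scores 8/11 and `B -> C` scores 4/5, while `B -> A` scores -8/11 and falls below the threshold.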
- pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_to_activity(dfg0, start_activities0, end_activities0, activities_count0, target_activity, parameters=None)[source]#
Filters the DFG, making “target_activity” the only possible end activity of the graph
Parameters#
- dfg0
Directly-follows graph
- start_activities0
Start activities
- end_activities0
End activities
- activities_count0
Activities count
- target_activity
Target activity (only possible end activity after the filtering)
- parameters
Optional parameters of the algorithm
Returns#
- dfg
Filtered DFG
- start_activities
Filtered start activities
- end_activities
Filtered end activities
- activities_count
Filtered activities count
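One way to picture this filter is as a backward traversal from the target. The sketch below is an illustration under stated assumptions, not the library's code: the function name is hypothetical, edges leaving the target are dropped so that nothing can follow it, only nodes that can still reach the target are kept, and the count assigned to the new end activity is simply taken from `activities_count`.

```python
def filter_to_activity_sketch(dfg, start_activities, activities_count, target):
    # Edges leaving the target are dropped so that nothing can follow it.
    edges = {e: c for e, c in dfg.items() if e[0] != target}
    # Walk the remaining edges backwards to find every node that can reach the target.
    preds = {}
    for a, b in edges:
        preds.setdefault(b, set()).add(a)
    keep, stack = {target}, [target]
    while stack:
        for p in preds.get(stack.pop(), ()):
            if p not in keep:
                keep.add(p)
                stack.append(p)
    new_dfg = {(a, b): c for (a, b), c in edges.items() if a in keep and b in keep}
    new_start = {a: c for a, c in start_activities.items() if a in keep}
    new_end = {target: activities_count[target]}  # assumed count for the single end activity
    new_count = {a: c for a, c in activities_count.items() if a in keep}
    return new_dfg, new_start, new_end, new_count


dfg = {("A", "B"): 5, ("B", "C"): 3, ("B", "D"): 2, ("C", "E"): 3}
counts = {"A": 5, "B": 5, "C": 3, "D": 2, "E": 3}
result = filter_to_activity_sketch(dfg, {"A": 5}, counts, "C")
```

Filtering from a source activity is the mirror image: drop the edges entering the source and traverse forward instead of backward.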
- pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_from_activity(dfg0, start_activities0, end_activities0, activities_count0, source_activity, parameters=None)[source]#
Filters the DFG, making “source_activity” the only possible source activity of the graph
Parameters#
- dfg0
Directly-follows graph
- start_activities0
Start activities
- end_activities0
End activities
- activities_count0
Activities count
- source_activity
Source activity (only possible start activity after the filtering)
- parameters
Optional parameters of the algorithm
Returns#
- dfg
Filtered DFG
- start_activities
Filtered start activities
- end_activities
Filtered end activities
- activities_count
Filtered activities count
- pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_contain_activity(dfg0, start_activities0, end_activities0, activities_count0, activity, parameters=None)[source]#
Filters the DFG, keeping only the nodes that can reach “activity” or are reachable from it
Parameters#
- dfg0
Directly-follows graph
- start_activities0
Start activities
- end_activities0
End activities
- activities_count0
Activities count
- activity
Activity that every node of the filtered graph must either reach or be reachable from
- parameters
Optional parameters of the algorithm
Returns#
- dfg
Filtered DFG
- start_activities
Filtered start activities
- end_activities
Filtered end activities
- activities_count
Filtered activities count
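The node selection described here amounts to the union of a forward and a backward traversal from the chosen activity. The following is a minimal sketch with hypothetical names that returns only the filtered DFG. It assumes an edge is kept whenever both of its endpoints survive, which may differ from the library's exact edge rule.

```python
def _reach(start, adj):
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def filter_contain_activity_sketch(dfg, activity):
    succ, pred = {}, {}
    for a, b in dfg:
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)
    # Nodes reachable from the activity, plus nodes that can reach it.
    keep = _reach(activity, succ) | _reach(activity, pred)
    return {(a, b): c for (a, b), c in dfg.items() if a in keep and b in keep}


dfg = {("A", "B"): 5, ("B", "C"): 5, ("A", "X"): 1, ("Y", "C"): 1}
around_b = filter_contain_activity_sketch(dfg, "B")
```

`X` and `Y` are both dropped: `X` follows `A` but has no path to or from `B`, and `Y` leads to `C` without passing through `B`.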
- pm4py.algo.filtering.dfg.dfg_filtering.clean_dfg_based_on_noise_thresh(dfg, activities, noise_threshold, parameters=None)[source]#
Cleans the directly-follows graph based on a noise threshold
Parameters#
- dfg
Directly-Follows graph
- activities
Activities in the DFG
- noise_threshold
Noise threshold
- parameters
Optional parameters of the algorithm
Returns#
- newDfg
DFG cleaned according to the noise threshold
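The exact noise rule is not spelled out here, so the sketch below assumes one plausible reading purely for illustration: an edge is treated as noise when its frequency falls below `noise_threshold` times the highest edge frequency seen at both of its endpoints. The function name and this rule are assumptions, not the library's documented behaviour.

```python
def clean_dfg_sketch(dfg, activities, noise_threshold):
    # Highest frequency among the edges that touch each activity.
    max_count = {act: 0 for act in activities}
    for (a, b), c in dfg.items():
        max_count[a] = max(max_count.get(a, 0), c)
        max_count[b] = max(max_count.get(b, 0), c)
    cleaned = {}
    for (a, b), c in dfg.items():
        # Assumed rule: drop the edge only if it is rare relative to both endpoints.
        if c >= noise_threshold * min(max_count[a], max_count[b]):
            cleaned[(a, b)] = c
    return cleaned


dfg = {("A", "B"): 100, ("B", "C"): 90, ("A", "C"): 3}
cleaned = clean_dfg_sketch(dfg, ["A", "B", "C"], 0.1)
```

With a threshold of 0.1, the edge `A -> C` (frequency 3) is removed because both `A` and `C` take part in edges at least thirty times more frequent.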