pm4py.algo.filtering.dfg package#

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

Submodules#

pm4py.algo.filtering.dfg.dfg_filtering module#

pm4py.algo.filtering.dfg.dfg_filtering.generate_nx_graph_from_dfg(dfg, start_activities, end_activities, activities_count)

Generates a NetworkX directed graph from the DFG, for use in reachability checks.

Parameters#

dfg

Directly-follows graph

start_activities

Start activities

end_activities

End activities

activities_count

Activities of the DFG along with their count

Returns#

G

NetworkX digraph

start_node

Identifier of the start node (connected to all the start activities)

end_node

Identifier of the end node (connected to all the end activities)

pm4py.algo.filtering.dfg.dfg_filtering.build_adjacency_structures(dfg, start_activities, end_activities)

Builds forward (adj) and reverse (rev_adj) adjacency lists for the DFG, plus two synthetic nodes for the “start” and “end”:

- start_node points to each node in start_activities.
- Each node in end_activities points to end_node.

Returns:

adj, rev_adj, start_node, end_node
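
The construction described above can be sketched in plain Python. This is an illustrative sketch only, not the library's code: it assumes the DFG is a dictionary keyed by (source, target) pairs, and the function name `build_adjacency_sketch` and the sentinel strings are hypothetical.

```python
from collections import defaultdict


def build_adjacency_sketch(dfg, start_activities, end_activities):
    # Hypothetical sentinel identifiers for the synthetic start/end nodes.
    start_node, end_node = "<<start>>", "<<end>>"
    adj = defaultdict(set)      # node -> set of successors
    rev_adj = defaultdict(set)  # node -> set of predecessors
    for src, tgt in dfg:
        adj[src].add(tgt)
        rev_adj[tgt].add(src)
    # The synthetic start node points to every start activity.
    for act in start_activities:
        adj[start_node].add(act)
        rev_adj[act].add(start_node)
    # Every end activity points to the synthetic end node.
    for act in end_activities:
        adj[act].add(end_node)
        rev_adj[end_node].add(act)
    return adj, rev_adj, start_node, end_node


dfg = {("a", "b"): 5, ("b", "c"): 3}
adj, rev_adj, start_node, end_node = build_adjacency_sketch(dfg, {"a": 5}, {"c": 3})
```

Keeping both directions makes the two reachability questions (from the start, towards the end) equally cheap to answer.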

pm4py.algo.filtering.dfg.dfg_filtering.bfs_reachable(start, adj)

Returns the set of nodes reachable from ‘start’ in the directed graph defined by adjacency list ‘adj’.
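
A breadth-first traversal of this kind can be sketched as follows. The sketch assumes `adj` maps each node to an iterable of successors and that the start node itself counts as reachable; the name `bfs_reachable_sketch` is hypothetical and the library's implementation may differ in detail.

```python
from collections import deque


def bfs_reachable_sketch(start, adj):
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Nodes without outgoing edges may be missing from adj.
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


adj = {"s": {"a"}, "a": {"b"}, "b": {"a"}, "x": {"a"}}
reachable = bfs_reachable_sketch("s", adj)
```

Here "x" has an edge into the reachable part of the graph but is not itself reachable from "s", so it is absent from the result.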

pm4py.algo.filtering.dfg.dfg_filtering.remove_unreachable_nodes(dfg, start_activities, end_activities, activities_count, adj, rev_adj, start_node, end_node)

Removes from the DFG (and the related dictionaries) any activity/node that is not reachable from start_node or cannot reach end_node, based on the current adjacency structures ‘adj’ and ‘rev_adj’.
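
The pruning rule amounts to intersecting two reachability sets: the nodes reachable from the start (following forward edges) and the nodes that can reach the end (following reverse edges). The sketch below illustrates that idea under stated assumptions: the DFG is a dictionary keyed by (source, target) pairs, the adjacency structures are rebuilt internally for self-containment, and new dictionaries are returned. The documented function instead receives `adj` and `rev_adj` as arguments, so its behaviour may differ; all names here are hypothetical.

```python
def _reachable(start, adj):
    """Return every node reachable from start (start included)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def prune_unreachable_sketch(dfg, start_activities, end_activities, activities_count):
    start_node, end_node = "<<start>>", "<<end>>"  # hypothetical sentinels
    adj, rev_adj = {}, {}
    edges = list(dfg)
    edges += [(start_node, a) for a in start_activities]
    edges += [(a, end_node) for a in end_activities]
    for src, tgt in edges:
        adj.setdefault(src, set()).add(tgt)
        rev_adj.setdefault(tgt, set()).add(src)
    # A node survives only if it lies on some start-to-end path.
    keep = _reachable(start_node, adj) & _reachable(end_node, rev_adj)
    return (
        {e: c for e, c in dfg.items() if e[0] in keep and e[1] in keep},
        {a: c for a, c in start_activities.items() if a in keep},
        {a: c for a, c in end_activities.items() if a in keep},
        {a: c for a, c in activities_count.items() if a in keep},
    )


dfg = {("a", "b"): 4, ("b", "c"): 4, ("d", "b"): 1, ("b", "e"): 1}
new_dfg, new_start, new_end, new_counts = prune_unreachable_sketch(
    dfg, {"a": 4}, {"c": 4}, {"a": 4, "b": 5, "c": 4, "d": 1, "e": 1}
)
```

In the example, "d" is dropped because it cannot be reached from the start, and "e" is dropped because it cannot reach the end.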

pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_on_activities_percentage(dfg0, start_activities0, end_activities0, activities_count0, percentage)

Filters a complete (and therefore connected) DFG on the specified percentage of activities, while ensuring that every remaining node is still reachable from the start and can still reach the end.

Parameters#

dfg0

(Complete, and so connected) DFG

start_activities0

Start activities

end_activities0

End activities

activities_count0

Activities of the DFG along with their count

percentage

Percentage of activities

Returns#

dfg

(Filtered) DFG

start_activities

(Filtered) start activities

end_activities

(Filtered) end activities

activities_count

(Filtered) activities of the DFG along with their count
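
One way to combine a percentage cut-off with the connectivity guarantee is sketched below. It is a sketch under stated assumptions, not the library's algorithm: activities are ranked by count, the top `ceil(n * percentage)` are treated as protected (this cut-off rule is an assumption), and each remaining activity is removed, least frequent first, only if the protected activities stay reachable from the start and can still reach the end. All function names and sentinels are hypothetical.

```python
import math


def _reach(start, edges):
    """Return every node reachable from start over the given edge set."""
    adj = {}
    for src, tgt in edges:
        adj.setdefault(src, set()).add(tgt)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def filter_activities_sketch(dfg, start_acts, end_acts, counts, percentage):
    start_node, end_node = "<<start>>", "<<end>>"  # hypothetical sentinels
    ranked = sorted(counts, key=lambda a: (counts[a], a), reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * percentage))  # assumed cut-off rule
    protected = set(ranked[:n_keep])
    edges = set(dfg)
    edges |= {(start_node, a) for a in start_acts}
    edges |= {(a, end_node) for a in end_acts}
    removed = set()
    for act in reversed(ranked[n_keep:]):  # least frequent candidates first
        trial = {(s, t) for s, t in edges if act not in (s, t)}
        forward = _reach(start_node, trial)
        backward = _reach(end_node, {(t, s) for s, t in trial})
        # Accept the removal only if no protected activity gets cut off.
        if protected <= forward and protected <= backward:
            edges, removed = trial, removed | {act}
    kept = set(counts) - removed
    return (
        {e: c for e, c in dfg.items() if e[0] in kept and e[1] in kept},
        {a: c for a, c in start_acts.items() if a in kept},
        {a: c for a, c in end_acts.items() if a in kept},
        {a: c for a, c in counts.items() if a in kept},
    )


dfg = {("a", "b"): 10, ("b", "c"): 10, ("a", "x"): 1, ("x", "c"): 1}
counts = {"a": 11, "b": 10, "c": 11, "x": 1}
f_dfg, f_start, f_end, f_counts = filter_activities_sketch(
    dfg, {"a": 11}, {"c": 11}, counts, 0.75
)
```

With a percentage of 0.75, the rare activity "x" is the only removal candidate, and dropping it leaves the path a, b, c intact.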

pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_on_paths_percentage(dfg0, start_activities0, end_activities0, activities_count0, percentage, keep_all_activities=False)

Filters a complete (and therefore connected) DFG on the specified percentage of paths, while ensuring that every remaining node is still reachable from the start and can still reach the end.

Parameters#

dfg0

(Complete, and so connected) DFG

start_activities0

Start activities

end_activities0

End activities

activities_count0

Activities of the DFG along with their count

percentage

Percentage of paths

keep_all_activities

If True, keep all activities (only remove edges) and preserve connectivity; otherwise, only guarantee that the activities in the high-percentage edges remain connected.

Returns#

dfg

(Filtered) DFG

start_activities

(Filtered) start activities

end_activities

(Filtered) end activities

activities_count

(Filtered) activities of the DFG along with their count
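
The edge-based variant can be illustrated in the same spirit for the keep_all_activities=True case. The sketch assumes that edges are ranked by frequency, that the top `ceil(n * percentage)` edges are always kept (an assumed cut-off rule), and that each other edge is dropped, least frequent first, only if every activity stays on some start-to-end path. It returns only the filtered DFG and is not the library's implementation; all names are hypothetical.

```python
import math


def _reach(start, edges):
    """Return every node reachable from start over the given edge set."""
    adj = {}
    for src, tgt in edges:
        adj.setdefault(src, set()).add(tgt)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def filter_paths_sketch(dfg, start_acts, end_acts, counts, percentage):
    start_node, end_node = "<<start>>", "<<end>>"  # hypothetical sentinels
    ranked = sorted(dfg, key=lambda e: (dfg[e], e), reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * percentage))  # assumed cut-off rule
    boundary = {(start_node, a) for a in start_acts}
    boundary |= {(a, end_node) for a in end_acts}
    kept_edges = set(ranked)
    activities = set(counts)  # all activities must stay connected
    for edge in reversed(ranked[n_keep:]):  # least frequent candidates first
        trial = (kept_edges - {edge}) | boundary
        forward = _reach(start_node, trial)
        backward = _reach(end_node, {(t, s) for s, t in trial})
        if activities <= forward and activities <= backward:
            kept_edges.discard(edge)
    return {e: dfg[e] for e in ranked if e in kept_edges}


dfg = {
    ("a", "b"): 10, ("b", "c"): 10, ("a", "c"): 1,
    ("b", "d"): 2, ("d", "c"): 2,
}
counts = {"a": 11, "b": 10, "c": 13, "d": 2}
filtered = filter_paths_sketch(dfg, {"a": 11}, {"c": 13}, counts, 0.5)
```

The shortcut edge ("a", "c") is removed, whereas ("b", "d") survives despite its low frequency because removing it would disconnect "d" from the start.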

pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_keep_connected(dfg0, start_activities0, end_activities0, activities_count0, threshold, keep_all_activities=False)

Filters a complete (and therefore connected) DFG on the specified dependency threshold (similar to the Heuristics Miner dependency measure), while ensuring that every remaining node is still reachable from the start and can still reach the end.

Parameters#

dfg0

(Complete, and so connected) DFG

start_activities0

Start activities

end_activities0

End activities

activities_count0

Activities of the DFG along with their count

threshold

Dependency threshold as in the Heuristics Miner

keep_all_activities

If True, keep all activities (only remove edges that fall below threshold); otherwise, remove activities not connected by high-dependency edges.

Returns#

dfg

(Filtered) DFG

start_activities

(Filtered) start activities

end_activities

(Filtered) end activities

activities_count

(Filtered) activities of the DFG along with their count
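
For orientation, the classic Heuristics Miner dependency measure between two distinct activities a and b is (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), and |a>a| / (|a>a| + 1) for a self-loop. The sketch below computes that textbook measure on a DFG dictionary and applies a threshold. Whether this function uses exactly this formula is an assumption; the connectivity repair step is omitted, and the name `dependency_sketch` is hypothetical.

```python
def dependency_sketch(dfg):
    """Textbook Heuristics Miner dependency for every edge of the DFG."""
    dep = {}
    for (a, b), ab in dfg.items():
        if a == b:
            # Self-loop case.
            dep[(a, b)] = ab / (ab + 1)
        else:
            ba = dfg.get((b, a), 0)  # frequency of the opposite edge, if any
            dep[(a, b)] = (ab - ba) / (ab + ba + 1)
    return dep


dfg = {("a", "b"): 9, ("b", "a"): 1, ("b", "c"): 4}
dep = dependency_sketch(dfg)
threshold = 0.7
strong_edges = {e: c for e, c in dfg.items() if dep[e] >= threshold}
```

The edge ("b", "a") gets a negative dependency because the opposite direction dominates, so it falls below any positive threshold.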

pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_to_activity(dfg0, start_activities0, end_activities0, activities_count0, target_activity, parameters=None)

Filters the DFG, making “target_activity” the only possible end activity of the graph.

Parameters#

dfg0

Directly-follows graph

start_activities0

Start activities

end_activities0

End activities

activities_count0

Activities count

target_activity

Target activity (only possible end activity after the filtering)

parameters

Optional parameters of the algorithm

Returns#

dfg

Filtered DFG

start_activities

Filtered start activities

end_activities

Filtered end activities

activities_count

Filtered activities count
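
A simple way to picture this filter: drop the edges leaving the target, then keep only the activities from which the target can still be reached. The sketch below follows that reading. It is an assumption about the intended effect rather than the library's algorithm, it sets the count of the single end activity to the target's activity count (also an assumption), and all names are hypothetical.

```python
def filter_to_activity_sketch(dfg, start_acts, counts, target):
    # Drop edges leaving the target, so nothing can follow it.
    edges = {e: c for e, c in dfg.items() if e[0] != target}
    rev = {}
    for src, tgt in edges:
        rev.setdefault(tgt, set()).add(src)
    # Walk backwards from the target to find every activity that can reach it.
    keep, stack = {target}, [target]
    while stack:
        node = stack.pop()
        for prev in rev.get(node, ()):
            if prev not in keep:
                keep.add(prev)
                stack.append(prev)
    return (
        {e: c for e, c in edges.items() if e[0] in keep and e[1] in keep},
        {a: c for a, c in start_acts.items() if a in keep},
        {target: counts[target]},  # assumed count for the only end activity
        {a: c for a, c in counts.items() if a in keep},
    )


dfg = {("a", "b"): 5, ("b", "c"): 5, ("b", "d"): 2, ("c", "d"): 3}
counts = {"a": 5, "b": 5, "c": 5, "d": 5}
t_dfg, t_start, t_end, t_counts = filter_to_activity_sketch(dfg, {"a": 5}, counts, "c")
```

The same idea applies in mirror image when a single start activity is wanted: drop the edges entering the source and walk forwards from it.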

pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_from_activity(dfg0, start_activities0, end_activities0, activities_count0, source_activity, parameters=None)

Filters the DFG, making “source_activity” the only possible start activity of the graph.

Parameters#

dfg0

Directly-follows graph

start_activities0

Start activities

end_activities0

End activities

activities_count0

Activities count

source_activity

Source activity (only possible start activity after the filtering)

parameters

Optional parameters of the algorithm

Returns#

dfg

Filtered DFG

start_activities

Filtered start activities

end_activities

Filtered end activities

activities_count

Filtered activities count

pm4py.algo.filtering.dfg.dfg_filtering.filter_dfg_contain_activity(dfg0, start_activities0, end_activities0, activities_count0, activity, parameters=None)

Filters the DFG, keeping only the nodes that can reach, or are reachable from, the given activity.

Parameters#

dfg0

Directly-follows graph

start_activities0

Start activities

end_activities0

End activities

activities_count0

Activities count

activity

Activity that every node of the filtered graph must either reach or be reachable from

parameters

Optional parameters of the algorithm

Returns#

dfg

Filtered DFG

start_activities

Filtered start activities

end_activities

Filtered end activities

activities_count

Filtered activities count
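
The node selection can be pictured as the union of the activity's descendants and ancestors. The sketch below illustrates only that selection on a DFG dictionary keyed by (source, target) pairs. It returns just the filtered DFG, the names are hypothetical, and the library's function may apply further steps.

```python
def _closure(start, neighbours):
    """Return start plus every node reachable through the neighbours map."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def filter_contain_activity_sketch(dfg, activity):
    adj, rev = {}, {}
    for src, tgt in dfg:
        adj.setdefault(src, set()).add(tgt)
        rev.setdefault(tgt, set()).add(src)
    # Descendants (forward closure) together with ancestors (backward closure).
    keep = _closure(activity, adj) | _closure(activity, rev)
    return {e: c for e, c in dfg.items() if e[0] in keep and e[1] in keep}


dfg = {("a", "b"): 3, ("b", "c"): 3, ("a", "d"): 1, ("d", "e"): 1}
around_b = filter_contain_activity_sketch(dfg, "b")
```

Activities "d" and "e" lie on a branch that neither leads to "b" nor follows from it, so their edges are dropped.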

pm4py.algo.filtering.dfg.dfg_filtering.clean_dfg_based_on_noise_thresh(dfg, activities, noise_threshold, parameters=None)

Cleans the directly-follows graph based on a noise threshold.

Parameters#

dfg

Directly-Follows graph

activities

Activities in the DFG

noise_threshold

Noise threshold

Returns#

newDfg

Cleaned DFG, filtered according to the noise threshold
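
One common reading of a noise threshold on a DFG is relative to the strongest outgoing edge of each activity: an edge is dropped when its frequency falls below noise_threshold times the maximum outgoing frequency of its source. The sketch below implements that reading only. The exact rule applied by this function is not stated here, so treat the criterion as an assumption; the name `clean_dfg_sketch` is hypothetical.

```python
def clean_dfg_sketch(dfg, activities, noise_threshold):
    # Highest outgoing frequency observed for each activity.
    max_out = {act: 0 for act in activities}
    for (src, _tgt), count in dfg.items():
        max_out[src] = max(max_out[src], count)
    # Keep an edge only if it is not negligible next to the strongest
    # edge leaving the same source (assumed criterion).
    return {
        (src, tgt): count
        for (src, tgt), count in dfg.items()
        if count >= noise_threshold * max_out[src]
    }


dfg = {("a", "b"): 100, ("a", "c"): 5, ("b", "d"): 50, ("c", "d"): 5}
cleaned = clean_dfg_sketch(dfg, {"a", "b", "c", "d"}, 0.2)
```

The edge ("a", "c") is removed because 5 is below 0.2 × 100, while ("c", "d") stays because it is the strongest edge leaving "c".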