pm4py.org.discover_network_analysis
- pm4py.org.discover_network_analysis(log: DataFrame | EventLog | EventStream, out_column: str, in_column: str, node_column_source: str, node_column_target: str, edge_column: str, edge_reference: str = '_out', performance: bool = False, sorting_column: str = 'time:timestamp', timestamp_column: str = 'time:timestamp') → Dict[Tuple[str, str], Dict[str, Any]]
Performs a network analysis of the log based on the provided parameters.
Classical social network analysis methods are based on the order of events within a case. For example, the Handover of Work metric considers the directly-follows relationships between resources during the execution of a case. An edge is added between two resources if such a relationship occurs.
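As a rough illustration of the classical, case-based approach, the sketch below counts directly-follows pairs of resources within each case using plain pandas. It is not pm4py's implementation: the toy data and the helper name `handover_counts` are made up for this example.

```python
from collections import Counter

import pandas as pd

# Toy event log (hypothetical data) using pm4py's standard column names.
toy_log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2"],
    "org:resource": ["alice", "bob", "carol", "alice", "bob"],
    "time:timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00", "2024-01-01 11:00",
        "2024-01-02 09:00", "2024-01-02 09:30",
    ]),
})


def handover_counts(df):
    """Hypothetical helper: count resource -> next resource pairs per case."""
    df = df.sort_values(["case:concept:name", "time:timestamp"])
    # Resource of the next event in the same case (NaN for the last event).
    next_resource = df.groupby("case:concept:name")["org:resource"].shift(-1)
    pairs = pd.DataFrame({"src": df["org:resource"], "tgt": next_resource}).dropna()
    return Counter(zip(pairs["src"], pairs["tgt"]))


handovers = handover_counts(toy_log)
for (src, tgt), count in handovers.items():
    print(f"{src} -> {tgt}: {count}")
```

Every edge here depends on the case identifier, which is the restriction the network analysis described below removes.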
Real-life scenarios may be more complicated. Firstly, it is difficult to collect events within the same case without encountering convergence/divergence issues (see the first section of the OCEL part). Secondly, the type of relationship may also be important. For example, the relationship between two resources may be more efficient if the activity executed is liked by the resources rather than disliked.
The network analysis introduced here generalizes some existing social network analysis metrics, making them independent of the case notion and allowing the construction of a multigraph instead of a simple graph.
We assume events are linked by signals. An event emits a signal (contained in one attribute of the event) that is assumed to be received by other events (also containing this attribute) that follow the first event in the log. We assume there is an OUT attribute (of the event) that is identical to the IN attribute (of the other events).
When collecting this information, we can build the network analysis graph:
- The source node of the relationship is determined by aggregating the node_column_source attribute.
- The target node of the relationship is determined by aggregating the node_column_target attribute.
- The type of edge is determined by aggregating the edge_column attribute.
- The network analysis graph can be annotated with frequency or performance information.
The output is a multigraph. Two events EV1 and EV2 in the log are connected (independently of the case notion) when EV1.OUT_COLUMN = EV2.IN_COLUMN. The connected pairs of events are then aggregated by node: node_column_source gives the source node and node_column_target gives the target node. The edges between these nodes are aggregated by edge_column, which is read from the source event by default (see edge_reference).
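The linking and aggregation steps can be sketched in plain pandas as follows. This is a minimal approximation, not the library's code: `sketch_network` is a hypothetical helper, the data is invented, only frequency annotation is covered, and the sketch assumes every later event with a matching attribute receives the signal (the text above does not say whether only the nearest one should).

```python
import pandas as pd

# Toy events (hypothetical data); "order_id" plays the role of the signal.
events = pd.DataFrame({
    "order_id": ["o1", "o1", "o1", "o2", "o2"],
    "org:resource": ["alice", "alice", "bob", "alice", "carol"],
    "concept:name": ["create", "check", "approve", "create", "approve"],
    "time:timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:30", "2024-01-01 10:00",
        "2024-01-01 11:00", "2024-01-01 12:00",
    ]),
})


def sketch_network(df, out_column, in_column, node_column_source,
                   node_column_target, edge_column, edge_reference="_out",
                   sorting_column="time:timestamp"):
    """Hypothetical helper: link events by signal, then aggregate to a multigraph."""
    df = df.sort_values(sorting_column).reset_index(drop=True)
    df["pos"] = range(len(df))  # position of each event in the sorted log
    # One copy of the log acts as emitting events, the other as receiving events.
    out_events = df.add_suffix("_out")
    in_events = df.add_suffix("_in")
    pairs = out_events.merge(
        in_events, left_on=out_column + "_out", right_on=in_column + "_in"
    )
    # Assumption: keep every pair in which the receiver comes later in the log.
    pairs = pairs[pairs["pos_out"] < pairs["pos_in"]]
    counts = pairs.groupby([
        node_column_source + "_out",
        node_column_target + "_in",
        edge_column + edge_reference,  # "_out": source event, "_in": target event
    ]).size()
    graph = {}
    for (source, target, edge), count in counts.items():
        graph.setdefault((source, target), {})[edge] = int(count)
    return graph


graph_out = sketch_network(events, "order_id", "order_id",
                           "org:resource", "org:resource", "concept:name")
graph_in = sketch_network(events, "order_id", "order_id",
                          "org:resource", "org:resource", "concept:name",
                          edge_reference="_in")
print(graph_out)
print(graph_in)
```

Two distinct edge types can connect the same pair of nodes, which is why the result is a multigraph rather than a simple graph.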
- Parameters:
log – Event log, Pandas DataFrame, or EventStream.
out_column (str) – The source column of the link (typically the case identifier, so that events of the same case are linked).
in_column (str) – The target column of the link (typically the case identifier, so that events of the same case are linked).
node_column_source (str) – The attribute to be used for defining the source node (typically the resource of the log, “org:resource”).
node_column_target (str) – The attribute to be used for defining the target node (typically the resource of the log, “org:resource”).
edge_column (str) – The attribute to be used for defining the edge (typically the activity of the log, “concept:name”).
edge_reference (str) – Determines whether the edge attribute is picked from the source or the target event. Values: “_out” => the source event; “_in” => the target event (default: “_out”).
performance (bool) – Boolean value that enables performance calculation on the edges of the network analysis (default: False).
sorting_column (str) – The column to be used for sorting the log before performing the network analysis (default: “time:timestamp”).
timestamp_column (str) – The column to be used as timestamp for performance-related analysis (default: “time:timestamp”).
- Return type:
Dict[Tuple[str, str], Dict[str, Any]]
import pm4py

net_ana = pm4py.discover_network_analysis(
    dataframe,
    out_column='case:concept:name',
    in_column='case:concept:name',
    node_column_source='org:resource',
    node_column_target='org:resource',
    edge_column='concept:name',
    edge_reference='_out',
    performance=False,
    sorting_column='time:timestamp',
    timestamp_column='time:timestamp'
)
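A result of the documented return type can be walked as a nested dictionary. The sketch below uses a hand-written stand-in (`net_ana_example`, hypothetical values) rather than real output, and assumes the inner dictionary maps each edge type to its annotation value. The helper `flatten_edges` is likewise made up for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical result, hand-written to match the documented return type
# Dict[Tuple[str, str], Dict[str, Any]]; not real pm4py output.
net_ana_example: Dict[Tuple[str, str], Dict[str, Any]] = {
    ("alice", "bob"): {"create": 2, "check": 1},
    ("alice", "carol"): {"create": 1},
}


def flatten_edges(
    net: Dict[Tuple[str, str], Dict[str, Any]]
) -> List[Tuple[str, str, str, Any]]:
    """Turn the nested mapping into (source, target, edge type, value) rows."""
    rows = [
        (source, target, edge_type, value)
        for (source, target), edges in net.items()
        for edge_type, value in edges.items()
    ]
    # Largest annotation value first; ties keep their original order.
    return sorted(rows, key=lambda row: row[3], reverse=True)


edge_rows = flatten_edges(net_ana_example)
for source, target, edge_type, value in edge_rows:
    print(f"{source} -> {target} [{edge_type}]: {value}")
```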