Directly-follows graphs (DFGs) are among the simplest types of process models. In these graphs, the nodes represent the activities, and the edges indicate how frequently one activity is directly followed by another. In PM4Py, we provide advanced operations on top of DFGs. The starting point is the discovery of the DFG along with the start and end activities of the log, which `pm4py.discover_dfg` returns as a triple.
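To make the three discovered objects concrete, here is a minimal plain-Python sketch that computes them for a toy log. It does not use PM4Py; the function name `discover_dfg_sketch` and the toy log are hypothetical, and the log is assumed to be a list of traces, each an ordered list of activity names.

```python
from collections import Counter

# Toy event log (illustrative data): one list of activity names per case.
toy_log = [
    ["register", "check", "decide", "pay"],
    ["register", "check", "decide", "reject"],
    ["register", "check", "check", "decide", "pay"],
]


def discover_dfg_sketch(log):
    """Return (dfg, start_activities, end_activities) as frequency counters."""
    dfg, start_activities, end_activities = Counter(), Counter(), Counter()
    for trace in log:
        if not trace:
            continue  # an empty trace contributes nothing
        start_activities[trace[0]] += 1
        end_activities[trace[-1]] += 1
        # Count every pair of consecutive activities in the trace.
        for source, target in zip(trace, trace[1:]):
            dfg[(source, target)] += 1
    return dfg, start_activities, end_activities


dfg, start_activities, end_activities = discover_dfg_sketch(toy_log)
```

Each key of `dfg` is a pair `(source, target)` and each value is the number of times `target` directly followed `source` in the log.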
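To discover the activities in the log along with their occurrence frequencies (assuming that `concept:name` is the attribute reporting the activity), call `pm4py.get_event_attribute_values` with the log and the attribute name `concept:name`. The result is a dictionary that maps each activity to its number of occurrences.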
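As an illustration of what the occurrence dictionary contains, the sketch below counts attribute values over a toy list of events in plain Python. The helper `count_attribute_values` and the events are hypothetical; each event is assumed to be a dictionary keyed by attribute name.

```python
from collections import Counter

# Toy events (illustrative data): one dictionary of attributes per event.
toy_events = [
    {"case:concept:name": "1", "concept:name": "register"},
    {"case:concept:name": "1", "concept:name": "check"},
    {"case:concept:name": "1", "concept:name": "pay"},
    {"case:concept:name": "2", "concept:name": "register"},
    {"case:concept:name": "2", "concept:name": "check"},
    {"case:concept:name": "2", "concept:name": "check"},
]


def count_attribute_values(events, attribute_key="concept:name"):
    """Count how often each value of `attribute_key` occurs across events."""
    return Counter(
        event[attribute_key] for event in events if attribute_key in event
    )


activities_count = count_attribute_values(toy_events)
```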
Directly-follows graphs can contain a large number of activities and paths, some of which may be outliers. In this section, we demonstrate how to filter the activities and paths of the graph, retaining only a subset of the behavior. First, we load an example log and calculate the DFG.
Activities can be filtered by percentage. The most frequent activities, as defined by the percentage, are retained along with all activities that are necessary to maintain graph connectivity. The percentage is expressed as a fraction: with 0, only the most frequent activity (and those that ensure connectivity) is kept, while 0.2 keeps the most frequent 20% of the activities. The filter is applied simultaneously to the DFG, the start activities, the end activities, and the dictionary of activity occurrences, so that the four objects stay consistent.
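The idea can be sketched in plain Python as follows. This is not PM4Py's implementation: the function names are hypothetical, the rounding rule `ceil((n - 1) * percentage) + 1` is an assumption chosen so that 0 keeps exactly one activity, and "connectivity" is read as "every retained core activity still lies on some path from a start activity to an end activity".

```python
import math


def reachable(seeds, edges):
    """Return every node reachable from `seeds` by following `edges`."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        node = stack.pop()
        for source, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def is_connected(core, dfg, starts, ends):
    """Check that each core activity lies on some start-to-end path."""
    forward = reachable(starts, list(dfg))
    backward = reachable(ends, [(b, a) for a, b in dfg])
    return all(act in forward and act in backward for act in core)


def filter_activities_sketch(dfg, starts, ends, counts, percentage):
    # Rank activities by frequency (ties broken by name for determinism).
    ranked = sorted(counts, key=lambda act: (-counts[act], act))
    n_keep = math.ceil((len(ranked) - 1) * percentage) + 1  # assumed rule
    core = set(ranked[:n_keep])
    dfg, starts, ends, counts = dict(dfg), dict(starts), dict(ends), dict(counts)
    # Try to drop the remaining activities, least frequent first.
    for act in reversed(ranked[n_keep:]):
        new_dfg = {e: f for e, f in dfg.items() if act not in e}
        new_starts = {a: f for a, f in starts.items() if a != act}
        new_ends = {a: f for a, f in ends.items() if a != act}
        # Keep the removal only if the core stays connected.
        if is_connected(core, new_dfg, new_starts, new_ends):
            dfg, starts, ends = new_dfg, new_starts, new_ends
            del counts[act]
    return dfg, starts, ends, counts


# Toy inputs (illustrative data).
toy_dfg = {("register", "check"): 3, ("check", "check"): 1,
           ("check", "decide"): 3, ("decide", "pay"): 2, ("decide", "reject"): 1}
toy_starts = {"register": 3}
toy_ends = {"pay": 2, "reject": 1}
toy_counts = {"register": 3, "check": 4, "decide": 3, "pay": 2, "reject": 1}

f_dfg, f_starts, f_ends, f_counts = filter_activities_sketch(
    toy_dfg, toy_starts, toy_ends, toy_counts, 0.0)
```

With a percentage of 0 on this toy graph, only `reject` can be dropped: removing `register`, `decide`, or `pay` would cut the most frequent activity, `check`, off from the start or the end.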
Paths can be filtered by percentage in the same way. The most frequent paths, as defined by the percentage, are retained along with any paths necessary to maintain connectivity. With 0, only the most frequent path (and those ensuring connectivity) is kept, while 0.2 keeps the most frequent 20% of the paths. As with activity filtering, the filter is applied simultaneously to the DFG, the start activities, the end activities, and the dictionary of activity occurrences.
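A simplified plain-Python sketch of path filtering is given below. It is not PM4Py's implementation: the function names are hypothetical, the rounding rule is an assumption, and this variant never removes an activity. A path is dropped only if every activity still lies on a start-to-end path afterwards, so the start activities, end activities, and occurrence dictionary stay unchanged here.

```python
import math


def on_start_end_path(dfg, starts, ends):
    """Return the activities lying on at least one start-to-end path."""
    def closure(seeds, edges):
        seen, stack = set(seeds), list(seeds)
        while stack:
            node = stack.pop()
            for source, target in edges:
                if source == node and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    forward = closure(starts, list(dfg))
    backward = closure(ends, [(b, a) for a, b in dfg])
    return forward & backward


def filter_paths_sketch(dfg, starts, ends, percentage):
    # Rank paths by frequency (ties broken by the edge itself).
    ranked = sorted(dfg, key=lambda edge: (-dfg[edge], edge))
    n_keep = math.ceil((len(ranked) - 1) * percentage) + 1  # assumed rule
    activities = on_start_end_path(dfg, starts, ends)
    kept = dict(dfg)
    # Try to drop the remaining paths, least frequent first.
    for edge in reversed(ranked[n_keep:]):
        trial = {e: f for e, f in kept.items() if e != edge}
        # Keep the removal only if no activity gets disconnected.
        if on_start_end_path(trial, starts, ends) == activities:
            kept = trial
    return kept


# Toy inputs (illustrative data).
toy_dfg = {("register", "check"): 3, ("check", "check"): 1,
           ("check", "decide"): 3, ("decide", "pay"): 2, ("decide", "reject"): 1}
toy_starts = {"register": 3}
toy_ends = {"pay": 2, "reject": 1}

filtered_dfg = filter_paths_sketch(toy_dfg, toy_starts, toy_ends, 0.0)
```

On this toy graph, a percentage of 0 removes only the self-loop on `check`; every other path is the sole connection of some activity to the start or the end.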
A playout operation on a DFG is useful for retrieving the traces allowed by the graph. A trace represents a sequence of activities from the start node to the end node of the DFG. We can assign a probability to each trace, assuming the DFG represents a Markov chain. This section shows how to perform the playout of a DFG to retrieve the most likely traces. First, we load an example log and calculate the DFG.
Once the DFG is computed, we can perform the playout operation on the DFG together with its start and end activities, which returns the simulated traces.
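To illustrate what a playout under the Markov-chain assumption computes, here is a minimal plain-Python sketch that enumerates the most likely traces with a best-first search. It is not PM4Py's implementation. The function `most_likely_traces`, its parameters `k` and `max_len`, and the toy data are hypothetical. Transition probabilities are assumed to be proportional to the DFG frequencies, with the end-activity frequency acting as the weight of stopping.

```python
import heapq
from collections import defaultdict
from itertools import count


def most_likely_traces(dfg, starts, ends, k=3, max_len=20):
    """Return up to k (trace, probability) pairs, most likely first."""
    # Outgoing weights per activity: successors plus the option to stop.
    out = defaultdict(dict)
    for (source, target), freq in dfg.items():
        out[source][target] = freq
    for activity, freq in ends.items():
        out[activity][None] = freq  # None marks "the trace ends here"

    tie = count()  # tie-breaker so the heap never compares traces
    total_start = sum(starts.values())
    heap = [(-freq / total_start, next(tie), (act,)) for act, freq in starts.items()]
    heapq.heapify(heap)

    results = []
    while heap and len(results) < k:
        neg_prob, _, trace = heapq.heappop(heap)
        if trace[-1] is None:  # a complete trace was popped
            results.append((trace[:-1], -neg_prob))
            continue
        if len(trace) >= max_len:  # guard against endless loops
            continue
        choices = out[trace[-1]]
        total = sum(choices.values())
        for nxt, freq in choices.items():
            heapq.heappush(heap, (neg_prob * freq / total, next(tie), trace + (nxt,)))
    return results


# Toy inputs (illustrative data).
toy_dfg = {("register", "check"): 3, ("check", "check"): 1,
           ("check", "decide"): 3, ("decide", "pay"): 2, ("decide", "reject"): 1}
toy_starts = {"register": 3}
toy_ends = {"pay": 2, "reject": 1}

top_traces = most_likely_traces(toy_dfg, toy_starts, toy_ends, k=3)
```

Because extending a prefix can only lower its probability, the first complete traces popped from the priority queue are the most likely ones.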
Alignments are a popular conformance checking technique, typically applied to Petri nets. However, performing alignments on a DFG can be more efficient because the state space of a DFG is much smaller. This allows for quick diagnostics of activities and paths that are executed incorrectly. In this section, we demonstrate how to perform alignments between process executions and a DFG. First, we load an example log and calculate the DFG.
Once the DFG is computed, we can perform alignments between the process executions of the log and the DFG.
The output of the alignment process is similar to the one obtained for Petri nets. For each trace, it contains the result of the alignment as a list of moves: sync moves (the log and the model advance together), moves on the log (where a move in the process execution is not reflected in the DFG), and moves on the model (where a move is needed in the model but not supported by the process execution).
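The three kinds of moves can be illustrated with a small plain-Python sketch that aligns one trace against a DFG using a shortest-path search. It is not PM4Py's implementation. The function `align_trace_to_dfg` is hypothetical, the cost model (0 for a sync move, 1 for a move on the log or on the model) is an assumption, and `>>` is used to mark the side that does not move.

```python
import heapq
from itertools import count

SKIP = ">>"  # marks the side (log or model) that does not move


def align_trace_to_dfg(trace, dfg, starts, ends):
    """Return (cost, moves); each move is a (log_label, model_label) pair."""
    START, END = "<start>", "<end>"  # artificial nodes, assumed unused names
    successors = {START: set(starts)}
    for source, target in dfg:
        successors.setdefault(source, set()).add(target)
    for activity in ends:
        successors.setdefault(activity, set()).add(END)

    tie = count()  # tie-breaker so the heap never compares move lists
    heap = [(0, next(tie), 0, START, [])]
    settled = set()
    while heap:
        cost, _, pos, node, moves = heapq.heappop(heap)
        if (pos, node) in settled:
            continue
        settled.add((pos, node))
        if pos == len(trace) and node == END:
            return cost, moves
        for nxt in sorted(successors.get(node, ())):
            if nxt == END:
                # Finishing the model run is silent and free.
                heapq.heappush(heap, (cost, next(tie), pos, END, moves))
                continue
            if pos < len(trace) and trace[pos] == nxt:
                # Sync move: log and model agree on the next activity.
                heapq.heappush(heap, (cost, next(tie), pos + 1, nxt,
                                      moves + [(nxt, nxt)]))
            # Move on model: the DFG advances, the trace does not.
            heapq.heappush(heap, (cost + 1, next(tie), pos, nxt,
                                  moves + [(SKIP, nxt)]))
        if pos < len(trace):
            # Move on log: the trace advances, the DFG does not.
            heapq.heappush(heap, (cost + 1, next(tie), pos + 1, node,
                                  moves + [(trace[pos], SKIP)]))
    return None  # no path from a start activity to an end activity


# Toy inputs (illustrative data).
toy_dfg = {("register", "check"): 3, ("check", "check"): 1,
           ("check", "decide"): 3, ("decide", "pay"): 2, ("decide", "reject"): 1}
toy_starts = {"register": 3}
toy_ends = {"pay": 2, "reject": 1}

cost, moves = align_trace_to_dfg(
    ["register", "decide", "pay"], toy_dfg, toy_starts, toy_ends)
```

In this toy example the trace skips `check`, so the cheapest alignment inserts one move on the model for `check` and synchronizes the remaining three events.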
The Directly-Follows Graph (DFG) is a common representation of a process used by many commercial tools. Sander Leemans proposed the idea of converting the DFG into a workflow net that perfectly mimics the DFG, a process known as DFG mining. The procedure consists of four steps: load a log, calculate the DFG, convert it to a workflow net, and perform alignments on the resulting net.
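One way to picture the conversion is the following plain-Python sketch. It is an assumed encoding for illustration, not necessarily the exact construction used by PM4Py or described by Leemans: each activity gets a place meaning "this activity was just executed", each DFG edge becomes a transition labeled with its target activity, and silent transitions lead from end activities to the sink. The function names and the tuple-based net representation are hypothetical, and activity names are assumed not to clash with `source` and `sink`.

```python
def dfg_to_workflow_net(dfg, starts, ends):
    """Return (places, transitions) of a workflow net mimicking the DFG.

    Each transition is a (label, input_place, output_place) triple;
    a label of None denotes a silent transition.
    """
    transitions = []
    for activity in sorted(starts):
        # Executing a start activity moves the token out of the source.
        transitions.append((activity, "source", activity))
    for source, target in sorted(dfg):
        # Following a DFG edge executes the target activity.
        transitions.append((target, source, target))
    for activity in sorted(ends):
        # After an end activity, a silent step reaches the sink.
        transitions.append((None, activity, "sink"))
    places = {"source", "sink"}
    for _, input_place, output_place in transitions:
        places.update((input_place, output_place))
    return places, transitions


def accepts(transitions, trace):
    """Replay a trace on the net; every transition moves a single token."""
    current = {"source"}
    for activity in trace:
        current = {out for label, inp, out in transitions
                   if label == activity and inp in current}
    return any(label is None and inp in current and out == "sink"
               for label, inp, out in transitions)


# Toy inputs (illustrative data).
toy_dfg = {("register", "check"): 3, ("check", "check"): 1,
           ("check", "decide"): 3, ("decide", "pay"): 2, ("decide", "reject"): 1}
toy_starts = {"register": 3}
toy_ends = {"pay": 2, "reject": 1}

places, transitions = dfg_to_workflow_net(toy_dfg, toy_starts, toy_ends)
```

Since every transition has exactly one input place and one output place, the net holds a single token at all times, and the traces it accepts are exactly the start-to-end paths of the DFG.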