The IEEE XES standard defines the format for storing event logs. For more information about the format, please visit the IEEE XES website. A simple synthetic event log file (running-example.xes) can be downloaded here. Additionally, several real event logs have been made available over the past few years, which you can find here.
The example code demonstrates how to import an event log stored in the IEEE XES format, given the file path to the log file. It uses the standard importer (iterparse), which is described in more detail later. Note that IEEE XES event logs are imported into a Pandas DataFrame.
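Apart from the IEEE XES standard, many event logs are also stored in CSV files. In PM4Py, there are generally two ways to handle CSV files: import the CSV file into a Pandas DataFrame and work with the DataFrame directly, or convert the imported data into an Event Log object.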
Note that the example code above may not work directly in many cases. Let us consider a very simple example event log and assume it is stored as a CSV file.
CaseID | Activity | Timestamp | clientID |
---|---|---|---|
1 | register request | 20200422T0455 | 1337 |
2 | register request | 20200422T0457 | 1479 |
1 | submit payment | 20200422T0503 | 1337 |
... | ... | ... | ... |
In this small example table, we observe four columns: CaseID, Activity, Timestamp, and clientID. When importing the data and converting it into an Event Log object, we aim to group all rows (events) that share the same value in the CaseID column.
Another interesting aspect of the example data is the fourth column, clientID. This column represents a case-level attribute, meaning that the value remains constant throughout the execution of a process instance. PM4Py allows us to specify that a column describes a case-level attribute, under the assumption that the attribute does not change during the process execution.
The example code shows how to convert the previously described CSV data file. After loading the CSV file, we rename the clientID column to case:clientID using the rename operation provided by Pandas.
In this section, we describe how to convert event log objects from one type to another. There are three object types that we can switch between: Event Log, Event Stream, and DataFrame objects. Please refer to the previous code snippet for an example of applying log conversion (as used when importing a CSV file).
Finally, note that most algorithms internally use converters to handle input event data objects of any form. In such cases, default parameters are applied.
To convert from any object to an event log, the following method can be used:
To convert from any object to an event stream, the following method can be used.
To convert from any object to a DataFrame, the following method can be used.
Exporting an Event Log object to an IEEE XES file is straightforward in PM4Py. In the example, the log object is assumed to be an Event Log object. However, the exporter also accepts Event Stream or DataFrame objects as input.
When a non-Event Log object is provided, the exporter will first convert the input into an Event Log, using standard parameters for the conversion. Therefore, if the user requires more control over the conversion process, it is advisable to explicitly convert the data into an Event Log before exporting.
To export an event log to a CSV file, PM4Py uses Pandas. Therefore, the event log is first converted into a Pandas DataFrame, after which it is written to disk.
If the provided event log object is not already a DataFrame (i.e., it is an Event Log or Event Stream), the conversion will be applied automatically using the default parameter values, as explained in the Converting Event Data section.
Note that exporting event data to a CSV file does not accept any additional parameters. If more control over the export is needed, it is advisable to first manually convert the event data to a DataFrame before exporting to CSV.