Handling Event Data

Importing XES

The IEEE XES standard defines the format for storing event logs. For more information about the format, please visit the IEEE XES website. A simple synthetic event log file (running-example.xes) can be downloaded here. Additionally, several real event logs have been made available over the past few years, which you can find here.

Given the path to a log file, PM4Py can import an event log stored in the IEEE XES format using its standard importer (iterparse), which is described in more detail later. Note that IEEE XES event logs are imported into a Pandas DataFrame.
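
A minimal sketch of such an import follows. It assumes a recent PM4Py 2.x release in which pm4py.read_xes returns a Pandas DataFrame by default; the helper name import_xes is hypothetical.

```python
import pandas as pd


def import_xes(file_path: str) -> pd.DataFrame:
    """Import an IEEE XES event log into a Pandas DataFrame (hypothetical helper)."""
    import pm4py  # requires the pm4py package (pip install pm4py)

    # read_xes parses the XES file; recent PM4Py versions return a DataFrame
    log = pm4py.read_xes(file_path)
    # Each row is one event; XES attributes such as concept:name become columns
    print(f"{len(log)} events, columns: {list(log.columns)}")
    return log
```

For example, import_xes("running-example.xes") would load the synthetic log mentioned above, assuming the file is in the working directory.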

Importing CSV

Apart from the IEEE XES standard, many event logs are also stored in CSV files. In PM4Py, there are generally two ways to handle CSV files:

  • Import the CSV into a Pandas DataFrame:
    Most existing algorithms in PM4Py are designed to be flexible regarding their input. If an event log object is provided in an unsupported format, it is automatically translated into the appropriate form. Therefore, after importing a DataFrame, most algorithms can directly operate on it.
  • Convert the CSV into an event log object:
    This approach involves first importing the CSV file into a Pandas DataFrame (as in the previous method) and then converting it into an event log object, similar to the result of the IEEE XES importer described earlier. In the remainder of this section, we briefly explain how to perform this conversion. Note that most algorithms internally use a similar type of conversion if the input event data is not already in the correct format.
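
Both approaches can be sketched as follows. The sketch assumes a semicolon-separated file whose case ID, activity and timestamp columns are named case_id, activity and timestamp; these column names, the separator and the helper names are assumptions for illustration. pm4py.format_dataframe renames the given columns to the standard names PM4Py expects (case:concept:name, concept:name, time:timestamp).

```python
import pandas as pd


def import_csv_as_dataframe(file_path, sep: str = ";") -> pd.DataFrame:
    """Way 1: import the CSV file into a Pandas DataFrame."""
    return pd.read_csv(file_path, sep=sep)


def import_csv_as_event_log(file_path, sep: str = ";"):
    """Way 2: import the CSV as a DataFrame, then convert it into an event log object."""
    import pm4py  # requires the pm4py package

    df = import_csv_as_dataframe(file_path, sep=sep)
    # Map the (assumed) column names to PM4Py's standard column names
    df = pm4py.format_dataframe(df, case_id="case_id", activity_key="activity",
                                timestamp_key="timestamp")
    # Most algorithms, e.g. get_start_activities, accept the DataFrame directly
    print("Start activities:", pm4py.get_start_activities(df))
    # Explicit conversion into an Event Log object (one trace per case)
    return pm4py.convert_to_event_log(df)
```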

Note that such a direct import and conversion may not work out of the box in many cases. Let us consider a very simple example event log and assume it is stored as a CSV file.

Case ID   Activity           Timestamp       Client ID
1         register request   20200422T0455   1337
2         register request   20200422T0457   1479
1         submit payment     20200422T0503   1337
...       ...                ...             ...

In this small example table, we observe four columns: CaseID, Activity, Timestamp, and clientID. When importing the data and converting it into an Event Log object, we aim to group all rows (events) that share the same value in the CaseID column.
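
To make this grouping concrete, the following pandas-only sketch (an illustration, not PM4Py's own converter) sorts the three example rows by time and groups them by CaseID; each resulting group corresponds to one trace. The timestamp format string is inferred from the example values.

```python
import pandas as pd

# The example table from above
events = pd.DataFrame({
    "CaseID": [1, 2, 1],
    "Activity": ["register request", "register request", "submit payment"],
    "Timestamp": ["20200422T0455", "20200422T0457", "20200422T0503"],
    "clientID": [1337, 1479, 1337],
})

# Parse the compact timestamps and order the events in time
events["Timestamp"] = pd.to_datetime(events["Timestamp"], format="%Y%m%dT%H%M")
events = events.sort_values("Timestamp")

# Rows sharing the same CaseID form one trace (sequence of activities)
traces = events.groupby("CaseID")["Activity"].apply(list)
print(traces.to_dict())
# {1: ['register request', 'submit payment'], 2: ['register request']}
```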

Another interesting aspect of the example data is the fourth column, clientID. This column represents a case-level attribute, meaning that the value remains constant throughout the execution of a process instance. PM4Py allows us to specify that a column describes a case-level attribute, under the assumption that the attribute does not change during the process execution.

To convert the CSV data described above, we first load the CSV file and then rename the clientID column to case:clientID using the Pandas DataFrame.rename method; the case: prefix marks the column as a case-level attribute.
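
A minimal sketch of this conversion, assuming a comma-separated file whose columns are named CaseID, Activity, Timestamp and clientID; the helper names, the separator and the timestamp format (inferred from the example values) are assumptions.

```python
import pandas as pd


def prepare_example_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Mark clientID as a case-level attribute and parse the compact timestamps."""
    # The "case:" prefix marks the column as a case-level (trace) attribute
    df = df.rename(columns={"clientID": "case:clientID"})
    # Parse values such as 20200422T0455 into proper datetimes
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y%m%dT%H%M")
    return df


def convert_example_csv(file_path: str):
    """Load the example CSV file and convert it into an Event Log object."""
    import pm4py  # requires the pm4py package

    df = prepare_example_dataframe(pd.read_csv(file_path, dtype={"Timestamp": str}))
    df = pm4py.format_dataframe(df, case_id="CaseID", activity_key="Activity",
                                timestamp_key="Timestamp")
    # Rows sharing the same CaseID are grouped into one trace
    return pm4py.convert_to_event_log(df)
```

After the conversion, each trace should carry clientID as a trace attribute, since the converter moves case:-prefixed columns to the trace level.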

Converting Event Data

In this section, we describe how to convert event log objects from one type to another. There are three object types that we can switch between: Event Log, Event Stream, and DataFrame objects. The CSV import described in the previous section is one example of such a conversion.

Note also that most algorithms internally use converters to handle input event data objects of any form; in such cases, default parameters are applied.

To convert from any object to an event log, the pm4py.convert_to_event_log method can be used.
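
A minimal sketch, assuming a recent PM4Py 2.x release; the wrapper name to_event_log is hypothetical.

```python
def to_event_log(obj):
    """Convert an Event Stream or a DataFrame into an Event Log object."""
    import pm4py  # requires the pm4py package

    # DataFrames are expected to use the standard columns (case:concept:name,
    # concept:name, time:timestamp), e.g. after pm4py.format_dataframe
    event_log = pm4py.convert_to_event_log(obj)
    # An Event Log groups events into traces, one per case
    print("Number of traces:", len(event_log))
    return event_log
```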

To convert from any object to an event stream, the pm4py.convert_to_event_stream method can be used.
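
A similar sketch for event streams, again assuming a recent PM4Py 2.x release; the wrapper name to_event_stream is hypothetical.

```python
def to_event_stream(obj):
    """Convert an Event Log or a DataFrame into an Event Stream object."""
    import pm4py  # requires the pm4py package

    event_stream = pm4py.convert_to_event_stream(obj)
    # An Event Stream is a flat sequence of events, not grouped into traces
    print("Number of events:", len(event_stream))
    return event_stream
```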

To convert from any object to a DataFrame, the pm4py.convert_to_dataframe method can be used.
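
A minimal sketch, assuming a recent PM4Py 2.x release; the wrapper name to_dataframe and the early return for inputs that are already DataFrames are choices made for this illustration.

```python
import pandas as pd


def to_dataframe(obj) -> pd.DataFrame:
    """Convert an Event Log or an Event Stream into a Pandas DataFrame."""
    if isinstance(obj, pd.DataFrame):
        return obj  # already a DataFrame, nothing to convert
    import pm4py  # requires the pm4py package

    # One row per event; case attributes become columns prefixed with "case:"
    return pm4py.convert_to_dataframe(obj)
```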

Exporting Event Logs as XES

Exporting an Event Log object to an IEEE XES file is straightforward in PM4Py. The input is typically an Event Log object, but the exporter also accepts Event Stream or DataFrame objects.

When a non-Event Log object is provided, the exporter will first convert the input into an Event Log, using standard parameters for the conversion. Therefore, if the user requires more control over the conversion process, it is advisable to explicitly convert the data into an Event Log before exporting.
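
A minimal sketch of an explicit conversion followed by the export, assuming a recent PM4Py 2.x release; the helper name export_xes and the output file name are hypothetical.

```python
def export_xes(obj, file_path: str) -> None:
    """Explicitly convert obj into an Event Log, then write it as IEEE XES."""
    import pm4py  # requires the pm4py package

    # Converting explicitly keeps the conversion step under the caller's control
    event_log = pm4py.convert_to_event_log(obj)
    pm4py.write_xes(event_log, file_path)
```

For example, export_xes(log, "exported.xes") would write the log to exported.xes.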

Exporting Event Logs as CSV

To export an event log to a CSV file, PM4Py uses Pandas. Therefore, the event log is first converted into a Pandas DataFrame, after which it is written to disk.
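
A minimal sketch of this two-step export: non-DataFrame inputs are converted with default parameters, and Pandas writes the file. The helper name export_csv and the choice index=False (which omits the row index column) are assumptions for illustration.

```python
import pandas as pd


def export_csv(obj, file_path) -> None:
    """Write event data to CSV, converting to a DataFrame first if needed."""
    if not isinstance(obj, pd.DataFrame):
        import pm4py  # only needed for Event Log / Event Stream inputs
        obj = pm4py.convert_to_dataframe(obj)  # default conversion parameters
    # Pandas does the actual writing; index=False omits the row index column
    obj.to_csv(file_path, index=False)
```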

If the provided event log object is not already a DataFrame (i.e., it is an Event Log or Event Stream), the conversion will be applied automatically using the default parameter values, as explained in the Converting Event Data section.

Note that exporting event data to a CSV file does not accept any additional parameters. If more control over the conversion is needed, it is advisable to first convert the event data to a DataFrame manually before exporting it to CSV.