pm4py.utils.format_dataframe

pm4py.utils.format_dataframe(df: DataFrame, case_id: str = 'case:concept:name', activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', start_timestamp_key: str = 'start_timestamp', timest_format: str | None = None) -> DataFrame

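Formats a pandas DataFrame of event data for process mining with pm4py. The case identifier, activity, and timestamp columns are identified by the column names passed as arguments.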

Parameters:
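  • df (DataFrame) – The pandas DataFrame to format.

  • case_id (str) – Name of the column containing the case identifier.

  • activity_key (str) – Name of the column containing the activity.

  • timestamp_key (str) – Name of the column containing the timestamp.

  • start_timestamp_key (str) – Name of the column containing the start timestamp.

  • timest_format (str | None) – Timestamp format string provided to pandas, for example '%Y-%m-%d %H:%M:%S'.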

Returns:

A formatted pandas DataFrame.

Return type:

pd.DataFrame

Example:

import pandas as pd
import pm4py

dataframe = pd.read_csv('event_log.csv')
dataframe = pm4py.format_dataframe(
    dataframe,
    case_id='case:concept:name',
    activity_key='concept:name',
    timestamp_key='time:timestamp',
    start_timestamp_key='start_timestamp',
    timest_format='%Y-%m-%d %H:%M:%S'
)
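
The timest_format value in the call above is written with strftime-style directives. Before formatting a large log, it can help to confirm that the format string matches the timestamps in the file. The following is a minimal sketch using only the Python standard library; the sample value is hypothetical, and it assumes that pandas interprets the common directives used here in the same way as datetime.strptime.

```python
from datetime import datetime

# Hypothetical sample value, standing in for one cell of the timestamp column.
sample_value = "2021-03-15 08:30:00"

# Same format string as in the format_dataframe call above.
timest_format = "%Y-%m-%d %H:%M:%S"

# strptime raises ValueError if the string does not match the format,
# which flags a mismatch before the whole dataframe is processed.
parsed = datetime.strptime(sample_value, timest_format)
print(parsed.isoformat())
```

If the check raises ValueError, adjust timest_format to match the data before calling format_dataframe.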