pm4py.algo.discovery.dfg.adapters.pandas package#

PM4Py – A Process Mining Library for Python

Copyright (C) 2024 Process Intelligence Solutions UG (haftungsbeschränkt)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see this software project’s root or visit <https://www.gnu.org/licenses/>.

Website: https://processintelligence.solutions Contact: info@processintelligence.solutions

Submodules#

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics module#

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics.get_dfg_graph(df, measure='frequency', activity_key='concept:name', case_id_glue='case:concept:name', start_timestamp_key=None, timestamp_key='time:timestamp', perf_aggregation_key='mean', sort_caseid_required=True, sort_timestamp_along_case_id=True, keep_once_per_case=False, window=1, business_hours=False, business_hours_slot=None, workcalendar=None, target_activity_key=None, reduce_columns=True, cost_attribute=None)#

Gets the directly-follows graph (DFG) from a Pandas dataframe (optimized version)

Parameters#

df

Dataframe

measure

Measure to use (frequency/performance/both)

activity_key

Activity key to use in the grouping

case_id_glue

Column of the dataframe to use as case ID

start_timestamp_key

Start timestamp key

timestamp_key

Timestamp key

perf_aggregation_key

Performance aggregation key (mean, median, min, max)

sort_caseid_required

Specifies whether sorting by case ID is required

sort_timestamp_along_case_id

Specifies whether sorting by timestamp within each case ID is required

keep_once_per_case

In the counts, keep only one occurrence of the path per case (the first)

window

Window of the DFG, i.e. the distance in positions between the two events of a path within a case (default 1)

Returns#

dfg

DFG in the chosen measure (may be only the frequency, only the performance, or both)
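
To make the frequency and performance measures concrete, the following is a minimal conceptual sketch in plain Python. It is not the pm4py implementation: the toy log, the helper name `sketch_dfg`, the use of integer seconds as timestamps, and the reading of `window` as the positional distance between paired events are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy log: (case id, activity, timestamp in seconds),
# already sorted by case ID and then by timestamp.
events = [
    ("c1", "register", 0),
    ("c1", "check", 60),
    ("c1", "pay", 180),
    ("c2", "register", 0),
    ("c2", "check", 30),
    ("c2", "check", 90),
    ("c2", "pay", 100),
]


def sketch_dfg(events, window=1, keep_once_per_case=False):
    """Return (frequency, mean performance) dicts keyed by (source, target)."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((activity, ts))

    frequency = Counter()
    durations = defaultdict(list)
    for trace in by_case.values():
        seen = set()
        # Pair each event with the one `window` positions later in the same case.
        for (a, ts_a), (b, ts_b) in zip(trace, trace[window:]):
            if keep_once_per_case and (a, b) in seen:
                continue  # count only the first occurrence of a path per case
            seen.add((a, b))
            frequency[(a, b)] += 1
            durations[(a, b)].append(ts_b - ts_a)

    # Assumed aggregation: the mean, mirroring perf_aggregation_key='mean'.
    performance = {path: mean(values) for path, values in durations.items()}
    return dict(frequency), performance


freq, perf = sketch_dfg(events)
print(freq)
print(perf)
```

In this sketch the frequency dictionary counts how often one activity is followed by another, while the performance dictionary aggregates the elapsed time along the same paths.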

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics.get_partial_order_dataframe(df, start_timestamp_key=None, timestamp_key='time:timestamp', case_id_glue='case:concept:name', activity_key='concept:name', sort_caseid_required=True, sort_timestamp_along_case_id=True, reduce_dataframe=True, keep_first_following=True, business_hours=False, business_hours_slot=None, workcalendar=None, event_index='@@index')#

Gets the partial order between events (of the same case) in a Pandas dataframe

Parameters#

df

Dataframe

start_timestamp_key

Start timestamp key (if not provided, defaulted to the timestamp_key)

timestamp_key

Complete timestamp

case_id_glue

Column of the dataframe to use as case ID

activity_key

Activity key

sort_caseid_required

Tells if a sort by case ID is required (default: True)

sort_timestamp_along_case_id

Tells if a sort by timestamp is required along the case ID (default: True)

reduce_dataframe

To speed up the operation, keep only the essential columns of the dataframe

keep_first_following

Keep only the first event following the given event

Returns#

part_ord_dataframe

Partial order dataframe (with @@flow_time between events)
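
The idea of a partial order between interval events can be illustrated with a small plain-Python sketch. It assumes that an event follows another when it belongs to the same case, comes later in the input order, and starts no earlier than the other completes, and that the flow time is the gap between that completion and the following start. The helper `sketch_partial_order` and the toy events are hypothetical and do not reproduce the library's code.

```python
# Hypothetical toy events: (case id, activity, start, complete), in seconds,
# already sorted by case ID and then by timestamp.
events = [
    ("c1", "A", 0, 10),
    ("c1", "B", 5, 20),
    ("c1", "C", 12, 30),
    ("c1", "D", 25, 40),
]


def sketch_partial_order(events, keep_first_following=True):
    """Return rows (case, source activity, target activity, flow time)."""
    rows = []
    indexed = list(enumerate(events))
    for i, (case_i, act_i, _start_i, complete_i) in indexed:
        # Later events of the same case that start after this one completes.
        followers = [
            (act_j, start_j - complete_i)
            for j, (case_j, act_j, start_j, _complete_j) in indexed
            if case_j == case_i and j > i and start_j >= complete_i
        ]
        if keep_first_following:
            followers = followers[:1]  # keep only the first following event
        for act_j, flow_time in followers:
            rows.append((case_i, act_i, act_j, flow_time))
    return rows


print(sketch_partial_order(events))
print(sketch_partial_order(events, keep_first_following=False))
```

Here B overlaps with A and therefore does not follow it, whereas C and D do; with `keep_first_following=True` only the pair (A, C) is kept for A.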

pm4py.algo.discovery.dfg.adapters.pandas.df_statistics.get_concurrent_events_dataframe(df, start_timestamp_key=None, timestamp_key='time:timestamp', case_id_glue='case:concept:name', activity_key='concept:name', sort_caseid_required=True, sort_timestamp_along_case_id=True, reduce_dataframe=True, max_start_column='@@max_start_column', min_complete_column='@@min_complete_column', diff_maxs_minc='@@diff_maxs_minc', strict=False)#

Gets the concurrent events (of the same case) in a Pandas dataframe

Parameters#

df

Dataframe

start_timestamp_key

Start timestamp key (if not provided, defaulted to the timestamp_key)

timestamp_key

Complete timestamp

case_id_glue

Column of the dataframe to use as case ID

activity_key

Activity key

sort_caseid_required

Tells if a sort by case ID is required (default: True)

sort_timestamp_along_case_id

Tells if a sort by timestamp is required along the case ID (default: True)

reduce_dataframe

To speed up the operation, keep only the essential columns of the dataframe

strict

Gets only entries that are strictly concurrent (i.e., the intersection of the two events' time intervals has length > 0)

Returns#

conc_ev_dataframe

Concurrent events dataframe (with @@diff_maxs_minc as the size of the intersection of the intervals)
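
The column names in the signature suggest that the intersection size is the minimum of the two completion times minus the maximum of the two start times. The sketch below works under that assumption in plain Python; `sketch_concurrent_events` and the toy events are hypothetical and only illustrate the effect of `strict`.

```python
# Hypothetical toy events: (case id, activity, start, complete), in seconds.
events = [
    ("c1", "A", 0, 10),
    ("c1", "B", 5, 20),
    ("c1", "C", 20, 30),
    ("c2", "A", 0, 10),
]


def sketch_concurrent_events(events, strict=False):
    """Return rows (case, activity, other activity, size of the intersection)."""
    rows = []
    for i, (case_i, act_i, start_i, complete_i) in enumerate(events):
        for case_j, act_j, start_j, complete_j in events[i + 1:]:
            if case_i != case_j:
                continue  # only events of the same case are compared
            max_start = max(start_i, start_j)
            min_complete = min(complete_i, complete_j)
            overlap = min_complete - max_start
            # Non-strict mode also accepts intervals that merely touch.
            if overlap > 0 or (overlap == 0 and not strict):
                rows.append((case_i, act_i, act_j, overlap))
    return rows


print(sketch_concurrent_events(events))
print(sketch_concurrent_events(events, strict=True))
```

B and C share only the instant 20, so the pair appears with an intersection of size 0 in the non-strict result and is dropped when `strict=True`.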

pm4py.algo.discovery.dfg.adapters.pandas.freq_triples module#

pm4py.algo.discovery.dfg.adapters.pandas.freq_triples.get_freq_triples(df, activity_key='concept:name', case_id_glue='case:concept:name', timestamp_key='time:timestamp', sort_caseid_required=True, sort_timestamp_along_case_id=True)#

Gets the frequency triples out of a dataframe

Parameters#

df

Dataframe

activity_key

Activity key

case_id_glue

Column of the dataframe to use as case ID

timestamp_key

Timestamp key

sort_caseid_required

Determine if sort by case ID is required (default: True)

sort_timestamp_along_case_id

Determine if sort by timestamp is required (default: True)

Returns#

freq_triples

Frequency triples from the dataframe
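
As a rough illustration, a frequency triple can be read as a sequence of three consecutive activities within a case, together with the number of times it occurs. The plain-Python sketch below works under that assumption; the helper `sketch_freq_triples`, the toy log, and the dictionary return shape are hypothetical and are not taken from the library.

```python
from collections import Counter, defaultdict

# Hypothetical toy log: (case id, activity), already sorted by case ID
# and then by timestamp.
events = [
    ("c1", "a"), ("c1", "b"), ("c1", "c"), ("c1", "d"),
    ("c2", "a"), ("c2", "b"), ("c2", "c"),
]


def sketch_freq_triples(events):
    """Count every run of three consecutive activities within a case."""
    by_case = defaultdict(list)
    for case_id, activity in events:
        by_case[case_id].append(activity)

    triples = Counter()
    for trace in by_case.values():
        # zip stops at the shortest slice, so traces shorter than 3 add nothing.
        triples.update(zip(trace, trace[1:], trace[2:]))
    return dict(triples)


print(sketch_freq_triples(events))
```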