pm4py.filtering.filter_variants_by_coverage_percentage
- pm4py.filtering.filter_variants_by_coverage_percentage(log: EventLog | DataFrame, min_coverage_percentage: float, activity_key: str = 'concept:name', timestamp_key: str = 'time:timestamp', case_id_key: str = 'case:concept:name') → EventLog | DataFrame
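Filters the variants of the log by a coverage percentage: only the traces whose variant covers at least min_coverage_percentage of the cases are kept. For example, if min_coverage_percentage=0.4 and the log contains 1000 cases, of which 500 belong to variant 1, 400 to variant 2 and 100 to variant 3, the filter keeps only the traces of variant 1 (50% of the cases) and variant 2 (40%), and discards those of variant 3 (10%).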
- Parameters:
log (EventLog | DataFrame) – event log / Pandas dataframe
min_coverage_percentage (float) – minimum share of the cases, expressed as a fraction between 0 and 1, that a variant must cover to be kept
activity_key (str) – attribute to be used for the activity
timestamp_key (str) – attribute to be used for the timestamp
case_id_key (str) – attribute to be used as case identifier
- Return type:
Union[EventLog, pd.DataFrame]
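import pm4py
# keep only the variants covering at least 10% of the cases of `dataframe` (an event log loaded as a pandas DataFrame)
filtered_dataframe = pm4py.filter_variants_by_coverage_percentage(dataframe, 0.1, activity_key='concept:name', timestamp_key='time:timestamp', case_id_key='case:concept:name')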
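The activity_key, timestamp_key and case_id_key parameters determine how variants are formed. As a hedged sketch, assuming the usual process-mining convention that a case's variant is the sequence of its activity_key values ordered by timestamp_key within each case_id_key group, the whole filter can be imitated in plain Python on a list of event dictionaries. `filter_events_by_variant_coverage` and the sample events are hypothetical; pm4py itself works on `EventLog` objects and pandas DataFrames, not on lists of dicts.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def filter_events_by_variant_coverage(events, min_coverage_percentage,
                                      activity_key="concept:name",
                                      timestamp_key="time:timestamp",
                                      case_id_key="case:concept:name"):
    """Hypothetical sketch: keep the events of cases whose variant is frequent enough."""
    # Group the events by case identifier.
    events_per_case = defaultdict(list)
    for event in events:
        events_per_case[event[case_id_key]].append(event)

    # Assumption: a case's variant is its sequence of activities ordered by timestamp.
    variant_of_case = {
        case_id: tuple(e[activity_key]
                       for e in sorted(case_events, key=lambda ev: ev[timestamp_key]))
        for case_id, case_events in events_per_case.items()
    }

    # Count cases per variant and apply the (inclusive) coverage threshold.
    cases_per_variant = Counter(variant_of_case.values())
    threshold = min_coverage_percentage * len(variant_of_case)
    kept_cases = {case_id for case_id, variant in variant_of_case.items()
                  if cases_per_variant[variant] >= threshold}

    # Return all events of the retained cases, in their original order.
    return [e for e in events if e[case_id_key] in kept_cases]


t0 = datetime(2024, 1, 1)
events = [
    {"case:concept:name": "c1", "concept:name": "register", "time:timestamp": t0},
    {"case:concept:name": "c1", "concept:name": "pay", "time:timestamp": t0 + timedelta(hours=1)},
    # c2 is listed out of order, but sorting by timestamp gives register -> pay.
    {"case:concept:name": "c2", "concept:name": "pay", "time:timestamp": t0 + timedelta(hours=3)},
    {"case:concept:name": "c2", "concept:name": "register", "time:timestamp": t0 + timedelta(hours=2)},
    {"case:concept:name": "c3", "concept:name": "register", "time:timestamp": t0},
    {"case:concept:name": "c3", "concept:name": "cancel", "time:timestamp": t0 + timedelta(hours=1)},
]

# register -> pay covers 2 of 3 cases (about 67%), register -> cancel only 1 (about 33%).
filtered = filter_events_by_variant_coverage(events, 0.5)
print(sorted({e["case:concept:name"] for e in filtered}))  # ['c1', 'c2']
```

Case c2 shares a variant with c1 only because its events are reordered by timestamp, which is why the choice of timestamp_key can change the result of the filter.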