Zeverin Isert
- BEng (University of Victoria, 2018)
Topic
Proactive monitoring for data drift in classification models
Department of Electrical and Computer Engineering
Date & location
- Thursday, November 13, 2025
- 1:00 P.M.
- Engineering Office Wing
- Room 230
Reviewers
Supervisory Committee
- Dr. Thomas Darcie, Department of Electrical and Computer Engineering, UVic (Co-Supervisor)
- Dr. Stephen Neville, Department of Electrical and Computer Engineering, UVic (Co-Supervisor)
External Examiner
- Dr. Ulrike Stege, Department of Computer Science, UVic
Chair of Oral Examination
- Dr. Darlene Clover, Department of Educational Psychology and Leadership Studies, UVic
Abstract
This thesis introduces methods for proactively monitoring data drift in classification models, aiming to provide early warnings before significant changes occur in the prediction distribution. These “proactive alerts” allow model operators to initiate retraining processes before models enter persistent failure modes, contrasting with traditional “reactive” alerts that only trigger after observing sufficient evidence of data misclassification. The research was conducted in collaboration with Revela Systems, addressing a common industry need for robust machine learning model monitoring.
The motivation for proactive monitoring stems from the significant cost and time savings associated with preventing model failures. Existing monitoring approaches often fall short, either because test data is not representative of live data or because live data subject to unknown drift is inherently unpredictable. The proposed approach aims to be lightweight, scalable, and industry-deployable by making pragmatic trade-offs relative to academic alternatives.
The core contribution is a novel algorithm for detecting data drift in black-box classification models, focused on understanding and tracking the model’s decision boundary. The algorithm is designed for production environments and operates by querying the model for predictions on both live and synthetic data. An incremental version of the algorithm, which does not require access to training data, can detect concept drift as it happens by querying the model. The thesis also presents a method for synthesizing examples in high-dimensional feature spaces, which is crucial for mapping these decision boundaries.
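As a concrete illustration of the black-box setting, the sketch below synthesizes labelled probe points around a batch of live samples using only the model’s prediction interface. The Gaussian perturbation, the `sigma` parameter, and the `model.predict` call are assumptions made for illustration; they are not the synthesis method developed in the thesis.

```python
# Minimal sketch (not the thesis algorithm): label synthetic points near live
# data by querying a black-box classifier, giving a local view of its
# decision regions without any access to training data.
import numpy as np

def synthesize_probes(live_batch, model, n_per_point=10, sigma=0.1, rng=None):
    """Generate synthetic points around each live sample and label them by
    querying the black-box model."""
    rng = rng or np.random.default_rng(0)
    live_batch = np.asarray(live_batch, dtype=float)
    # Perturb each live point with isotropic Gaussian noise in feature space.
    noise = rng.normal(scale=sigma,
                       size=(len(live_batch), n_per_point, live_batch.shape[1]))
    probes = (live_batch[:, None, :] + noise).reshape(-1, live_batch.shape[1])
    labels = model.predict(probes)  # the only access assumed to the model
    return probes, labels
```

Comparing the label mix of successive probe sets over time gives a crude signal of how close the live data sits to regions belonging to other classes.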
The methodology centers on extracting meaning from a classifier’s decision boundary by learning how its output changes, using a combination of data synthesis and live data monitoring. A classifier divides its input space into regions, and decision boundaries exist where the corresponding discriminant functions are equal. Two main approaches are explored: an eager mapping of the decision boundary in a reduced-dimensionality space, and a lazy probing technique in the original feature space. Both techniques use a form of raycasting to determine whether input data is trending toward these boundaries, which implements the proactive component of the approach.
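The raycasting idea can be sketched as follows, assuming only black-box access to a `predict` method: step from a live point along an estimated drift direction until the predicted class changes, which signals a crossing of the decision boundary where two discriminant functions are equal. The step size, step budget, and the externally supplied drift direction are illustrative choices, not the thesis’s implementation.

```python
# Hedged sketch of lazy probing in the original feature space: cast a ray
# from a live point and report the distance at which the model's predicted
# class changes (a decision-boundary crossing), if any.
import numpy as np

def cast_ray(model, x, direction, step=0.05, max_steps=100):
    """Distance along `direction` at which the predicted class changes,
    or None if no boundary is hit within the step budget."""
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    start_label = model.predict(x.reshape(1, -1))[0]
    for k in range(1, max_steps + 1):
        probe = x + k * step * direction
        if model.predict(probe.reshape(1, -1))[0] != start_label:
            return k * step              # boundary crossed at this distance
    return None                          # no collision within the budget

def collision_rate(model, batch, direction, **kwargs):
    """Fraction of rays from a live batch that collide with a decision boundary."""
    batch = np.asarray(batch, dtype=float)
    hits = [cast_ray(model, x, direction, **kwargs) is not None for x in batch]
    return sum(hits) / len(hits)
```

A rising collision rate over successive batches would indicate that live data is trending toward a boundary, which is the intuition behind the proactive alert.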
Experimental results were obtained using a custom “incident generator” tool that synthesizes infinite data streams based on the statistics of fixed-size datasets. The tool allows various data drift types, including incremental, sudden, and gradual drifts, to be mutated and injected in real time. The metric for evaluating the system’s performance was the percentage of probing rays that collided with a decision boundary. Experiments showed that the monitoring system can detect incremental drifts, providing early indications as data trends toward decision boundaries. While the system could react to sudden drifts, forecasting them proved challenging because of the abrupt nature of the change. Gradual drifts offered an opportunity for forecasting, although, because the changes are not positional, reaction could be delayed. The experiments also highlighted the fragility of ray-collision percentage as a measurement and the significant processing time that remains for certain high-dimensional scenarios.
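A minimal stand-in for such an incident generator might look like the sketch below: it fits simple Gaussian statistics to a fixed dataset and yields an endless stream with an incremental, sudden, or gradual drift injected along a chosen direction. The Gaussian model, the drift schedules, and the parameter names are assumptions for illustration, not the tool used in the experiments.

```python
# Illustrative drift-injecting stream generator (not the thesis's tool):
# draws from the mean/covariance of a fixed dataset and adds a time-varying
# shift to emulate incremental, sudden, or gradual drift.
import numpy as np

def incident_stream(dataset, drift_direction, kind="incremental",
                    onset=1000, ramp=1000, scale=2.0, seed=0):
    """Yield samples resembling `dataset`, drifting along `drift_direction`.
    kind: 'incremental' (steady ramp), 'sudden' (step change), or
    'gradual' (rising probability of drawing from the drifted source)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(dataset, dtype=float)
    mean, cov = data.mean(axis=0), np.cov(data, rowvar=False)
    d = scale * np.asarray(drift_direction, dtype=float)
    t = 0
    while True:                          # infinite stream
        if kind == "incremental":
            shift = d * min(max(t - onset, 0) / ramp, 1.0)
        elif kind == "sudden":
            shift = d if t >= onset else 0.0
        else:  # gradual: mix clean and drifted sources with rising probability
            p = min(max(t - onset, 0) / ramp, 1.0)
            shift = d if rng.random() < p else 0.0
        yield rng.multivariate_normal(mean + shift, cov)
        t += 1
```

Feeding such a stream to the monitored model while tracking the ray-collision percentage reproduces, in miniature, the kind of evaluation described above.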