Zeverin Isert
- BEng (University of Victoria, 2018)
Topic
Proactive monitoring for data drift in classification models
Department of Electrical and Computer Engineering
Date & location
- Thursday, November 13, 2025
- 1:00 P.M.
- Engineering Office Wing
- Room 230
Reviewers
Supervisory Committee
- Dr. Thomas Darcie, Department of Electrical and Computer Engineering, UVic (Co-Supervisor)
- Dr. Stephen Neville, Department of Electrical and Computer Engineering, UVic (Co-Supervisor)
External Examiner
- Dr. Ulrike Stege, Department of Computer Science, UVic
Chair of Oral Examination
- Dr. Darlene Clover, Department of Educational Psychology and Leadership Studies, UVic
Abstract
This thesis introduces methods for proactively monitoring data drift in classification models, aiming to provide early warnings before significant changes occur in the prediction distribution. These “proactive alerts” allow model operators to initiate retraining processes before models enter persistent failure modes, contrasting with traditional “reactive” alerts that only trigger after observing sufficient evidence of data misclassification. The research was conducted in collaboration with Revela Systems, addressing a common industry need for robust machine learning model monitoring.
The motivation for proactive monitoring stems from the significant cost and time savings associated with preventing model failures. Existing monitoring approaches often fall short, either because test data is not representative of live data or because live data subject to unknown drift is inherently unpredictable. The proposed approach aims to be lightweight, scalable, and industry-deployable by making pragmatic trade-offs relative to academic alternatives.
The core contribution is a novel algorithm for detecting data drift in black-box classification models, focused on understanding and tracking the model’s decision boundary. The algorithm is designed for production environments and operates by querying the model for predictions on both live and synthetic data. An incremental version of the algorithm, which does not require access to training data, can detect concept drift as it happens by querying the model. The thesis also presents a method for synthesizing examples in high-dimensional feature spaces, which is crucial for mapping these decision boundaries.
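As a concrete illustration of the black-box setting, the sketch below synthesizes labelled probe points around a batch of live samples using only the model’s prediction interface. The Gaussian perturbation, the `sigma` parameter, and the `model.predict` call are assumptions made for illustration; they are not the synthesis method developed in the thesis.

```python
# Minimal sketch (not the thesis algorithm): label synthetic points near live
# data by querying a black-box classifier, giving a local view of its
# decision regions without any access to training data.
import numpy as np

def synthesize_probes(live_batch, model, n_per_point=10, sigma=0.1, rng=None):
    """Generate synthetic points around each live sample and label them by
    querying the black-box model."""
    rng = rng or np.random.default_rng(0)
    live_batch = np.asarray(live_batch, dtype=float)
    # Perturb each live point with isotropic Gaussian noise in feature space.
    noise = rng.normal(scale=sigma,
                       size=(len(live_batch), n_per_point, live_batch.shape[1]))
    probes = (live_batch[:, None, :] + noise).reshape(-1, live_batch.shape[1])
    labels = model.predict(probes)  # the only access assumed to the model
    return probes, labels
```

Comparing the label mix of successive probe sets over time gives a crude signal of how close the live data sits to regions belonging to other classes.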
The methodology centers on extracting meaning from a classifier’s decision boundary by learning how its output changes, using a combination of data synthesis and live data monitoring. A classifier divides its input space into regions, and decision boundaries exist where the corresponding discriminant functions are equal. Two main approaches are explored: an eager mapping of the decision boundary in a reduced-dimensionality space, and a lazy probing technique in the original feature space. Both techniques use a form of raycasting to determine whether input data is trending toward these boundaries, which implements the proactive component of the approach.
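The raycasting idea can be sketched as follows, assuming only black-box access to a `predict` method: step from a live point along an estimated drift direction until the predicted class changes, which signals a crossing of the decision boundary where two discriminant functions are equal. The step size, step budget, and the externally supplied drift direction are illustrative choices, not the thesis’s implementation.

```python
# Hedged sketch of lazy probing in the original feature space: cast a ray
# from a live point and report the distance at which the model's predicted
# class changes (a decision-boundary crossing), if any.
import numpy as np

def cast_ray(model, x, direction, step=0.05, max_steps=100):
    """Distance along `direction` at which the predicted class changes,
    or None if no boundary is hit within the step budget."""
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    start_label = model.predict(x.reshape(1, -1))[0]
    for k in range(1, max_steps + 1):
        probe = x + k * step * direction
        if model.predict(probe.reshape(1, -1))[0] != start_label:
            return k * step              # boundary crossed at this distance
    return None                          # no collision within the budget

def collision_rate(model, batch, direction, **kwargs):
    """Fraction of rays from a live batch that collide with a decision boundary."""
    batch = np.asarray(batch, dtype=float)
    hits = [cast_ray(model, x, direction, **kwargs) is not None for x in batch]
    return sum(hits) / len(hits)
```

A rising collision rate over successive batches would indicate that live data is trending toward a boundary, which is the intuition behind the proactive alert.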
Experimental results were obtained using a custom “incident generator” tool that synthesizes infinite data streams based on the statistics of fixed-size datasets. The tool allows various data drift types, including incremental, sudden, and gradual drifts, to be mutated and injected in real time. The metric for evaluating the system’s performance was the percentage of probing rays that collided with a decision boundary. Experiments showed that the monitoring system can detect incremental drifts, providing early indications as data trends toward decision boundaries. While the system could react to sudden drifts, forecasting them proved challenging because of the abrupt nature of the change. Gradual drifts offered an opportunity for forecasting, although, because the changes are not positional, reaction could be delayed. The experiments also highlighted the fragility of ray-collision percentage as a measurement and the significant processing time that remains for certain high-dimensional scenarios.
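A minimal stand-in for such an incident generator might look like the sketch below: it fits simple Gaussian statistics to a fixed dataset and yields an endless stream with an incremental, sudden, or gradual drift injected along a chosen direction. The Gaussian model, the drift schedules, and the parameter names are assumptions for illustration, not the tool used in the experiments.

```python
# Illustrative drift-injecting stream generator (not the thesis's tool):
# draws from the mean/covariance of a fixed dataset and adds a time-varying
# shift to emulate incremental, sudden, or gradual drift.
import numpy as np

def incident_stream(dataset, drift_direction, kind="incremental",
                    onset=1000, ramp=1000, scale=2.0, seed=0):
    """Yield samples resembling `dataset`, drifting along `drift_direction`.
    kind: 'incremental' (steady ramp), 'sudden' (step change), or
    'gradual' (rising probability of drawing from the drifted source)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(dataset, dtype=float)
    mean, cov = data.mean(axis=0), np.cov(data, rowvar=False)
    d = scale * np.asarray(drift_direction, dtype=float)
    t = 0
    while True:                          # infinite stream
        if kind == "incremental":
            shift = d * min(max(t - onset, 0) / ramp, 1.0)
        elif kind == "sudden":
            shift = d if t >= onset else 0.0
        else:  # gradual: mix clean and drifted sources with rising probability
            p = min(max(t - onset, 0) / ramp, 1.0)
            shift = d if rng.random() < p else 0.0
        yield rng.multivariate_normal(mean + shift, cov)
        t += 1
```

Feeding such a stream to the monitored model while tracking the ray-collision percentage reproduces, in miniature, the kind of evaluation described above.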