Great question! First, Great Expectations is absolutely a great fit for this use case. It’s actually one of the original cases that inspired my work on it.
There’s a lot in your question that deserves a deeper discussion about how to monitor and measure the reliability of an ML system. A concrete way to think about the action here, though, is to view the ML model itself as a node in a DAG: it processes input data (the features to be used for modeling) and produces output data.
In that simple model, Great Expectations is useful for both the input and the output. On the input side, you can usually write expectations about:
- structure of data, to ensure the preprocessing is working as intended, and
- distribution of data, to ensure you’re seeing data that is similar to what you trained on
On the output side, you can think of your expectations as falling into the same two categories, though of course the expectations themselves are likely to be different.
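Here’s a rough sketch of what that looks like on both sides of the model node. It’s plain Python rather than the Great Expectations API (the exact calls vary by version), and the column names, sample rows, and thresholds are all made up for illustration. The two helpers correspond loosely to GE’s `expect_column_values_to_not_be_null` and `expect_column_mean_to_be_between`.

```python
from statistics import mean


def check_structure(rows, required_columns):
    """Structural check: every row has a non-None value for every required column."""
    failures = []
    for i, row in enumerate(rows):
        for col in required_columns:
            if row.get(col) is None:
                failures.append((i, col))
    return {"success": not failures, "failures": failures}


def check_mean_between(rows, column, min_value, max_value):
    """Distributional check: the column mean lies in a range chosen by the designer.

    Assumes the structural check has already passed for this column.
    """
    observed = mean(row[column] for row in rows)
    return {"success": min_value <= observed <= max_value, "observed": observed}


# Hypothetical feature batch arriving at the model (input side).
features = [
    {"age": 34, "income": 52000.0},
    {"age": 41, "income": 61000.0},
    {"age": 29, "income": 48000.0},
]
# Hypothetical scores produced by the model (output side).
predictions = [{"score": 0.12}, {"score": 0.57}, {"score": 0.33}]

# Same two categories on each side, but different expectations.
input_results = [
    check_structure(features, ["age", "income"]),
    check_mean_between(features, "age", 25, 50),  # range assumed from training data
]
output_results = [
    check_structure(predictions, ["score"]),
    check_mean_between(predictions, "score", 0.2, 0.6),  # assumed plausible range
]
```

The point of the sketch is only that the input and output checks share a shape while their thresholds come from what you know about each side.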
There are alternatives to GE, but I think of them as falling into two extremes, with GE sitting somewhere in the middle:
- Process-oriented quality checks, such as sampling and routing to human reviewers. I believe, by the way, that such an approach is absolutely essential, but it’s best done in a way where the level of effort is informed by GE.
- Anomaly detection checks, where you ask a system to flag, say, a change in distributions, but without the benefit of the designer’s knowledge about what the system should see.
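As one possible reading of "level of effort informed by GE", here’s a minimal sketch in which the fraction of failed checks sets how many records get routed to human review. The function names, the rates, and the shape of the results (a list of dicts with a `success` key) are assumptions for illustration, not anything GE prescribes.

```python
import random


def review_sample_rate(results, base_rate=0.01, max_rate=0.5):
    """Scale the review rate linearly with the fraction of failed checks."""
    if not results:
        return base_rate
    failed = sum(1 for r in results if not r["success"])
    failed_fraction = failed / len(results)
    return base_rate + (max_rate - base_rate) * failed_fraction


def sample_for_review(records, rate, seed=0):
    """Pick each record for human review independently with probability `rate`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rec for rec in records if rng.random() < rate]


# Hypothetical validation results for one batch: one of two checks failed.
results = [{"success": True}, {"success": False}]
rate = review_sample_rate(results)
to_review = sample_for_review(list(range(1000)), rate)
```

When everything passes, reviewers only see the small baseline sample; when checks start failing, more of the batch is sent their way.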
Happy to discuss this more!