Streaming data support in Great Expectations

Does Great Expectations support streaming data? If not, is it something that's planned for later, or something that can be integrated in a fairly simple way?


Great Expectations validates batches of data. Streaming can be supported through “minibatching”. Teams that work with streaming data usually define expectations for sliding windows of their data (e.g. “last 5 minutes”, “last hour”, “last 24 hours”) and validate these batches.
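To make the minibatching idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the Great Expectations API: `last_window` and `validate_batch` are hypothetical helpers, and the record layout (a `ts` timestamp plus a `user_id` field) is an assumption made for illustration. In a real setup, each window would be handed to Great Expectations as a batch and checked against an expectation suite instead of the hand-written null check used here.

```python
from datetime import datetime, timedelta, timezone


def last_window(records, now, size):
    """Return the records whose timestamp falls in (now - size, now]."""
    start = now - size
    return [r for r in records if start < r["ts"] <= now]


def validate_batch(batch, column, max_null_fraction):
    """Hypothetical stand-in for an expectation: limit the share of missing values."""
    if not batch:
        # An empty window is reported as a failure rather than silently passing.
        return {"success": False, "rows": 0, "null_fraction": None}
    nulls = sum(1 for r in batch if r.get(column) is None)
    fraction = nulls / len(batch)
    return {
        "success": fraction <= max_null_fraction,
        "rows": len(batch),
        "null_fraction": fraction,
    }


# Assumed sample events; in practice these would be read from the stream.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"ts": now - timedelta(minutes=10), "user_id": None},
    {"ts": now - timedelta(minutes=4), "user_id": "a"},
    {"ts": now - timedelta(minutes=2), "user_id": None},
    {"ts": now - timedelta(minutes=1), "user_id": "b"},
]

# Validate the same stream over two windows: "last 5 minutes" and "last hour".
results = {
    label: validate_batch(last_window(records, now, size), "user_id", 0.4)
    for label, size in [
        ("last 5 minutes", timedelta(minutes=5)),
        ("last hour", timedelta(hours=1)),
    ]
}
```

Running the same check over several window sizes is one way to separate a short-lived glitch from a sustained data quality problem; how often to re-evaluate each window is a design choice left to the pipeline.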

@dugi_sharma007, can you unpack your tool stack and the use cases you want to support?

@eugene.mandel Understood, I will check and experiment with the sample use cases we have around streaming data. @abegong We essentially have data flowing into Kafka topics, on top of which we want to apply data quality tests such as validity, accuracy, and completeness.

Got it. We don’t (yet) have a fully supported integration with Kafka.

I know that some teams have developed Kafka integrations, but no one has yet PR’d one back into the open source project.

Also, we’re planning to explore this in Q4. Any chance your team would be interested in collaborating on this?

Yeah, we are currently reviewing how the features match up with our other needs. Once we are sure, we would definitely want to collaborate.