I want to understand how GE integrates with Databricks Delta tables. If you have a sample notebook, it would be helpful.
Rich Louden wrote up a very detailed blog post on using GE with Databricks: https://www.unsupervised-learnings.co.uk/post/setting-your-data-expectations-data-profiling-and-testing-with-the-great-expectations-library/
The reference docs for that class include a sample yml config that you can copy and modify; set reader_method: parquet in the generator's config.
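A sketch of what that datasource entry might look like in great_expectations.yml, assuming GE 0.9.x. The datasource name, generator name, and base_directory below are placeholders, and the generator class name changed across 0.9.x releases, so check the version you have installed:

```yaml
datasources:
  databricks_spark:            # placeholder name
    class_name: SparkDFDatasource
    generators:
      default:                 # placeholder name
        class_name: SubdirReaderGenerator   # SubdirReaderBatchKwargsGenerator in later 0.9.x
        base_directory: /dbfs/data          # placeholder path
        reader_method: parquet
```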
Some APIs have changed since that blog post was published. If you are using GE 0.9.0 or later, replace the build_expectations method from the post with this:
```python
from datetime import datetime

from great_expectations.dataset import SparkDFDataset
from great_expectations.profile.basic_dataset_profiler import BasicDatasetProfiler

# `spark` and `sqlContext` are the globals Databricks provides in every notebook.
def build_expectations(database, asset_name, context):
    # Name the suite after the table plus today's date.
    exp_suite_name = (
        database + "." + asset_name + "."
        + datetime.today().strftime("%d-%m-%Y") + "_expectations"
    )
    data = spark.table(database + "." + asset_name)
    spark_data = SparkDFDataset(data)
    # profile() returns a (suite, validation_results) tuple,
    # so unpack it and save only the suite.
    suite, validation_results = spark_data.profile(BasicDatasetProfiler)
    context.save_expectation_suite(suite, exp_suite_name)
    sqlContext.uncacheTable(database + "." + asset_name)
```
The profile method returns a tuple (expectation suite, validation results), so you need to pass the first member of that tuple to save_expectation_suite.
Also, you don’t have to create an empty expectation suite before profiling.
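The tuple shape can be illustrated without Spark. In the sketch below, fake_profile is a hypothetical stand-in for spark_data.profile(BasicDatasetProfiler); only the unpacking pattern is the point:

```python
# fake_profile is a hypothetical stand-in for the real call
# spark_data.profile(BasicDatasetProfiler), which returns a
# (expectation_suite, validation_results) tuple.
def fake_profile():
    suite = {"expectation_suite_name": "db.table.01-01-2024_expectations"}
    validation_results = {"success": True}
    return suite, validation_results

# Passing the return value straight to save_expectation_suite would
# hand it the whole tuple; unpack first and pass only the suite.
suite, validation_results = fake_profile()
```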