Profiling overwriting validations and vice versa

Hi

Background:
I am running GE in a hosted environment, using a DataContext created during the run.
I have two different jobs: one for Validation (daily) and another for Profiling (weekly/monthly). I am using the same S3 bucket for the expectation and validation stores, but with different prefixes. I build data_docs on every job (table) run.

For e.g.,

validation.py - parameters
    stores={
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "s3_bucket",
                "prefix": "validation_expectation_prefix",
            },
        },
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "s3_bucket",
                "prefix": "validations",
            },
        },
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    },


profiling.py - parameters
    stores={
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "s3_bucket",
                "prefix": "profiling_expectation_prefix",
            },
        },
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "s3_bucket",
                "prefix": "profiling/results",
            },
        },
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    },

Issue:
Everything runs as expected when I run Validation or Profiling on its own. But when I run Profiling after Validation, all the Validation results (HTML files and index.html) get replaced with Profiling results. Similarly, if I run Validation after Profiling, all Profiling results get replaced with Validation results. This happens not just with the same datasets, but also with completely different datasets for Validation and Profiling.

Please suggest what I can do to keep both Validation and Profiling results and combine them into the same index.html. When I run everything from my local machine, that is what happens, but somehow in the hosted (serverless) environment one overwrites the other.

Thank you for your help.


The default behavior of the Data Docs site builder is to remove HTML files for Expectation Suites and Validation Results that it does not find in the corresponding store. Since you configure separate expectations and validations stores in your two jobs, you are correct - one job running wipes out the HTML files created by the other.
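To make that cleanup behavior concrete, here is a toy model in plain Python. It is not Great Expectations code: `rebuild_site` and the key names are hypothetical, and the real site builder does much more. It only illustrates why two jobs with separate stores erase each other's pages, under the assumption that a build keeps a page only if its source object is in the store that build can see.

```python
# Toy model only: this is NOT Great Expectations code. It mimics the behavior
# described above, where a site build drops pages with no matching store entry.

def rebuild_site(existing_pages, store_keys):
    """Return the pages left after a build that only sees `store_keys`."""
    # Pages whose source object is missing from this job's store are removed.
    kept = {page for page in existing_pages if page in store_keys}
    # Every object found in the store gets a (re)rendered page.
    return kept | set(store_keys)


# Hypothetical store contents for the two jobs.
validation_store = {"validations/orders/run_1"}
profiling_store = {"profiling/results/orders/run_1"}

# Separate stores: each build only knows its own keys,
# so the pages from the other job are removed.
site = rebuild_site(set(), validation_store)
separate_result = rebuild_site(site, profiling_store)

# Shared store: both jobs read and write the same keys,
# so both sets of pages survive every build.
shared_store = validation_store | profiling_store
site = rebuild_site(set(), shared_store)
shared_result = rebuild_site(site, shared_store)
```

In this model, `separate_result` ends up holding only the profiling page, while `shared_result` holds both.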

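Consider configuring the same stores for both jobs. The site builder will then find both the Validation and the Profiling results in those stores, so your Data Docs will keep both and list them in the same index.html.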
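As a rough sketch of that suggestion, both validation.py and profiling.py could build their `stores` parameter from one shared helper, so the bucket and prefixes cannot drift apart. The helper name `build_stores` and the prefixes `"expectations"` and `"validations"` are assumptions for illustration; the store and backend class names are the ones from the question.

```python
def build_stores(bucket):
    """Return one stores config to be used by BOTH the validation and profiling jobs.

    The prefixes below are illustrative placeholders, not required values.
    """
    return {
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": bucket,
                "prefix": "expectations",  # shared by both jobs
            },
        },
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": bucket,
                "prefix": "validations",  # shared by both jobs
            },
        },
        "evaluation_parameter_store": {"class_name": "EvaluationParameterStore"},
    }


# validation.py and profiling.py would each pass the same result as stores=...
validation_stores = build_stores("s3_bucket")
profiling_stores = build_stores("s3_bucket")
```

If you still want to tell the two kinds of suites apart, one option (an assumption, not something confirmed in this thread) is to give them distinct Expectation Suite names rather than distinct stores.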

Hi @eugene.mandel, thank you so much. That’s great, it worked. I was trying to separate out the expectations for Validation and Profiling, which is why I had set it up that way.