If the skip_and_clean_missing flag in DefaultSiteIndexBuilder.build is set to True (the default), then whenever an index page is built and an existing HTML page has no corresponding source data (i.e., an expectation suite or validation result was removed from the source store), that HTML page is automatically deleted and does not appear in the index. This keeps the expectations store and validations store as the source of truth for Data Docs.
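To make the cleanup rule concrete, here is a toy model of it in plain Python. It is not the library's code: the function name rebuild_index and the page keys are hypothetical, and the behaviour with the flag set to False (every existing page is kept and listed) is an assumption of this sketch.

```python
def rebuild_index(existing_pages, source_keys, skip_and_clean_missing=True):
    """Toy model of the index build (illustrative only, not library code).

    existing_pages: keys of HTML pages currently in the Data Docs site.
    source_keys: keys present in the store the builder consults.
    Returns the sorted list of pages that survive and get an index entry.
    """
    if skip_and_clean_missing:
        # Pages without matching source data are dropped from the site.
        remaining = set(existing_pages) & set(source_keys)
    else:
        # Assumption of this sketch: nothing is deleted, everything is listed.
        remaining = set(existing_pages)
    return sorted(remaining)


# A site holding pages from two DAGs, rebuilt against a store that only
# contains dag_a's results.
site_pages = {"dag_a/run_1", "dag_b/run_1"}
store_seen_by_dag_a = {"dag_a/run_1"}

kept_with_cleanup = rebuild_index(site_pages, store_seen_by_dag_a)
kept_without_cleanup = rebuild_index(
    site_pages, store_seen_by_dag_a, skip_and_clean_missing=False
)
```

In this model, the rebuild with cleanup enabled drops the dag_b page because the consulted store has no source data for it, which is the situation the options below try to avoid or tolerate.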
Option 1: All the Airflow DAGs share the same GE Data Context (the config file).
Whether the Data Context’s expectations store uses the filesystem or S3, each DAG will have access to the Expectation Suites for all the DAGs. This means that leaving the flag set to True does not delete any Expectation Suite HTML files.
Option 1.a.
If the Data Context’s validations_store is a shared one (e.g., S3), every Operator sees every validation result, so no validation results’ HTML files are deleted by mistake.
Option 1.b.
However, if it is configured to use the filesystem, each Operator's local store holds only its own validation results, so each Operator will delete the HTML files of the other Operators' validation results. To avoid this, you would have to extend DefaultSiteIndexBuilder and set the flag to False in your child class. Then you can set the site_index_builder config property to your class name.
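A minimal sketch of such a child class follows. Several things here are assumptions rather than confirmed details: the import path, the class name NonCleaningSiteIndexBuilder, the module path my_plugins.site_index, and the use of a module_name key next to class_name. The fallback stand-in class exists only so the sketch runs without Great Expectations installed.

```python
try:
    # Assumed import path; check it against your installed version.
    from great_expectations.render.renderer.site_builder import (
        DefaultSiteIndexBuilder,
    )
except ImportError:
    # Stand-in used only when the library is unavailable (illustration only).
    class DefaultSiteIndexBuilder:
        def build(self, skip_and_clean_missing=True, **kwargs):
            return {"skip_and_clean_missing": skip_and_clean_missing, **kwargs}


class NonCleaningSiteIndexBuilder(DefaultSiteIndexBuilder):
    """Index builder that never deletes pages missing from the local store."""

    def build(self, skip_and_clean_missing=False, **kwargs):
        # Always disable the cleanup, whatever the caller passed in.
        return super().build(skip_and_clean_missing=False, **kwargs)


# Hypothetical site_index_builder entry for the Data Docs site config,
# written as a Python dict; "my_plugins.site_index" is a made-up module path.
site_index_builder_config = {
    "class_name": "NonCleaningSiteIndexBuilder",
    "module_name": "my_plugins.site_index",
}
```

Overriding build keeps the rest of the index builder untouched, so the only behavioural change is that HTML pages for other Operators' validation results are left in place.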
Option 2: Each Airflow DAG has its own GE Data Context (the config file).
In this case, each Data Context can configure its Data Docs site to use the same S3 bucket but a different prefix. This gives you one site per DAG, which can be convenient. You may also add an HTML file manually that links to all of these sites.
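The per-DAG layout could be sketched as follows, with the site configuration written as a Python dict. The helper names, the bucket name, the data_docs/ prefix scheme, and the base URL are all hypothetical; the class_name values are meant to mirror a typical S3-backed Data Docs site entry and should be checked against your own config file.

```python
import html


def data_docs_site_for_dag(dag_id, bucket):
    """Return a Data Docs site config whose S3 prefix is unique to one DAG."""
    return {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleS3StoreBackend",
            "bucket": bucket,
            # Same bucket for every DAG, different prefix per DAG.
            "prefix": f"data_docs/{dag_id}",
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }


def landing_page(dag_ids, base_url):
    """Render a minimal HTML page that links to each DAG's site index."""
    items = "\n".join(
        f'  <li><a href="{base_url}/data_docs/{html.escape(dag_id)}/index.html">'
        f"{html.escape(dag_id)}</a></li>"
        for dag_id in dag_ids
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


# Hypothetical DAG ids, bucket, and base URL.
sites = {
    dag_id: data_docs_site_for_dag(dag_id, "my-data-docs-bucket")
    for dag_id in ["orders_dag", "users_dag"]
}
page = landing_page(sites, "https://example.com")
```

Because each site only ever sees its own DAG's stores, the default cleanup behaviour cannot remove another DAG's pages in this layout.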