Caveat: this path has not been tested by anyone at Superconductive, but we’ve been able to help GE users complete these steps to work with GE on Azure.
- Project creation. We recommend that you either:
  a. run `great_expectations init` locally and configure a SparkDFDatasource that reads from a local directory, or
  b. run `DataContext.create()` directly from a notebook.
Option (a) makes it easy to rapidly tweak your configuration and experiment with GE; starting locally is often a useful path. However, you will then need to copy your configuration to Azure Blob Storage yourself.
Option (b) allows you to directly use Azure Blob Storage.
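For option (b), a minimal sketch of creating a project directly from a Databricks notebook might look like the following. The DBFS path is a placeholder for wherever you mounted your Blob Storage container, and the import path assumes a legacy (pre-0.13) Great Expectations release:

```python
# Hypothetical sketch: scaffold a Great Expectations project on DBFS.
# The path below is a placeholder -- substitute your own mount point.
from great_expectations.data_context import DataContext

project_root = "/dbfs/mnt/your-mount/great_expectations_project"

# DataContext.create() scaffolds the great_expectations/ directory
# structure (config, expectations, checkpoints) under project_root.
context = DataContext.create(project_root_dir=project_root)
```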
Ensure that you have configured DBFS to work with Azure Blob Storage. The Databricks docs are here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage
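As a sketch, mounting a Blob Storage container into DBFS typically looks like the following; the container, storage account, mount point, and secret scope/key names here are all placeholders, and the Databricks docs linked above remain the authoritative reference:

```python
# Hypothetical example: mount an Azure Blob Storage container at /mnt/ge-project.
# Container name, storage account, and secret scope/key are placeholders.
dbutils.fs.mount(
    source="wasbs://your-container@yourstorageaccount.blob.core.windows.net",
    mount_point="/mnt/ge-project",
    extra_configs={
        "fs.azure.account.key.yourstorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="your-scope", key="your-storage-key")
    },
)
```

Once mounted, files in the container appear under `/dbfs/mnt/ge-project` and can hold your project configuration.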
Load your DataContext from Azure Blob Storage:
```python
context = DataContext("/your/dbfs/project/path")
```
From that point, you can use the standard Great Expectations workflows, including reusing a notebook you created locally or working with your data directly from the Databricks notebook.
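For example, one common flow on legacy Great Expectations releases is to wrap a Spark DataFrame in a `SparkDFDataset` and run expectations against it directly; the data path and column name below are placeholders:

```python
# Hypothetical sketch: validate a Spark DataFrame with Great Expectations.
import great_expectations as ge

# Placeholder data path on your mounted storage.
df = spark.read.parquet("/mnt/ge-project/data/events")

# Wrap the Spark DataFrame so GE expectations can run against it.
ge_df = ge.dataset.SparkDFDataset(df)

# Run an expectation and inspect whether it passed.
result = ge_df.expect_column_values_to_not_be_null("user_id")
print(result.success)
```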