The current version of the Great Expectations framework documentation (0.9.4) does not contain any example of how to configure a PySpark datasource to access files on AWS S3. An example of this configuration would be really helpful.
You’re right! In fact, it’s very similar to the example for pandas, since Spark’s reader methods also know how to process S3 paths:
```yaml
datasources:
  nyc_taxi:
    class_name: SparkDFDatasource
    generators:
      s3:
        class_name: S3GlobReaderBatchKwargsGenerator
        bucket: nyc-tlc
        delimiter: '/'
        reader_options:
          sep: ','
          engine: python
        assets:
          taxi-green:
            prefix: trip data/
            regex_filter: 'trip data/green.*\.csv'
          taxi-fhv:
            prefix: trip data/
            regex_filter: 'trip data/fhv.*\.csv'
    data_asset_type:
      class_name: SparkDFDataset
```
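To make the role of `regex_filter` concrete, here is a minimal, self-contained sketch of how each asset's pattern selects object keys under the configured prefix. The sample key names below are hypothetical (they only imitate the naming style of the public `nyc-tlc` bucket), and the filtering is reproduced with plain `re` rather than the generator itself:

```python
import re

# Hypothetical object keys, imitating the layout of the nyc-tlc bucket.
keys = [
    "trip data/green_tripdata_2019-01.csv",
    "trip data/fhv_tripdata_2019-01.csv",
    "trip data/yellow_tripdata_2019-01.csv",
    "misc/readme.txt",
]

# The regex_filter values from the YAML config: each asset keeps only
# the keys whose path matches its pattern.
asset_filters = {
    "taxi-green": r"trip data/green.*\.csv",
    "taxi-fhv": r"trip data/fhv.*\.csv",
}

matched = {
    asset: [k for k in keys if re.match(pattern, k)]
    for asset, pattern in asset_filters.items()
}

for asset, asset_keys in matched.items():
    print(asset, asset_keys)
```

With keys like these, `taxi-green` would pick up only the `green_*` files and `taxi-fhv` only the `fhv_*` files, while everything else (including the yellow-taxi files, which have no asset defined here) is ignored.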