I’m trying to set up GCS hosting for data docs. I know there’s a tutorial for AWS. I saw GCS support mentioned a few times in the docs, but couldn’t find a tutorial. Could anyone point me towards a good place to start?
A few things:
You’ll need credentials configured correctly
You’ll need to configure a data docs site as follows in your great_expectations.yml:
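The configuration snippet itself is not shown in this thread, so here is a rough sketch of what such a site entry might look like, written as a Python dict that mirrors the YAML structure. The class names SiteBuilder, TupleGCSStoreBackend and DefaultSiteIndexBuilder are my assumption of what the GCS setup uses; the site name gcs_site and the project and bucket values are hypothetical placeholders. The prefix is left empty, in line with the caveat mentioned later in this thread.

```python
import json


def gcs_data_docs_site(project: str, bucket: str, prefix: str = "") -> dict:
    """Build one entry for the data_docs_sites section of great_expectations.yml.

    The class names below are assumptions, not confirmed by this thread.
    """
    return {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "project": project,
            "bucket": bucket,
            "prefix": prefix,  # keep empty until the prefix URL bug is fixed
        },
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }


# "gcs_site", "my-gcp-project" and "my-data-docs-bucket" are hypothetical names.
config = {
    "data_docs_sites": {
        "gcs_site": gcs_data_docs_site("my-gcp-project", "my-data-docs-bucket")
    }
}

# JSON is valid YAML, so the printed text can be pasted into a YAML file as-is.
print(json.dumps(config, indent=2))
```

Since YAML accepts JSON syntax, the printed output can go straight into great_expectations.yml, or you can retype the same keys in the usual indented YAML style.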
You’ll then probably run into this bug that I’m fixing now: https://github.com/great-expectations/great_expectations/issues/1393
Then you may notice that some of the links between data docs pages don’t work. I will file this and begin work on these bugs as well.
Once these bugs are worked out I plan on making a “how to” guide in our official docs.
How did you configure credentials?
I presume via gcloud auth on the command line. I’m not sure how permissions are typically saved in GE, but GCS commonly relies on service accounts that you authenticate as by using JSON key files. Service accounts in GCS are special accounts meant for programmatic use in a specific task.
So, the nice thing about GE in this case is that it doesn’t even know about your credentials. It relies on the google-cloud library, which works with service accounts. You might need to create a service account, download the key, and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the key file.
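For anyone who prefers to do that last step from Python rather than the shell, here is a minimal sketch. It assumes the key is a standard service account JSON file (those carry "type": "service_account") and that the Google client libraries pick it up through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The helper name use_service_account_key is made up for this example.

```python
import json
import os
from pathlib import Path


def use_service_account_key(key_path: str) -> str:
    """Check that key_path looks like a service account key, then export it.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    path = Path(key_path).expanduser().resolve()
    with path.open() as handle:
        key = json.load(handle)  # raises if the file is missing or not JSON
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    # Google client libraries read this variable to locate the key file.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return str(path)
```

Call it with the path to your downloaded key before building data docs in the same process. The shell equivalent is to export GOOGLE_APPLICATION_CREDENTIALS with the same path before running the great_expectations CLI.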
Side note: the first bug is fixed and merged, and a colleague has fixes for the other bugs that I’m hoping we can ship tomorrow!
Alright, I have some good news! In the upcoming 0.10.9 release, which is shipping this morning, GCS data docs is verified working, with a small caveat!
The caveat is that if you have a prefix configured, your site will not have the correct URLs, so until this bug is fixed you will need to operate with an empty prefix, i.e. prefix: "". The bug is tracked here: https://github.com/great-expectations/great_expectations/issues/1398
0.10.11 also has related fixes.