Recently, we’ve been thinking a lot about data dictionaries at Superconductive Health. Our clients have a broad range of technical expertise, and some ask us for quick summaries of their data in data dictionary form, e.g. a summary of the metadata about each column in a table. With that frequent request in mind, I really appreciated Alex Jia and Michael Kaminsky’s recent post, “You probably don’t need a data dictionary.” They made some salient points about how the maintenance burden becomes a scalability issue for data dictionaries. They conclude, logically, that if a data dictionary needs to be manually maintained, it’s probably not worth the effort, because it means maintaining multiple sources of truth for your data instead of relying on one. But what if a data dictionary were updated automatically from your single source of truth?
This is a companion discussion topic for the original entry at https://greatexpectations.io/blog/20191004_data_dictionary_plugin/