I want to understand: if I am performing data quality checks with Great Expectations on a JDBC data source such as Oracle or Snowflake, against a large data set (say 50 GB), how does Great Expectations do the processing? Will it read all 50 GB into the GE engine, perform the quality checks, and write good and bad records to the filesystem?
Or does it offload the DQ checks to the source database and do all the processing there, rather than reading the 50 GB into the GE engine, and later store the good and bad records in other database tables? That way no data would leave the database, and GE would use the processing power of the source database.
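To make the "pushdown" behaviour I am asking about concrete, here is a minimal sketch using only Python's stdlib sqlite3 (not GE's actual API; the table and column names are made up): a not-null expectation can be compiled to a single aggregate query, so only counts, never the rows themselves, leave the database.

```python
import sqlite3

# Hypothetical table standing in for a large Oracle/Snowflake table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 7.5), (4, None)],
)

# One round trip returns two numbers, however large the table is:
# the total row count and the count of rows violating the check.
total, nulls = conn.execute(
    "SELECT COUNT(*), "
    "       SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) "
    "FROM orders"
).fetchone()

success = nulls == 0
print(total, nulls, success)  # 4 2 False
```

If GE works this way for SQL data sources, the resource cost on the GE side would be small regardless of table size, since only aggregates cross the wire.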
Can you point me to any documentation that covers this, or explain how it works? I have a use case where the table I want to run DQ checks on is 400 GB, and I want to be considerate about resource requirements when using GE.