Integrate GE with NiFi

Hello Team,

We are using NiFi for our data flows and want to integrate GE (Great Expectations) with NiFi so that data is validated within the NiFi pipeline before it is ingested into the destination.

I saw a reference in the docs saying that NiFi is supported, but I couldn’t find or understand how to integrate GE with NiFi to run data quality validations within the NiFi pipeline.

Kindly assist.

Thanks
Deepak


We have not yet talked to any teams that have implemented Great Expectations on NiFi, so there is no published how-to guide for this integration.

Here are a couple of pointers that should help:
Since NiFi allows running arbitrary Python code, integrating a validation step into a NiFi pipeline should look very similar to doing this in Airflow.

We have an example repo that shows an Airflow pipeline with Great Expectations integrated here: https://github.com/superconductive/ge_tutorials/tree/main/ge_dbt_airflow_tutorial

See this section where we define a node in the Airflow DAG that validates the input data using Airflow’s PythonOperator: https://github.com/superconductive/ge_tutorials/blob/main/ge_dbt_airflow_tutorial/airflow/ge_tutorials_dag_with_great_expectations.py#L185

Please publish anything that will be helpful to the community. If you need further help with this, ping us in our Slack: https://greatexpectations.io/slack

If more users are interested in this integration, please “thumb-up” this GitHub issue: https://github.com/great-expectations/great_expectations/issues/1891


Does that mean I need to install the latest version of Python, along with Great Expectations, on the NiFi servers to run the data validation checks in NiFi?

Can you share some sample Python code that I can test in NiFi?

The validate_source_data method in the Airflow example (here is the link to this method: https://github.com/superconductive/ge_tutorials/blob/main/ge_dbt_airflow_tutorial/airflow/ge_tutorials_dag_with_great_expectations.py#L68) is what you can use in NiFi.
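
As a rough starting point, here is a minimal sketch of how that method might be adapted into a standalone script. It is untested against a real NiFi instance and rests on several assumptions: the script is called from NiFi's ExecuteStreamCommand processor (which pipes the flowfile content to stdin and captures stdout), the flowfile is a CSV file, Python and great_expectations are installed on the NiFi host, the legacy Great Expectations API used in the Airflow example is available, and a great_expectations/ project directory can be found from the working directory. The datasource name "input_files", the suite name "source_data.warning", and the helper names run and main are placeholders, not part of any published integration.

```python
import os
import sys
import tempfile


def validate_with_great_expectations(csv_path, suite_name="source_data.warning"):
    """Validate one CSV file; mirrors validate_source_data from the Airflow example.

    Assumption: legacy Great Expectations API (DataContext, get_batch,
    run_validation_operator). Newer releases use a different API.
    """
    import great_expectations as ge  # imported lazily; only needed on the NiFi host

    context = ge.data_context.DataContext()
    # Hypothetical datasource name; use the one configured in your project.
    batch_kwargs = {"path": csv_path, "datasource": "input_files"}
    batch = context.get_batch(batch_kwargs, suite_name)
    results = context.run_validation_operator(
        "action_list_operator", assets_to_validate=[batch]
    )
    return bool(results["success"])


def run(stdin, stdout, validate):
    """Read flowfile bytes, validate them, and pass them through on success.

    Returns the process exit code: 0 if validation passed, 1 otherwise.
    """
    data = stdin.read()
    # The validator expects a file path, so spool the content to a temp file.
    with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        ok = validate(path)
    finally:
        os.remove(path)
    if ok:
        stdout.write(data)  # unchanged content continues down the flow
        return 0
    return 1  # non-zero exit code signals a failed validation to NiFi


def main():
    return run(sys.stdin.buffer, sys.stdout.buffer, validate_with_great_expectations)
```

To run it from NiFi, end the script with sys.exit(main()) and point ExecuteStreamCommand at your Python interpreter with the script path as its argument. The exit code should then be available in the execution.status attribute of the outgoing flowfile, so you can route failed validations separately. Keeping the validator as a parameter of run means you can test the pass-through logic with a stub before wiring in Great Expectations.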