← ...
project 7: the data quality firewall
project 7: the “data quality” firewall
scenario: the ceo’s dashboard showed “nan” for total revenue. panic ensues. the mission: implement a “circuit breaker” that stops bad data before it hits the dashboard.
- tech: great expectations (gx) or soda core.
- challenge: defining strict rules for data quality.
- dev to prod:
- create a gx checkpoint:
expect_column_values_to_not_be_null("revenue"). - integrate this into your pyspark script from project 4.
- prod requirement: if validation fails, send a slack alert and fail the airflow task immediately.
- create a gx checkpoint: