project 9: the idempotent backfill
scenario: the pipeline broke for 3 days while you were on holiday. you need to re-run it without creating duplicates.
the mission: retrofit your project 4 pipeline to be fully idempotent.
- tech: airflow, sql (merge/upsert).
- challenge: ensuring run 1 + run 1 = run 1 (not 2x data).
- dev to prod:
  - modify your sql logic to delete existing data for the execution_date before inserting.
  - prod requirement: test this by running the same dag run 3 times. the row count in the db must remain constant.
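the delete-then-insert step and the three-run check can be sketched in a few lines. this is a minimal illustration, not the project 4 pipeline itself: it uses python's built-in sqlite3 in place of your real database, and the table daily_sales, its columns, and the helper load_partition are hypothetical names chosen for the example. in airflow, the date would typically come from the templated run date (for example `{{ ds }}`) rather than a hard-coded string.

```python
import sqlite3


def load_partition(conn, execution_date, rows):
    """Replace every row for one execution_date in a single transaction.

    Hypothetical helper: deleting the partition first means a re-run
    overwrites the previous attempt instead of appending to it.
    """
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "DELETE FROM daily_sales WHERE execution_date = ?",
            (execution_date,),
        )
        conn.executemany(
            "INSERT INTO daily_sales (execution_date, store_id, amount) "
            "VALUES (?, ?, ?)",
            [(execution_date, store_id, amount) for store_id, amount in rows],
        )


def row_count(conn):
    """Total number of rows in the (hypothetical) target table."""
    return conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]


# In-memory stand-in for the production database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales "
    "(execution_date TEXT, store_id INTEGER, amount REAL)"
)

# Made-up source data for one day: (store_id, amount).
source_rows = [(1, 120.0), (2, 75.5), (3, 310.25)]

# Simulate running the same dag run three times.
counts = []
for _ in range(3):
    load_partition(conn, "2024-01-01", source_rows)
    counts.append(row_count(conn))

print(counts)  # [3, 3, 3]
```

wrapping the delete and the insert in one transaction matters: if the insert fails halfway, the delete is rolled back too, so a failed run does not leave the date empty. a merge/upsert keyed on a unique column is the other common route; unlike delete-then-insert, it does not remove rows that have since disappeared from the source.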