← ...
project 5: the schema evolution handler
project 5: the “schema evolution” handler
scenario: the upstream dev team added a new column “discount_code” without telling you. your pipeline crashed. the mission: build a pipeline that handles “schema drift” automatically.
- tech: delta lake (on local disk) or apache iceberg.
- challenge: merging new schema into old schema without downtime.
- dev to prod:
- ingest project 1 data into a delta table.
- simulate a source change (add a column).
- write a script using
mergeschemaoptions to handle the update gracefully. - prod requirement: implement a “quarantine” table for rows that completely fail validation.