project 3: the “change data capture” (cdc) pipeline
scenario: the analytics team needs access to user data, but querying the prod db directly is slowing down the app. the mission: replicate a postgres database to a data lake in real time without running queries against it.
- tech: debezium (or write a custom python cdc poller), postgres, kafka connect.
- challenge: capture insert, update, and delete operations accurately.
- dev to prod:
- set up a local postgres db with a “users” table.
- configure debezium (via docker) to listen to the postgres write-ahead log (wal).
- stream changes to a kafka topic, postgres.public.users.
- prod requirement: ensure no data loss when the debezium container crashes.
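
the steps above can be sketched in python. this is a minimal sketch under stated assumptions, not a definitive setup. `build_connector_config` builds the json payload you would post to kafka connect to register the debezium postgres connector. the connector name, slot name, credentials, and database name are placeholders, and `topic.prefix` assumes debezium 2.x (older releases use `database.server.name`). `apply_change` shows one way a consumer could handle the `c`, `u`, `d`, and `r` (snapshot) operations in a debezium change event. it assumes the "users" table has a primary key column called `id`, which the original brief does not specify.

```python
import json


def build_connector_config(host="localhost", dbname="appdb"):
    """Build a Kafka Connect registration payload for the Debezium Postgres connector."""
    return {
        "name": "users-cdc",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",  # logical decoding plugin built into Postgres 10+
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "postgres",      # placeholder credentials
            "database.password": "postgres",  # placeholder credentials
            "database.dbname": dbname,        # placeholder database name
            # Topic names are <topic.prefix>.<schema>.<table>,
            # so this yields postgres.public.users.
            "topic.prefix": "postgres",
            "table.include.list": "public.users",
            "slot.name": "users_cdc_slot",  # hypothetical replication slot name
        },
    }


def apply_change(replica, event):
    """Apply one Debezium change envelope to a dict keyed by the assumed `id` column.

    Writes are upserts and deletes tolerate missing keys, so replaying
    the same events in order after a restart leaves the replica unchanged.
    """
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":  # delete: only `before` is populated
        replica.pop(event["before"]["id"], None)
    return replica


if __name__ == "__main__":
    # Print the payload; posting it to the Kafka Connect REST API is left out here.
    print(json.dumps(build_connector_config(), indent=2))
```

the idempotent apply step matters for the prod requirement. when the debezium container restarts, it resumes from the last offset stored by kafka connect, and the replication slot keeps postgres from discarding wal that has not been acknowledged. that gives at-least-once delivery, so some events may arrive twice and the consumer has to tolerate replays.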